Reducing Interrupt Latency
Journal
Week 1 (W33, 14-18 Aug.)
First week at Enea. Spent most of my time reading OSE manuals, taking notes. Also read a little on the PowerPC. Installed and played around a little with OSE. My supervisor had written a program to measure the interrupt latency and I tried to get it to run in some simulated environment without any luck.
Week 2 (W34, 21-25 Aug.)
Started looking at simulators and debuggers for the PowerPC platform, especially for a tool that can show what happens in the cache while stepping through code. Got in touch with Motorola but they didn't have any suggestions. Downloaded and compiled SimPPC, a simulator that is now part of GDB. Unfortunately SimPPC only simulates the 600 series of the PowerPC, which is of no use to me. Also tried the Green Hills PowerPC simulator that is part of their debug tool. It too didn't prove to be of much use to me.
Week 3 (W35, 28 Aug. - 1 Sep.)
Received and installed SingleStep from SDS. It is the tool I've been looking for: it gives a complete view of both the instruction and data caches. Unfortunately my supervisor's measurement program didn't run in the simulator. Since the program uses serial ports to output data, I removed all references to the serial ports, but the program still didn't work.
Week 4 (W36, 4 - 8 Sep.)
I figured that I don't really need to measure the latency; it is sufficient to count the number of cache hits to see whether locking has a positive or negative effect on the system. I wrote a simple program that has just one process and takes interrupts at regular intervals. It worked in the simulator, and now I'm trying to rewrite the code that runs at each interrupt in assembler so that I can then insert cache locking.
Week 5 (W37, 11 - 15 Sep.)
Not having a PowerPC box to run on, I've started writing code to lock program code into the cache. It was a lot easier than expected and it won't take long to write. Been kind of sick all week, nothing serious but not feeling 100 percent, so I took it slow at the end of the week.
Week 6 (W38, 18 - 22 Sep.)
Completed the code for locking functions into the instruction cache, and also added support for locking data into the cache. Tested locking in the SingleStep debugger and it works. Tried locking both functions and processes into the cache and that works too. Having no hardware to play with right now, I've added support for the Green Hills compiler (I'm using DIAB as default); it's a real shame that the syntax for assembler code differs between compilers. Should probably add support for GCC also....
Thursday: Managed to borrow an MBX board so now I can do real measurements. The initial result is actually better than I expected: in certain cases the maximum interrupt latency is reduced by 20% after locking. The average interrupt latency stays pretty much the same though. I haven't yet measured the total system performance before and after locking code in the cache; I have to do that to see if the reduced interrupt latency is worth it. I can also tell that the speedup depends very much on luck, namely on where in memory the functions that run after the interrupt reside. If they are badly placed and map to the same cache blocks, the interrupt latency can actually get worse when locking. I'm not sure how to solve that. The 2-way set associative cache is the problem; it sure would be nicer if the PowerPC was 8-way or even fully associative.
Friday: The power went out twice today, really annoying because after the power came back I had to wait like half an hour for the central license server to boot up. Shouldn't complain really, it's nice to take a break from the computer....
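The placement problem can be made concrete with a small sketch. It is only an illustration, not the code used for the measurements: the helper names are made up, and the default geometry (16-byte lines, 128 sets, 2 ways) is an assumption that should be checked against the manual of the processor at hand. It counts how many cache lines the given functions need in each set and reports the sets that need more lines than there are ways.

```python
def lines_per_set(functions, line_size=16, num_sets=128):
    """Count the cache lines needed in each set.

    functions: iterable of (start_address, size_in_bytes) pairs.
    line_size and num_sets are assumed values, not taken from a datasheet.
    """
    demand = {}
    for start, size in functions:
        first_line = start // line_size
        last_line = (start + size - 1) // line_size
        for line in range(first_line, last_line + 1):
            index = line % num_sets  # set index of this memory line
            demand[index] = demand.get(index, 0) + 1
    return demand


def oversubscribed_sets(functions, ways=2, line_size=16, num_sets=128):
    """Return the sets that would need more lines than the cache has ways."""
    demand = lines_per_set(functions, line_size, num_sets)
    return sorted(s for s, needed in demand.items() if needed > ways)


# Three hypothetical functions whose first lines all map to set 0.
handlers = [(0x1000, 32), (0x1800, 16), (0x2000, 16)]
print(oversubscribed_sets(handlers))
```

With a higher associativity (for example `ways=8`) the same three functions fit without conflict, which is why a 2-way cache makes the result so sensitive to placement.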
Week 7 (W39, 25 - 29 Sep.)
Tuesday: The locking seems to work fine, but I don't know if I can ever guarantee that cache locking gives a reduced interrupt latency. I have not yet measured the total performance though. The next step is to analyse the longest regions where interrupts are disabled. By shortening the longest regions with disabled interrupts we also reduce the interrupt latency. Today I'm just trying to compile OSE, which isn't that easy; it's the makefiles that are giving me a headache... Recently learned that OSE was just ported to a processor called MPC7400. Its configuration is 8-way set associative, 32 + 32 kB L1 cache and up to 2 MB of (lockable?) L2 cache. I just gotta get my hands on one of those!!
Week 8 (W40, 2 - 6 Oct.)
This week I started profiling the OSE kernel and its modules. I started with the heap library and measured the length of the regions where interrupts are disabled. The idea is to measure how much longer interrupts stay disabled when the cache is smaller due to locking of the interrupt handler. I finally managed to get everything working and I even wrote a simple memory-thrashing process to measure the locked regions. The results and conclusions will not be published here until I get approval from Enea. I have some new small projects that I will start next week. The first is to write a small application that tries to profile the cache usage by looking at the map file that is produced after each compilation. The next thing is to find out more about an upcoming processor from IBM that has high set associativity and something called transient cache. Though I will probably not get my hands on one of those, I will still try to predict how its caches are best used. I will also profile the rest of the kernel and modules, which shouldn't take long because now I know how it's done. And finally I'll start doing some heavy measurements with different configurations and try to produce some nice graphs... That's all folks! Tomorrow I'll go to the Deep Purple concert at Globe arena, that's gonna be cool!!
Week 9 (W41, 9 - 13 Oct.)
I wrote a Java application this week that parses a map file generated
at compile time. The purpose is to see where in the cache functions will
end up. The use of the program is that if you want to lock several functions
you can see whether the functions will overlap in the cache. Any legal cache
configuration can be set in the program. But I did run across a problem.
Since I'm particularly interested in locking functions at a low level (interrupt
handling), the code for some functions is pure assembler (*.s files). The
problem is that the compiler can't tell where a function written in assembler
ends, so the map file will only contain the offset for each label. What
I did was to calculate the size from each label to the next label. This at
least gives the option of locking an assembler function by selecting all
labels that are included in the function; not a good solution but the only
one I could think of. Besides that I managed to get hold of a "review copy"
of the documentation for IBM's upcoming PPC440 processor and read about
transient cache. It looks pretty interesting but I still want more details;
at first glance it could be quite useful if correctly used. I lost my PPC
board this week, but I will try to get hold of one next week and start doing
more detailed measurements of cache locking and hopefully get some diagrams.
Maybe I'll also profile the locked regions of the rest of the OSE modules;
right now I have only profiled the memory module, and that was in debugging
mode so the results aren't accurate. By the way, the Deep Purple concert
sucked but I at least got to see them performing live....
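The label-to-next-label size calculation can be sketched as follows. The tool described above was written in Java; this sketch uses Python purely for brevity and is not that program. It assumes the map file has already been parsed into (label, offset) pairs and that the end offset of the section is known; the function names and the example labels are hypothetical.

```python
def label_sizes(labels, section_end):
    """Estimate the size of each label as the distance to the next label.

    labels: list of (name, offset) pairs taken from an already parsed map file.
    section_end: offset just past the last byte of the section (assumed known).
    """
    ordered = sorted(labels, key=lambda item: item[1])
    sizes = {}
    for i, (name, offset) in enumerate(ordered):
        if i + 1 < len(ordered):
            next_offset = ordered[i + 1][1]
        else:
            next_offset = section_end  # last label runs to the end of the section
        sizes[name] = next_offset - offset
    return sizes


def function_extent(labels, section_end, selected):
    """Return (start, size) for an assembler function given as a set of labels.

    The selected labels are assumed to be adjacent in the map file.
    """
    sizes = label_sizes(labels, section_end)
    offsets = dict(labels)
    start = min(offsets[name] for name in selected)
    size = sum(sizes[name] for name in selected)
    return start, size


# Hypothetical labels from an assembler file.
labels = [("isr_entry", 0x100), ("isr_loop", 0x120),
          ("isr_exit", 0x150), ("other", 0x160)]
print(function_extent(labels, 0x200, {"isr_entry", "isr_loop", "isr_exit"}))
```

The weakness mentioned above is visible here: nothing in the data says which labels belong to the same function, so the selection has to be made by hand.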
Week 10 (W42, 16 - 20 Oct.)
This week hasn't been as productive as I wanted it to be. I worked on
my map file profiling program and it is pretty much as finished as it can
get. I tried to come up with some sort of formula for predicting the performance
of the system when locking functions, but I have come to the conclusion
that the map file does not contain enough information to predict
performance (not even to give a rough estimate). I've read some more on IBM's
440GP processor. It's quite interesting since they have replaced the most
commonly used LRU replacement algorithm with a round robin algorithm because
of the very high set associativity (64-way). The cache can then be split
into two regions (actually three if you count locking), a normal and a
transient region. Each region will then have its own "pointer" for round
robin replacement. But if I understand it correctly, the transient region
can be used not only for less frequently accessed data but also to keep
instructions in the cache longer than they would have stayed if they were
mapped into the normal region. The PowerPC boxes have been in high demand
this week and I haven't been able to get one. I will try again next week and
do some more measurements on cache locking. Wow, it's been 10 weeks already?!
It feels like I just started a week ago.... This week has been pretty
lousy because I've come late to work 4 out of 5 days, all because
the subway was late or in one case even cancelled. How hard is it to drive
a train anyway, it's just gas and brake!! A car would sure be nice. ...hmm...
I wonder if a thesis student is entitled to a company car at Enea, I mean,
if I got a cell phone then I should probably get a car too, right? ;)
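The round robin scheme with a separate pointer per region, described earlier in this entry, can be illustrated with a toy model of a single cache set. This is a sketch under simplifying assumptions and does not reflect the actual registers or behaviour of the 440GP: the class name, the way the regions are laid out and the 4-way size are made up for the example.

```python
class RegionedSet:
    """Toy model of one cache set split into a normal and a transient region.

    Each region has its own round-robin victim pointer, so fills in one
    region never evict lines that live in the other region.
    """

    def __init__(self, ways, transient_ways):
        if not 0 < transient_ways < ways:
            raise ValueError("transient region must be a proper part of the set")
        split = ways - transient_ways
        self.lines = [None] * ways
        self.bounds = {"normal": (0, split), "transient": (split, ways)}
        self.victim = {region: lo for region, (lo, _) in self.bounds.items()}

    def access(self, tag, region="normal"):
        """Return True on a hit; on a miss, replace the region's current victim."""
        if tag in self.lines:
            return True
        lo, hi = self.bounds[region]
        way = self.victim[region]
        self.lines[way] = tag
        # Advance the pointer, wrapping around inside the region only.
        self.victim[region] = lo + (way + 1 - lo) % (hi - lo)
        return False


cache_set = RegionedSet(ways=4, transient_ways=1)
cache_set.access("handler", region="normal")
for i in range(10):
    cache_set.access(f"stream{i}", region="transient")  # streaming data
print(cache_set.access("handler"))  # still cached despite the stream
```

The streaming accesses only ever recycle the one transient way, so the line placed in the normal region survives.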
Week 11 (W43, 23 - 27 Oct.)
Measuring, measuring, measuring.... This week I did real measurements
on the performance of cache locking. I used a program written by my supervisor
here at Enea to measure latency. The program outputs maximum, minimum and
average latencies and I also added a jitter calculation. The program includes
a lot of modules and simulates a heavily loaded system. To make things
even worse I also added a memory-thrashing process (which I will have more
use for later). I removed all debugging code and compiled without any debugging
flags. Then I ran the program on a PowerPC box with the debugging feature disabled.
(The box has a hardware thingy that can force the CPU to put all memory
loads and stores onto the bus even if there is a hit in the cache, thus
enabling the debug thingy to snoop the bus and forward debugging information
to a debugging program.) The results were pretty positive, actually better
than I expected. We are not talking about extreme improvements but still
big enough to be worth considering in some cases. Since measuring in my case means
a lot of waiting, I've started writing and reading up on real-time operating
systems. I also did a pre-study on IBM's 440GP processor and how it can be
used for reducing interrupt latency. Next week I will measure
how locking affects the rest of the system. I have already added profiling
code to one of OSE's modules; the main goal is to see how the run time of
interrupt-disabled regions is affected by cache locking. See you all next week....
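For reference, the statistics mentioned above (maximum, minimum, average and jitter) can be computed from a list of latency samples as in the sketch below. The entry does not say how jitter was defined in the measurement program; here it is assumed to be the spread between the largest and smallest sample, and the function name and sample values are made up.

```python
def latency_stats(samples):
    """Summarise latency samples (any consistent time unit).

    Jitter is assumed here to mean max - min; other definitions exist.
    """
    if not samples:
        raise ValueError("need at least one sample")
    lowest, highest = min(samples), max(samples)
    return {
        "min": lowest,
        "max": highest,
        "avg": sum(samples) / len(samples),
        "jitter": highest - lowest,
    }


# Made-up sample values, for illustration only.
print(latency_stats([10, 12, 20, 14]))
```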
Week 12 (W44, 30 Oct. - 3 Nov.)
Let's see, what have I done this week? First, I tried to get an estimate
of how cache locking affects the entire system. Since I'm actually measuring
the performance of an entire operating system, all its modules and all
user-defined processes, measuring is not that trivial. Interrupts can occur
at any time and need handling, so the measuring is not that deterministic.
So I decided to estimate total performance by only looking at the regions
in OSE where interrupts are disabled and measuring the average time spent
in a certain interrupt-disabled region. The performance wasn't degraded
as much as feared but it did show longer times for interrupt-disabled
regions; I will do more measuring next week. The next thing I did this
week was to start looking at other PowerPC processors with different
cache configurations. The first one my supervisor here at Enea suggested
was the MPC750. It has a larger instruction cache (32 kB) and a higher set
associativity (8-way). I did manage to port the program that measures the
interrupt latency but I should perhaps have read the manual for the MPC750
first. Sure, it does support locking of the L1 cache but only the entire
cache at once. Since the entire interrupt handler of OSE is much smaller
than 32 kB, locking the entire L1 instruction cache is really overkill,
but hey, at least now you can measure the interrupt latency on the MPC750
so that day wasn't a complete waste of time. Moving on to the MPC8260:
it hasn't got the same sexy features as the MPC750, "only" a 16 kB instruction
cache and "only" 4-way set associativity, but hey, I run on what I manage
to get my hands on. The cache locking on this one is different from both
the MPC860, which I have been running on, and the MPC750. Here you lock a way,
so the processor has 4 ways and you can lock one, two or three of them.
The real problem is loading instructions into the cache. For the data cache
you just do a "load", but for instructions you need a neat trick,
which I won't give away until my report, so stay tuned.... I also
altered the measuring program to simulate a more lightly loaded system
and I'm doing some measurements on that as well.... Until next week, happy
Halloween....
Week 13 (W45, 6 - 10 Nov.)
This week has been pretty unproductive. I've been trying to get my hands
on an MPC8260; I did receive a board but still need a probe to run on it.
I've also scaled down the latency test program to simulate a more lightly
loaded system and did the same measurements. The results were pretty good,
I didn't expect that the interrupt latency would decrease as much as it
did after locking the interrupt handler into the cache. I've come to the
conclusion that I need to know exactly what happens on an interrupt. I tried
looking at the compiled code, following the branches and so on, but there
are a lot of them so I will try to get some source code. I did spend a
day trying to compile my code using a profiling tool called RTA but I didn't
manage to get the program to run on the target. When asking for help I
learned that the only guy who knew how to use the tool has left Enea,
tough luck. I also extended my profiling of locked regions to include
inet and the board libraries, and the only thing left is really to profile
the core itself, but I don't know if that is possible without spending
weeks on reading the source code and adding profiling code. But I did find
some locked regions that are pretty long. Finally, I've also started on
the report itself; I did the contents list and also wrote some theory
about caches in general. Well, that's it for this time, I think....
Week 14 (W46, 13 - 17 Nov.)
Okay, now this week I implemented some benchmarking. I managed to find some open source code for the Dhrystone v1.1 benchmark that was written way back in 1984, didn't know they had computers back then... ;-) I implemented the program as a high priority process in OSE. The reason for benchmarking is that I want some numbers on system performance when parts of the cache aren't available because they have been locked. Actually I should have just implemented the benchmark as a separate program instead of using OSE, because I'm not really benchmarking OSE but the target platform, my PowerPC. Anyway, unsurprisingly the system runs slower when the cache is smaller, duh! But now I know how much slower and have the numbers to prove it. Another thing I did this week was rewriting parts of the locking code for the MPC860 to make it more flexible. I also discovered a bug where, when I thought I was unlocking all locked cache blocks, I actually invalidated the whole cache. Well, where there is code there are always bugs, right? Just ask Microsoft, they know! On Thursday I ditched work to check out the Armada, which resulted in a good harvest of candy, bottle openers and some toys. But I did get a chance to speak to some really nice companies and if things don't work out with Enea I know where to turn. Now, the plan is to just get some figures out of the MPC8260 and then start writing the report seriously. If I manage to get my hands on the source code for the OSE kernel then maybe I can do some profiling on the core. Good fight, good night....
Week 15 (W47, 20 - 24 Nov.)
This will be a short update because this week certainly felt short. I got the hardware I needed on Monday to be able to run on the MPC8260 but then I lost it the day after. But I got new hardware today which I hope I can keep for a couple of days. Spent most of Monday just figuring out how to run a program on the board, and finally realized that I had to change a setting in my BIOS. Why does there always have to be something that doesn't work?! Anyway, after that I quickly managed to get my benchmark program to run. Next I tried to measure interrupt latency but I still haven't managed to do that. I think I need to port some of the assembler code and that is what I'm doing today. Hopefully I can get it to work as soon as possible because after that I will need a day or two to verify that my locking program works and then a couple of days to do measurements. I also seriously started to write my report, did a table of contents and started writing a couple of pages on real-time operating systems in general. Back to the wonderful world of PowerPC assembler programming, see ya....
Week 16 (W48, 27 Nov. - 1 Dec)
Jeez, is it December already?! Almost X-mas, time sure flies by quickly when you're having fun. Actually this week wasn't any fun at all. I struggled all week with the MPC8260 board, and at one point I even thought that there actually was something wrong with the board. But I think I now know what the problem is and I hope to do some real measurements on Monday if I get the board to work properly. Other than that I didn't do much. I was at Electrum attending one presentation and it didn't seem to be as hard as I thought. I think I'll do all right when my time comes. But now I should really do some work on my report. Well, that's it for this week, I hope we get some snow next week....
Week 17 (W49, 4 - 8 Dec.)
On Monday I got the MPC8260 board running and started doing measurements. I'm also working on my report. Other than that nothing really exciting happened this week. I'll try to get the measurements done by the end of next week so I can concentrate fully on the report.
Week 18 (W50, 11 - 15 Dec.)
Been doing two things this week. First, more measurements on the MPC8260. I was suspicious of the results I was getting, took a closer look at my measurement program and found some bugs that gave me wrong results. But the bugs have been taken care of, I re-measured and got the result I predicted. Second, I've been working on the report itself, doing some graphs and changing the layout. I still need to write a little more on the theoretical part of the report, there are some blanks there, but I think I'm quite pleased with the implementation part. I have clearly shown that interrupt latency can be improved, which I'm really happy about. One week until X-mas and I still haven't bought any presents, it will be on the last day as usual. Well, now I'm off to Enea's Christmas party....
Week 19 (W1, 1 - 5 Jan.)
The holidays are over and I'm back in business! Completed all measurements that I'm going to put into the report, unless Enea wants me to do more. 100% of my energy is now put into writing the report itself. Some days I actually manage to write a couple of pages while other days it seems that I only produce a couple of sentences. I will try to keep my deadline and have a complete report by the end of next week. Should probably try to find someone whose thesis I can act as opponent on. Pretty cool that it's year 2001, isn't this the year we're supposed to find the monolith on the moon? ;-)
Week 20 (W2, 8 - 12 Jan.)
Last day of the last week! Am I done? No, not really, I need one more
week to finish off the report. Then I will need time to create a presentation.
But a "beta" of the report will be finished by the end of next week. After
that it will most certainly need an update after Vlad reads it before it
can be considered final. Well, it looks like it's going to be a "Week 21"
update so stay tuned yet another week....
Last Modified: 2001-03-12
Responsible: dejan.bucar@telelogic.com