Coldfire cache project
Moderators: adafruit_support_bill, adafruit
Please be positive and constructive with your questions and comments.
- zener
- Posts: 4567
- Joined: Sat Feb 21, 2009 2:38 am
Coldfire cache project
I have a school project where I have to write some code that would benefit a lot from the icache in the Coldfire. The idea is to write a function that runs very fast with the cache turned on but really slowly with it turned off. Any ideas? Thanks.
- opossum
- Posts: 636
- Joined: Fri Oct 26, 2007 12:42 am
Re: Coldfire cache project
Any code that fits entirely within the cache will benefit most from the cache. So if you have 4k of instruction cache, then write code smaller than 4k. If the code is larger, then there will be "thrashing".
If there are ISRs active, then they must also fit in the cache along with the mainline code.
Is Coldfire still an active product line? I assumed ARM and MIPS would have killed it by now.
- westfw
- Posts: 2008
- Joined: Fri Apr 27, 2007 1:01 pm
Re: Coldfire cache project
oPossum wrote: Is Coldfire still an active product line?
Yes. In fact there have been recent product introductions of "microcontroller-like" ColdFire devices. The Freescale "Tower" development system, launched as an Arduino-killer (hah!) a couple of years ago, first showcased one of the new ColdFire products.
- zener
- Posts: 4567
- Joined: Sat Feb 21, 2009 2:38 am
Re: Coldfire cache project
oPossum wrote: Any code that fits entirely within the cache will benefit most from the cache. So if you have 4k of instruction cache, then write code smaller than 4k. If the code is larger, then there will be "thrashing". If there are ISRs active, then they must also fit in the cache along with the mainline code.
Hmmmm... Yes, the i-cache is 4K. So you are saying I just make sure the code is smaller than 4K? I am thinking there must be other considerations as well. Is the trick to just make code that is exactly 4K?
- zener
- Posts: 4567
- Joined: Sat Feb 21, 2009 2:38 am
Re: Coldfire cache project
So the 2 requirements I have come up with are:
Code must be smaller than 4K (or possibly fill that space almost exactly; I'm not sure whether that matters).
Code must access external program DRAM as much as possible.
So my question is, what kind of functions can I write that would fetch a lot from DRAM? Would I fill it with a giant look up table?
- opossum
- Posts: 636
- Joined: Fri Oct 26, 2007 12:42 am
Re: Coldfire cache project
I am not familiar with the ColdFire, so I can't give specific answers.
As a general rule smaller code is more likely to be kept entirely in the i-cache and run at maximum speed. You will have to study the spec sheet for the chip you are using to understand the cache line fill and discard logic.
If the chip has a d-cache, then a small data set may be kept entirely in the cache, but a larger data set would have to be read from main memory more frequently due to cache misses.
- opossum
- Posts: 636
- Joined: Fri Oct 26, 2007 12:42 am
Re: Coldfire cache project
Zener wrote: Code must access external program DRAM as much as possible.
I think that may be true for a unified cache (combined instruction and data cache).
- westfw
- Posts: 2008
- Joined: Fri Apr 27, 2007 1:01 pm
Re: Coldfire cache project
Zener wrote: I have to write some code that would benefit a lot from the icache in the Coldfire.
I don't know what level of class you're in, or whether there are "tricky" things about the ColdFire iCache, but in general it is hard to write any kind of looping code that does NOT benefit significantly from the instruction cache being turned on. That is the point, after all: programmers shouldn't have to carefully craft their code; they should just be able to turn on the cache and have everything go faster.
The maximum advantage will occur when all the instructions you are executing are in the cache, and there are no other accesses to "slow" memory (all your data is in registers). So, for example, a string copy benefits from the iCache because the code is all in the cache, but since it keeps moving data from memory to memory, it doesn't benefit as much as it might.
The most likely "real world" example I can think of would be something like a bitwise CRC. Fetch a 32bit word into a register, and then do about 32 shifts and bittests and xors and such on it while you update a CRC value that is also in a register. It should be easy to get to something like 100 instructions executed for every memory fetch. You can find algorithms on the net (HDLC CRC or Ethernet CRC would be good) (avoid the byte-at-a-time algorithms that use a big data table. They probably end up faster overall, but they won't display the impressive speedup from the iCache.)
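A minimal sketch of the bitwise approach in C, assuming the Ethernet (IEEE 802.3) CRC-32 in its usual reflected form: polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF. The function name crc32_bitwise is hypothetical. Each 32-bit word is XORed into the CRC register and then run through 32 shift/test/XOR steps, so a few instructions repeat many times for each word read from memory. The word is assembled from four byte loads so the result doesn't depend on byte order (ColdFire is big-endian); that is a portability choice for this sketch, and on a real board you might load the word directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Ethernet CRC-32 polynomial in bit-reversed (reflected) form */
#define CRC32_POLY_REFLECTED 0xEDB88320u

/* Hypothetical helper: bit-at-a-time CRC-32, deliberately table-free so
   almost all the work is register arithmetic in a tiny loop. */
static uint32_t crc32_bitwise(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    /* Main loop: bring in 32 bits, then do 32 shift/test/XOR steps */
    while (len >= 4) {
        uint32_t w = (uint32_t)buf[0]
                   | ((uint32_t)buf[1] << 8)
                   | ((uint32_t)buf[2] << 16)
                   | ((uint32_t)buf[3] << 24);
        crc ^= w;
        for (int i = 0; i < 32; i++) {
            /* mask is all ones if the low bit is set, else zero */
            uint32_t mask = (uint32_t)(0u - (crc & 1u));
            crc = (crc >> 1) ^ (CRC32_POLY_REFLECTED & mask);
        }
        buf += 4;
        len -= 4;
    }

    /* Leftover bytes: 8 steps each */
    while (len--) {
        crc ^= *buf++;
        for (int i = 0; i < 8; i++) {
            uint32_t mask = (uint32_t)(0u - (crc & 1u));
            crc = (crc >> 1) ^ (CRC32_POLY_REFLECTED & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

As a sanity check, this CRC-32 variant gives the standard check value 0xCBF43926 for the ASCII string "123456789".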
- zener
- Posts: 4567
- Joined: Sat Feb 21, 2009 2:38 am
Re: Coldfire cache project
westfw wrote: The most likely "real world" example I can think of would be something like a bitwise CRC. Fetch a 32bit word into a register, and then do about 32 shifts and bittests and xors and such on it while you update a CRC value that is also in a register. It should be easy to get to something like 100 instructions executed for every memory fetch. You can find algorithms on the net (HDLC CRC or Ethernet CRC would be good) (avoid the byte-at-a-time algorithms that use a big data table. They probably end up faster overall, but they won't display the impressive speedup from the iCache.)
I am a little fuzzy on this concept. It seems to me that would be using SRAM a lot, so it wouldn't have anything to do with the cache. I am thinking I just need maximum program memory fetches. BTW, it has an i-cache but no d-cache. Thanks.
- westfw
- Posts: 2008
- Joined: Fri Apr 27, 2007 1:01 pm
Re: Coldfire cache project
Easier to think about:
count the one bits in an array of longs.
The general idea is to execute a lot of instructions for each fetch from data memory. Presumably, to do anything interesting, you must fetch SOMETHING from data memory!
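A sketch of the bit-counting idea in C, assuming 32-bit words (uint32_t stands in for "long", which is typically 32 bits on ColdFire compilers). The name count_ones is hypothetical. The inner loop tests one bit at a time on purpose: one data fetch per word, then 32 iterations of a very small loop, which is the high instructions-per-fetch ratio described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count the 1 bits in an array of 32-bit words,
   one bit at a time so the loop body runs 32 times per memory read. */
static uint32_t count_ones(const uint32_t *words, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t w = words[i];           /* the only data-memory read per word */
        for (int bit = 0; bit < 32; bit++) {
            total += w & 1u;             /* add the low bit */
            w >>= 1;                     /* move the next bit down */
        }
    }
    return total;
}
```

A faster method (a byte lookup table, or a parallel bit-count trick) would do less work per fetch and so would probably show a smaller cache-on/cache-off ratio.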
- zener
- Posts: 4567
- Joined: Sat Feb 21, 2009 2:38 am
Re: Coldfire cache project
So when I said "fill it with a giant look up table" I was on the right track... although I left out the part about reading back from the lookup table.
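For contrast with the bitwise CRC, here is a sketch of the byte-at-a-time, table-driven CRC-32 that westfw suggested avoiding, in its common 256-entry table form (the names crc_table, crc32_init_table and crc32_table are hypothetical). Each input byte costs a table read plus a shift and an XOR, so there is much less computation per memory access. If the table lives in external memory and the part has no d-cache, as described above, those table reads would presumably cost about the same with the i-cache on or off, which limits how large the on/off ratio can get.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 256-entry table for the reflected Ethernet CRC-32 */
static uint32_t crc_table[256];

/* Fill the table once before calling crc32_table() */
static void crc32_init_table(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
        crc_table[n] = c;
    }
}

/* One table lookup per input byte */
static uint32_t crc32_table(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len--)
        crc = crc_table[(crc ^ *buf++) & 0xFFu] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFu;
}
```

It produces the same CRC as the bitwise version; only the balance between computation and memory reads changes.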
- zener
- Posts: 4567
- Joined: Sat Feb 21, 2009 2:38 am
Re: Coldfire cache project
Well, I was able to get about a 2.2x difference (cache off/cache on) using the lookup table idea. My teacher said the highest ratio he had ever seen was 7x. Then a guy in my class "calculated the factorials out to 199 in one function, which was only called once." So I guess it was recursive. Anyway, he claimed a difference of 111x. I am not sure that is even possible, but the teacher accepted it.
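The classmate's code isn't shown, so this is only a guess at what "factorials out to 199 in one function" might look like: a C sketch that multiplies a big number, stored as base-10000 limbs in a local array, by 2, 3, ..., n. The name big_factorial, the MAX_LIMBS limit and the base-10000 representation are all assumptions. It uses a loop rather than recursion. The same few instructions of the inner multiply loop execute thousands of times, which is the kind of pattern that should gain a lot from the i-cache; whether it could reach 111x depends on the board's memory timings, which the thread doesn't give.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define BASE 10000u     /* each limb holds 4 decimal digits */
#define MAX_LIMBS 100   /* 400 digits: enough for 199! (373 digits) */

/* Hypothetical helper: writes n! as a decimal string into out.
   Returns the number of digits, or -1 if the limbs or buffer run out.
   limb[i] * k stays within 32 bits for the small n used here. */
static int big_factorial(unsigned n, char *out, size_t out_size)
{
    uint32_t limb[MAX_LIMBS];   /* least significant limb first */
    int used = 1;
    limb[0] = 1;

    for (unsigned k = 2; k <= n; k++) {
        uint32_t carry = 0;
        for (int i = 0; i < used; i++) {       /* the hot inner loop */
            uint32_t v = limb[i] * k + carry;
            limb[i] = v % BASE;
            carry = v / BASE;
        }
        while (carry) {                        /* grow the number */
            if (used == MAX_LIMBS)
                return -1;
            limb[used++] = carry % BASE;
            carry /= BASE;
        }
    }

    /* Top limb unpadded, the rest zero-padded to 4 digits */
    int pos = snprintf(out, out_size, "%u", (unsigned)limb[used - 1]);
    if (pos < 0 || (size_t)pos >= out_size)
        return -1;
    for (int i = used - 2; i >= 0; i--) {
        int w = snprintf(out + pos, out_size - (size_t)pos, "%04u",
                         (unsigned)limb[i]);
        if (w < 0 || (size_t)(pos + w) >= out_size)
            return -1;
        pos += w;
    }
    return pos;
}
```

For reference, 199! has 373 decimal digits and ends in 47 zeros.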