Coldfire cache project

zener
 
Posts: 4567
Joined: Sat Feb 21, 2009 2:38 am

Coldfire cache project

Post by zener »

I have a school project where I have to write some code that would benefit a lot from the icache in the Coldfire. The idea is I write a function that with the cache turned on runs very fast but with the cache turned off runs really slow. Any ideas? Thanks.

opossum
 
Posts: 636
Joined: Fri Oct 26, 2007 12:42 am

Re: Coldfire cache project

Post by opossum »

Any code that fits entirely within the cache will benefit most from the cache. So if you have 4k of instruction cache, then write code smaller than 4k. If the code is larger, then there will be "thrashing".

If there are ISRs active, then they must also fit in the cache along with the mainline code.

Is Coldfire still an active product line? I assumed ARM and MIPS would have killed it by now.

westfw
 
Posts: 2008
Joined: Fri Apr 27, 2007 1:01 pm

Re: Coldfire cache project

Post by westfw »

opossum wrote: Is Coldfire still an active product line?
Yes, in fact there have been recent product introductions of "microcontroller-like" Coldfire devices. The Freescale "Tower" development system, launched as an Arduino-killer (hah!) a couple of years ago, first showcased one of the new Coldfire products.

zener
 
Posts: 4567
Joined: Sat Feb 21, 2009 2:38 am

Re: Coldfire cache project

Post by zener »

oPossum wrote: Any code that fits entirely within the cache will benefit most from the cache. So if you have 4k of instruction cache, then write code smaller than 4k. If the code is larger, then there will be "thrashing".

If there are ISRs active, then they must also fit in the cache along with the mainline code.
Hmmmm... Yes, the i-cache is 4K. So you are saying I just make sure the code is smaller than 4K? I am thinking there must be other considerations as well. Is the trick to just make code that is exactly 4K?

zener
 
Posts: 4567
Joined: Sat Feb 21, 2009 2:38 am

Re: Coldfire cache project

Post by zener »

So the 2 requirements I have come up with are:

Code must be smaller than 4K, or possibly filling that space almost exactly, not sure if that would matter necessarily.

Code must access external program DRAM as much as possible.

So my question is, what kind of functions can I write that would fetch a lot from DRAM? Would I fill it with a giant look up table?

opossum
 
Posts: 636
Joined: Fri Oct 26, 2007 12:42 am

Re: Coldfire cache project

Post by opossum »

I am not familiar with the ColdFire, so I can't give specific answers.

As a general rule smaller code is more likely to be kept entirely in the i-cache and run at maximum speed. You will have to study the spec sheet for the chip you are using to understand the cache line fill and discard logic.

If the chip has a d-cache, then a small data set may be kept entirely in the cache, but a larger data set would have to be read from main memory more frequently due to cache misses.

opossum
 
Posts: 636
Joined: Fri Oct 26, 2007 12:42 am

Re: Coldfire cache project

Post by opossum »

Zener wrote: Code must access external program DRAM as much as possible.
I think that may be true for unified cache (combined instruction and data cache).

westfw
 
Posts: 2008
Joined: Fri Apr 27, 2007 1:01 pm

Re: Coldfire cache project

Post by westfw »

zener wrote: I have to write some code that would benefit a lot from the icache in the Coldfire.
I don't know what level of class you're in, or if there are "tricky" things about the coldfire iCache, but ... in general it is hard to write any kind of looping code that does NOT benefit significantly from the instruction cache being turned on. That is the point, after all; programmers shouldn't have to carefully craft their code, they should just be able to turn on the cache and have everything go faster...

The maximum advantage will occur when all the instructions you are executing are in the cache, and there are no other accesses to "slow" memory (all your data is in registers). So, for example, a string copy benefits from the iCache because the code is all in the cache, but since it keeps moving data from memory to memory, it doesn't benefit as much as it might.
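To put rough numbers on that, here is a back-of-the-envelope model in C. It is only a sketch: the function name and every cycle count you feed it are made-up illustrations, not ColdFire figures, and it assumes every instruction fetch hits the cache when it is on and that data accesses cost the same either way (no d-cache).

```c
/* Rough cost model (hypothetical, not measured on any real chip):
 * total cycles = instruction fetches * cost per fetch
 *              + data accesses      * cost per data access
 * Only the instruction-fetch cost changes when the i-cache is enabled. */
double estimated_speedup(double instr_fetches, double data_accesses,
                         double uncached_fetch_cycles,
                         double cached_fetch_cycles,
                         double data_access_cycles)
{
    double data_cost = data_accesses * data_access_cycles;
    double cache_off = instr_fetches * uncached_fetch_cycles + data_cost;
    double cache_on  = instr_fetches * cached_fetch_cycles + data_cost;
    return cache_off / cache_on;   /* how many times faster with cache on */
}
```

With assumed costs of 10 cycles per uncached fetch, 1 per cached fetch and 10 per data access, a loop doing 100 instructions per data access comes out around 9x, while a copy-style loop doing 3 instructions per 2 data accesses comes out a little over 2x. The data traffic puts a ceiling on what the iCache can buy you.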

The most likely "real world" example I can think of would be something like a bitwise CRC. Fetch a 32-bit word into a register, and then do about 32 shifts, bit tests, XORs and such on it while you update a CRC value that is also in a register. It should be easy to get to something like 100 instructions executed for every memory fetch. You can find algorithms on the net (HDLC CRC or Ethernet CRC would be good). Avoid the byte-at-a-time algorithms that use a big data table: they probably end up faster overall, but they won't display the impressive speedup from the iCache.
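For reference, a table-free CRC might look something like the sketch below. It uses the common reflected Ethernet CRC-32 polynomial (0xEDB88320) and, to stay independent of byte order, it pulls in one byte at a time and does 8 shift/test/XOR steps on it rather than 32 steps per 32-bit word; the function name is just a placeholder. The point is the same: no lookup table, and the running CRC can live in a register.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected Ethernet polynomial), no lookup table.
 * Each data byte is read once, then processed with 8 shift/test/XOR
 * steps while the running CRC stays in a local variable. */
uint32_t crc32_bitwise(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;            /* standard initial value */

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];                    /* the only data read per byte */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;              /* standard final inversion */
}
```

A quick sanity check is the usual test string: `crc32_bitwise((const uint8_t *)"123456789", 9)` should give `0xCBF43926`.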

zener
 
Posts: 4567
Joined: Sat Feb 21, 2009 2:38 am

Re: Coldfire cache project

Post by zener »

westfw wrote:
The most likely "real world" example I can think of would be something like a bitwise CRC. Fetch a 32-bit word into a register, and then do about 32 shifts, bit tests, XORs and such on it while you update a CRC value that is also in a register. It should be easy to get to something like 100 instructions executed for every memory fetch. You can find algorithms on the net (HDLC CRC or Ethernet CRC would be good). Avoid the byte-at-a-time algorithms that use a big data table: they probably end up faster overall, but they won't display the impressive speedup from the iCache.
I am a little fuzzy on this concept. It seems to me that would be using SRAM a lot, so that wouldn't have anything to do with the cache. I am thinking I just need maximum program memory fetches. BTW it has an icache but no dcache. Thanks

westfw
 
Posts: 2008
Joined: Fri Apr 27, 2007 1:01 pm

Re: Coldfire cache project

Post by westfw »

Easier to think about:
count the one bits in an array of longs.

The general idea is to execute a lot of instructions for each fetch from data memory. Presumably, to do anything interesting, you must fetch SOMETHING from data memory!
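A minimal sketch of the bit-counting idea in C, assuming 32-bit "longs" (the function name is hypothetical). It deliberately counts one bit at a time, with no table and no clever tricks, so each word read from data memory is followed by 32 rounds of pure register work:

```c
#include <stddef.h>
#include <stdint.h>

/* Count the one bits in an array of 32-bit words, one bit at a time.
 * One data-memory read per word, then 32 test/add/shift steps that
 * only touch locals, so instruction fetches dominate the run time. */
unsigned long count_one_bits(const uint32_t *words, size_t n)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t w = words[i];             /* single fetch from data memory */
        for (int bit = 0; bit < 32; bit++) {
            total += w & 1u;               /* add the lowest bit */
            w >>= 1;                       /* move on to the next bit */
        }
    }
    return total;
}
```

Time one call over a fixed array with the cache off and again with it on, and compare the two.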

zener
 
Posts: 4567
Joined: Sat Feb 21, 2009 2:38 am

Re: Coldfire cache project

Post by zener »

So when I said "fill it with a giant look up table" I was on the right track... although I left out the part about reading back from the lookup table.

zener
 
Posts: 4567
Joined: Sat Feb 21, 2009 2:38 am

Re: Coldfire cache project

Post by zener »

Well, I was able to get about a 2.2x difference (cache off vs. cache on) using the lookup table idea. My teacher said the highest delta he ever saw was 7x. Well, a guy in my class "calculated the factorials out to 199 in one function, which was the only called once." So I guess it was recursive. Anyway, he claimed a difference of 111x. I am not sure if that is even possible, but the teacher accepted it.
