While I was waiting for my SpokePOV to come in the mail, I was looking at the firmware to see what spiffy tweaks I could make. So I load up the code in AVR Studio, and what do I find for the code size:
Program: 1966 bytes (96.0% Full)
Here were the starting flags:
-Wall -gdwarf-2 -std=gnu99 -Os -funsigned-char -funsigned-bitfields -fpack-struct -fshort-enums -DF_CPU=8000000UL
Wow, that is not much to work with. So I started looking around for size optimizations that would free up room for more code, and found a list of them here.
Results in order applied (I did not attempt to test every configuration with every other configuration):
- Added -Wl,--relax: 1964 bytes (95.9% Full)
- Added -ffreestanding and void main() __attribute__ ((noreturn)); : 1942 bytes (94.8% Full). Dropped 22 bytes!
- Added -fno-tree-scev-cprop: no effect, still 1942 bytes (94.8% Full)
- Added -ffunction-sections, -fdata-sections, -Wl,--gc-sections: 1910 bytes (93.3% Full). Dropped 32 bytes!
Wow, a ~2.8% reduction in byte count (56 bytes) from changes that should not affect the running code. I have not tested it, but the largest drops come from removing main's stack and return code and from dead code removal, two "safe" operations.
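To keep the bookkeeping honest, here is a small host-side sketch that recomputes the savings and the "% Full" figures from the sizes listed above. The 2048-byte flash size is my assumption, inferred from "1966 bytes (96.0% Full)"; the function names are made up for illustration.

```c
#include <stdio.h>

/* Assumption: 2048 bytes of flash, inferred from "1966 bytes (96.0% Full)". */
#define FLASH_BYTES 2048UL

/* Flash usage in tenths of a percent, rounded to nearest. */
unsigned long flash_tenths_percent(unsigned long used)
{
    return (used * 1000UL + FLASH_BYTES / 2UL) / FLASH_BYTES;
}

/* Bytes recovered between two builds. */
unsigned long bytes_saved(unsigned long before, unsigned long after)
{
    return before - after;
}

/* Print one line per build step, using the sizes reported in this post. */
void print_flag_steps(void)
{
    static const struct { const char *step; unsigned long size; } steps[] = {
        { "baseline",                  1966UL },
        { "-Wl,--relax",               1964UL },
        { "-ffreestanding + noreturn", 1942UL },
        { "-fno-tree-scev-cprop",      1942UL },
        { "sections + --gc-sections",  1910UL },
    };
    unsigned i;

    for (i = 0; i < sizeof steps / sizeof steps[0]; i++) {
        unsigned long t = flash_tenths_percent(steps[i].size);
        unsigned long saved =
            i ? bytes_saved(steps[i - 1].size, steps[i].size) : 0UL;
        printf("%-26s %lu bytes (%lu.%lu%% full), saved %lu\n",
               steps[i].step, steps[i].size, t / 10UL, t % 10UL, saved);
    }
}
```

Calling print_flag_steps() from a throwaway main() on a PC is enough; nothing here needs the AVR toolchain.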
At this point I started looking in the code for repeated structures. I found "NOP; NOP; NOP; NOP;" (or NOP4 for short) blocks throughout the code, with the relatively consistent comment "// wait 500 ns". The delay comes from waiting 4 CPU cycles (at 8 MHz, 4 cycles is 500 ns) before starting the next operation. So in theory any operation that takes 4 CPU cycles without changing the state of the CPU is an equivalent replacement. Looking at the source listing, the NOP4 code takes 8 bytes each time it is used (2 bytes per NOP * 4). So any replacement with the same timing and no effect on the state that is smaller than 8 bytes is a net gain per use.
SpokePOV uses 9 NOP4 blocks, or 72 bytes of timing code that does nothing else.
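The timing and size arithmetic for a NOP run can be sketched like this. It assumes the 8 MHz clock from -DF_CPU=8000000UL and the 1 cycle / 2 bytes per NOP figures used above; the struct and function names are hypothetical.

```c
#ifndef F_CPU
#define F_CPU 8000000UL  /* from -DF_CPU=8000000UL in the build flags */
#endif

/* Cost of a delay sequence in CPU cycles and flash bytes. */
struct delay_cost {
    unsigned cycles;
    unsigned bytes;
};

/* Nanoseconds taken by a number of CPU cycles (125 ns each at 8 MHz). */
unsigned long cycles_to_ns(unsigned long cycles)
{
    return cycles * (1000000000UL / F_CPU);
}

/* A run of n NOPs: one cycle and one 2-byte instruction word per NOP. */
struct delay_cost nop_sequence(unsigned n)
{
    struct delay_cost c;
    c.cycles = n;
    c.bytes = 2U * n;
    return c;
}
```

With these, nop_sequence(4) comes out at 4 cycles and 8 bytes, and cycles_to_ns(4) gives the 500 ns from the comments.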
My first step was to replace "NOP; NOP; NOP; NOP;" with "wait500ns();", where "wait500ns()" is defined as:
#define wait500ns() NOP; NOP; NOP; NOP;
This lets me change the definition in one place and update every instance of NOP4.
Looking at the instruction chart located here, I found an instruction that could work: RJMP. RJMP takes the PC (program counter) and changes it to PC = PC + k + 1. It also happens to take two CPU cycles. A non-jump instruction simply advances the PC to the next instruction (if you ignore interrupts and the like), so with k = 0 the jump lands exactly where execution would have gone anyway. The key here is that "rjmp .+0" is a NOP that takes two cycles! It also happens to be encoded in 2 bytes, so two of them give the same 4 cycles in 4 bytes instead of 8, and wait500ns() can be redefined to:
#define JMP_P1 asm volatile("rjmp .+0");
#define wait500ns() JMP_P1; JMP_P1;
Resulting in:
Program: 1874 bytes (91.5% Full)
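One thing to watch with these macros: the trailing semicolons mean "wait500ns();" expands to several statements, so an unbraced "if (x) wait500ns(); else ..." would not compile. A possible variant, which is my suggestion and not what the firmware uses, wraps the body in do { } while (0). The host fallback and the strobe() helper are assumptions added only so the sketch builds and can be tried off-target.

```c
/* On AVR, "rjmp .+0" jumps to the next instruction: 2 cycles, 2 bytes.
 * The non-AVR fallback is an assumption so this compiles on a PC;
 * it has no timing meaning there. */
#if defined(__AVR__)
#define JMP_P1() __asm__ __volatile__("rjmp .+0")
#else
#define JMP_P1() ((void)0)
#endif

/* Two 2-cycle jumps = 4 cycles = 500 ns at 8 MHz, in 4 bytes.
 * do { } while (0) makes the macro expand to exactly one statement. */
#define wait500ns() do { JMP_P1(); JMP_P1(); } while (0)

/* Hypothetical helper showing the macro in an unbraced if/else. */
int strobe(int enabled)
{
    if (enabled)
        wait500ns();   /* single statement, so the else still binds */
    else
        return 0;
    return 1;
}
```

The size math checks out against the build output: each use shrinks from 8 bytes to 4, and 9 uses * 4 bytes = 36 bytes, which is exactly the drop from 1910 to 1874.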
That's all I found in my first pass.
tl;dr Final results: 92 bytes (~46 instructions) recovered while learning avr-gcc flags and AVR ASM with hopefully no effect on execution.
Can anyone else squeeze the SpokePOV firmware any further?

