Assembly Language on the Metro M0 and M4
Moderators: adafruit_support_bill, adafruit

Please be positive and constructive with your questions and comments.

Assembly Language on the Metro M0 and M4

by charley468 on Mon Jan 21, 2019 12:22 am

I need to mix some C code with assembly and I cannot find any instructions on how to do that.
I have assembly code that works on the UNO but, of course, does not work on the SAMD21 or SAMD51. I understand that.
Is there a way to put inline assembly in C?
In some compilers an "asm" directive allows inline code. How is that done with the -21 and -51?

I used the Arduino IDE to run some test C code and I found that the M4 was slower than the M0 to do
A = (sin(y) + cos(z));
Obviously I am not enabling the hardware DSP or floating point. How do I do that? Or is the IDE's C compiler for the -51 not set up to use the built-in hardware?

Maybe a different IDE?

I am excited about using the M4! An order of magnitude faster than the UNO?!! Very nice!

charley468
 
Posts: 4
Joined: Thu Oct 18, 2018 2:51 pm

Re: Assembly Language on the Metro M0 and M4

by franklin97355 on Mon Jan 21, 2019 1:51 am

Have you looked at Atmel Studio 7?

franklin97355
 
Posts: 20514
Joined: Mon Apr 21, 2008 2:33 pm
Location: Lacomb, OR.

Re: Assembly Language on the Metro M0 and M4

by westfw on Tue Jan 22, 2019 1:48 am

Inline ASM for the D51 works the same as inline ASM for AVR (although since everything is in the same memory address space, presumably you don't need any of the special argument identifiers for I/O ports, etc.)

There shouldn't actually be much cause to use ASM on an ARM, though. All the funny instructions have intrinsic C functions to do the same thing, the compiler is quite good, and the few cases where it might make sense to use ASM on an 8bit chip are less likely to be applicable on the 32bit ARM...
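
For reference, here is a minimal sketch of what GCC extended inline assembly can look like on these parts, next to the equivalent compiler builtin. The function names `rev32_asm` and `rev32_builtin` are made up for illustration, and the `rev` (byte-reverse) instruction is only an example choice. The `#if` guard falls back to the builtin, so the same file also builds on a non-ARM host.

```cpp
#include <cstdint>

// Byte-reverse a 32-bit word with the ARM "rev" instruction, using GCC
// extended inline asm. The "l" constraint asks for a low register (r0-r7),
// which Thumb-1 cores such as the Cortex-M0+ require for "rev".
static inline uint32_t rev32_asm(uint32_t value) {
#if defined(__arm__)
    uint32_t result;
    __asm__ volatile("rev %0, %1" : "=l"(result) : "l"(value));
    return result;
#else
    // Not an ARM target: fall back to the compiler builtin.
    return __builtin_bswap32(value);
#endif
}

// The same operation with no asm at all, using a GCC/Clang builtin.
static inline uint32_t rev32_builtin(uint32_t value) {
    return __builtin_bswap32(value);
}
```

Both functions should return the same value for any input. The builtin version leaves register allocation to the compiler, which is usually the better choice unless a specific instruction sequence is needed.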


I used the Arduino IDE to run some test C code and I found that the M4 was slower than the M0 to do
A = (sin(y) + cos(z));
Obviously I am not enabling the hardware DSP or floating point.

On a 32-bit ARM, "sin" takes a "double" argument, and "double" there is a true 64-bit type (on AVR it is the same size as "float"). That adds float-to-double conversion and prevents the (single-precision) hardware from being of any use. (Although this should be the case on the M0 as well.)

Use "sinf()", "cosf()", etc., and make sure all your floating-point constants are written as single-precision floats (3.14f instead of 3.14), and you should see significant improvement.
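
A small sketch of the point about constants and the "f" functions. The function name `sin_plus_cos` is hypothetical; the two `static_assert` lines only demonstrate how the compiler types each expression.

```cpp
#include <math.h>
#include <type_traits>

// An unsuffixed constant is a double, so it pulls the whole expression
// up to double precision; the "f" suffix keeps the expression in float.
static_assert(std::is_same<decltype(1.0f * 3.14), double>::value,
              "float times a double constant is a double");
static_assert(std::is_same<decltype(1.0f * 3.14f), float>::value,
              "float times a float constant stays a float");

// Single-precision version of A = sin(y) + cos(z).
static inline float sin_plus_cos(float y, float z) {
    return sinf(y) + cosf(z);
}
```

With every operand kept as a float, nothing in the expression needs to be widened to double, which is what lets a single-precision FPU do the arithmetic.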

(I'm very surprised that the M4 is showing slower than M0; the M0 doesn't even have optimized float subroutines - can you post your test code?)
westfw
 
Posts: 1522
Joined: Fri Apr 27, 2007 1:01 pm
Location: SF Bay area

Re: Assembly Language on the Metro M0 and M4

by charley468 on Wed Jan 23, 2019 1:13 am

Thank you for the reply.

void setup()
{
  // initialize digital pin LED_BUILTIN as an output.
  pinMode(LED_BUILTIN, OUTPUT);
}

volatile float result = 3.1f;
volatile float angle = 45.0f;

void loop()
{
  digitalWrite(LED_BUILTIN, HIGH); // turn the LED on
  digitalWrite(LED_BUILTIN, LOW);  // turn the LED off
  result = sinf(angle);            // swapped for sin(angle), or removed, for the other timings
}

Pretty simple.
M0 toggle w/o sin() = 3.5us
M0 toggle w/ sin(..) = 219us
M0 toggle w/ sinf(..) = 106us

M4 toggle w/o sin() = 920ns
M4 toggle w/ sin(..) = 22.6us
M4 toggle w/ sinf(..) = 2.5us (quite an improvement)

just for reference:
Metro toggle w/o sin() = 7.56 us
Metro toggle w/o sin() but with:
PORTB = 0x0FF ;
PORTB = 0x00;
=> Metro w/ ASM toggle = 375 ns (20x faster)
Metro toggle w/ sin(..) = 120us
Metro toggle w/ sinf(..) = 120us

(I was taking these measurements when I should have been in bed... so, maybe not quite correct?)

Other measurements:
It appears that the M0 toggle in CircuitPython takes 99us and the M4 toggle in CircuitPython takes 36us.

On an RPi 3B+, just the toggle takes 77ns in C and 3.63us in Python (47x slower); sin(...) takes 148ns.

I am also looking at Studio 7... Thank you for the tip-off

Thank you,
Charley


Re: Assembly Language on the Metro M0 and M4

by westfw on Wed Jan 23, 2019 2:58 am

volatile float angle = 45 ;

Don't forget that sin/etc take an argument expressed in radians (shouldn't matter for a speed test.)
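
A minimal sketch of the conversion, in single precision to match the sinf() advice above. The helper names `deg_to_rad` and `sin_degrees` are hypothetical (the Arduino core also provides a `radians()` macro that does the same job).

```cpp
#include <math.h>

// Convert an angle from degrees to radians in single precision.
static inline float deg_to_rad(float degrees) {
    return degrees * (3.14159265f / 180.0f);
}

// Sine of an angle given in degrees.
static inline float sin_degrees(float degrees) {
    return sinf(deg_to_rad(degrees));
}
```

With this, `sin_degrees(45.0f)` is about 0.707, whereas `sinf(45.0f)` treats 45 as radians and gives a different value.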


M0 toggle w sin(..) = 219us
M4 toggle w/ sin(..) = 22.6us

So that confirms that the M4 was faster than M0 even without the HW float support?


PORTB = 0x0FF ;
PORTB = 0x00;
=> Metro w/ ASM toggle = 375 ns (20x faster)

It's not clear whether you understand that this is NOT an "ASM toggle"; it's just "direct port writes" in C.
The equivalent for M0/M4 would look something like:
   PORT->Group[g_APinDescription[LED_BUILTIN].ulPort].OUTCLR.reg = 1ul << g_APinDescription[LED_BUILTIN].ulPin;

(I think that nasty-looking structure referencing ends up happening at compile time.)
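
A rough sketch of how the lookup could be done once and reused for both the set and clear writes. `Reg32`, `PortGroupLike`, `FastPin`, `pin_high` and `pin_low` are all hypothetical names; `PortGroupLike` is only a stand-in so the code builds off-target, and it does not match the real register layout.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the vendor register types. On the chip, use
// the real PortGroup type from the SAMD headers instead of PortGroupLike.
union Reg32 { volatile uint32_t reg; };
struct PortGroupLike {
    Reg32 OUTCLR;  // writing a 1 bit drives that output pin low
    Reg32 OUTSET;  // writing a 1 bit drives that output pin high
};

// Port pointer and bit mask, looked up once instead of on every write.
struct FastPin {
    PortGroupLike* group;
    uint32_t mask;
};

static inline void pin_high(const FastPin& pin) { pin.group->OUTSET.reg = pin.mask; }
static inline void pin_low(const FastPin& pin)  { pin.group->OUTCLR.reg = pin.mask; }
```

On real hardware the `FastPin` would presumably be filled in from `&PORT->Group[g_APinDescription[LED_BUILTIN].ulPort]` and `1ul << g_APinDescription[LED_BUILTIN].ulPin`, with the real `PortGroup` type in place of the stand-in.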

See also viewtopic.php?f=57&t=133497#p668379

Re: Assembly Language on the Metro M0 and M4

by westfw on Wed Jan 23, 2019 3:10 am

See also https://www.quinapalus.com/qfplib.html if you're interested in a faster/smaller alternative float library for M0.

Re: Assembly Language on the Metro M0 and M4

by charley468 on Wed Jan 23, 2019 9:07 am

thanks again!

Yes, I realize that is a rather unfair test - poking right to the port. I have written lots of assembly code back in the day (Z-80, 6809, 6502, bit-slice machine). In fact some is still flying around today!

No, I am not planning on writing "sin()" in assembly... but what I did want to speed up was the read of data from an A/D. I am planning on building a parallel-load A/D and then feeding that to the sin() routine. Any time I can save doing reads and writes to hardware gives more time for the sin().
I want a 20 kHz cycle time => 50us. Can't do much in that amount of time in Python...

I will follow up with the links


Re: Assembly Language on the Metro M0 and M4

by westfw on Thu Jan 24, 2019 2:59 am

Did you see https://github.com/adafruit/ArduinoCore-samd/issues/51 ? The M4 A/D was sped up a great deal, recently.
The speed of A/D was not/is not limited by being written in C (not yet, anyway)...

Umm. Bit-slices! For sure you get a good understanding of the ARM instruction set (pre-thumb) with that background!
Attachment: DSCN7199.jpg

Re: Assembly Language on the Metro M0 and M4

by charley468 on Thu Jan 24, 2019 11:45 am

Yeah, 60us to read one analog input is not exactly fast...
but I like:
it is now set to 1 / ( (1/(120MHz/16)) * (30 + 1) ) = 241,935 samples/sec

Now that will do what I need!
I need 4 analog reads, several sin() and cos() calculations, and two analog outputs in 50us (20 kHz), so 60us to make one read does not quite make it.
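
As a rough check of that budget, the arithmetic can be sketched as below. The function names are hypothetical, and reading "(30 + 1)" as the number of ADC clock cycles per conversion is an assumption; the estimate also ignores any software overhead around each read.

```cpp
#include <cassert>

// Samples per second for an ADC clocked at clock_hz / prescaler that
// spends cycles_per_sample ADC clocks on each conversion (assumed model).
static inline double adc_samples_per_sec(double clock_hz, double prescaler,
                                         double cycles_per_sample) {
    assert(prescaler > 0.0 && cycles_per_sample > 0.0);
    return clock_hz / prescaler / cycles_per_sample;
}

// Microseconds left in one control cycle after n_reads back-to-back
// conversions at the given sample rate.
static inline double budget_left_us(double cycle_us, int n_reads,
                                    double samples_per_sec) {
    return cycle_us - n_reads * (1e6 / samples_per_sec);
}
```

With 120 MHz, a prescaler of 16 and 31 cycles per sample, this gives about 241,935 samples/sec, so four reads take roughly 16.5us and leave about 33.5us of the 50us cycle for the sin()/cos() math and the two analog outputs.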

