Easy way to checksum/hash a file from REPL?


Post by kevinjwalters »

Two or three times (i.e. probably once a year) I've noticed that after copying files to a board using Windows Explorer, the board reports an error on a very early line of code, possibly with this message:

Code: Select all

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
Traceback (most recent call last):
  File "code.py", line 1
SyntaxError: invalid syntax
That jumps out as strange because the first few lines are often just comments and imports and unlikely to be incorrect unless there's been a file editing blunder.

The last time it happened I knew the code worked because it was working on another identical board. I computed a SHA-256 from PowerShell, and the source file and the one on the board were the same. When I recopied the file it worked. I didn't think about it at the time, but I should have checked from the board's point of view too and not just from the host OS.

Is there an easy way from the CircuitPython REPL to compute a checksum or hash of a file on the CIRCUITPY drive?

Post by tannewt »

I don't know of a way to do a checksum or hash from CP.

This kind of SyntaxError can happen when the host OS hasn't actually written the full file back to CP. Windows is particularly bad about this. You can poke it by making it "safe to remove" which should hopefully cause a flush.
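
For anyone copying from a script rather than Explorer, here is a minimal host-side sketch of the same idea: copy the file, then explicitly ask the OS to flush it. The function name copy_and_sync is made up for illustration. Note that os.fsync() is only a request to the OS, and nothing in this thread confirms that it prevents the problem on Windows.

```python
import os
import shutil

def copy_and_sync(src, dst):
    """Copy src to dst, then ask the OS to write dst out to the device."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()             # empty Python's own write buffer
        os.fsync(fout.fileno())  # request that the OS flush its cached data
```

Usage would be something like copy_and_sync("code.py", "E:\\code.py"), where E: is whichever drive letter CIRCUITPY has been given.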

Post by kevinjwalters »

I have just copied a file to three boards and only two picked it up. From the Windows host I have:

Code: Select all

PS C:\> Get-FileHash E:\code.py -Algorithm SHA256

Algorithm       Hash                                                                   Path
---------       ----                                                                   ----
SHA256          64AE73DFC4CCF42D86C94118A86E6932E197459F7DACE1D00EDAF970F1EE1492       E:\code.py

PS C:\> Get-FileHash F:\code.py -Algorithm SHA256

Algorithm       Hash                                                                   Path
---------       ----                                                                   ----
SHA256          64AE73DFC4CCF42D86C94118A86E6932E197459F7DACE1D00EDAF970F1EE1492       F:\code.py

PS C:\> Get-FileHash G:\code.py -Algorithm SHA256

Algorithm       Hash                                                                   Path
---------       ----                                                                   ----
SHA256          64AE73DFC4CCF42D86C94118A86E6932E197459F7DACE1D00EDAF970F1EE1492       G:\code.py
But I get a repeatable syntax error on one of the boards. I've just improvised the world's simplest checksum.

GOOD:

Code: Select all

>>> with open("code.py", "rb") as file_bin:
...     filebindata = file_bin.read()
...
...
...
>>> total = 0
>>> for b in filebindata: total += b
...
>>> total
1891982
>>> len(filebindata)
24247
BAD (NB: the length is the same but the total differs in two digits):

Code: Select all

>>> with open("code.py", "rb") as file_bin:
...     filebindata = file_bin.read()
...
...
...
>>> total = 0
>>> for b in filebindata: total += b
...
>>> total
1893582
>>> len(filebindata)
24247
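
A plain byte total like the one above cannot detect bytes that have merely been moved around. As a slightly stronger alternative, here is a minimal sketch of an Adler-32 checksum written in plain Python so that it should run both in the CircuitPython REPL and on the host. The function name adler32_file and the chunk size are my own choices, not anything built into CircuitPython. It reads the file in small chunks to limit RAM use and returns the result as a hex string.

```python
def adler32_file(path, chunk_size=512):
    """Return (adler32_hex, length) for the file at path."""
    a, b = 1, 0      # the two running sums defined by Adler-32
    length = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            for byte in chunk:
                a = (a + byte) % 65521
                b = (b + a) % 65521
            length += len(chunk)
    # b is the high 16 bits and a is the low 16 bits of the checksum
    return "{:04x}{:04x}".format(b, a), length
```

Running adler32_file("code.py") on the board and on the host copy gives two values to compare. On a host with CPython, the hex string should match "{:08x}".format(zlib.adler32(data)) for the same bytes.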
Something has mangled at least one line; line 613 is where the syntax error is reported:

GOOD (this is what it looks like on the host):

Code: Select all

>>> file_byline[613-1]
'                          "{:d} and RpsKeyDataAdvertisement {:d}".format(len(cipher_ads), len(key_ads)))\n'
>>> file_byline[614-1]
'            except KeyError:\n'
>>> file_byline[615-1]
'                pass\n'
>>> file_byline[616-1]
'            player_choices.append(opponent_choice)\n'

BAD:

Code: Select all

>>> file_byline[613-1]
'                          "{:d} and RpsKeyDataAdvertisement {:d}".format(len(cipher_ads), len(key_     player_choices.append(opponent_choice)\n'
>>> file_byline[614-1]
'        pass\n'
>>> file_byline[615-1]
'       p wins ayer_choices.append(opponent_choice)\n'
>>> file_byline[616-1]
'\n'
ADDED LATER: I pulled the file off the device by cutting and pasting the bytes printed on the serial console and re-creating the corrupted code.py on another machine. Here's the diff; the damaged area is a very small portion of the file.

Code: Select all

$ diff -c actual-code.py corrupted-code.py
*** actual-code.py      2020-06-01 15:58:06.804888912 +0100
--- corrupted-code.py   2020-06-01 16:49:41.767492516 +0100
***************
*** 610,619 ****
                                round, round_msg1, round_msg2)
                  else:
                      print("Wrong number of RpsEncDataAdvertisement "
!                           "{:d} and RpsKeyDataAdvertisement {:d}".format(len(cipher_ads), len(key_ads)))
!             except KeyError:
!                 pass
!             player_choices.append(opponent_choice)

          ### Chalk up wins and losses
          for p_idx1, player in enumerate(players[1:], 1):
--- 610,618 ----
                                round, round_msg1, round_msg2)
                  else:
                      print("Wrong number of RpsEncDataAdvertisement "
!                           "{:d} and RpsKeyDataAdvertisement {:d}".format(len(cipher_ads), len(key_     player_choices.append(opponent_choice)
!         pass
!        p wins ayer_choices.append(opponent_choice)

          ### Chalk up wins and losses
          for p_idx1, player in enumerate(players[1:], 1):
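
For anyone wanting to pull a file off a board over the serial console as described above, here is one possible sketch. It is an assumption about how this could be done, not the exact method used for the diff above: hex_lines() is meant to run on the board and its output printed, and from_hex_lines() rebuilds the bytes on the host from the pasted text. Both function names are made up for illustration.

```python
def hex_lines(data, width=32):
    """Yield data as lines of hex text, width bytes per line."""
    for start in range(0, len(data), width):
        yield "".join("{:02x}".format(byte)
                      for byte in data[start:start + width])

def from_hex_lines(lines):
    """Rebuild the original bytes from lines produced by hex_lines()."""
    return bytes.fromhex("".join(line.strip() for line in lines))
```

On the board, something like "for line in hex_lines(filebindata): print(line)" dumps the file. On the host, the pasted lines go through from_hex_lines() and the result is written out in binary mode.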
It looks like there's one damaged chunk starting on a 128-byte boundary (the granularity could be finer than that):

Code: Select all

$ for skip in {182..184}
> do
>   echo ACTUAL ${skip}
>   dd if=actual-code.py bs=128 skip=${skip} count=1 2> /dev/null ; echo
>   echo CORRUPTED ${skip}
>   dd if=corrupted-code.py bs=128 skip=${skip} count=1 2> /dev/null ; echo
>   echo --------
> done
ACTUAL 182
 of RpsEncDataAdvertisement "
                          "{:d} and RpsKeyDataAdvertisement {:d}".format(len(cipher_ads), len(key_
CORRUPTED 182
 of RpsEncDataAdvertisement "
                          "{:d} and RpsKeyDataAdvertisement {:d}".format(len(cipher_ads), len(key_
--------
ACTUAL 183
ads)))
            except KeyError:
                pass
            player_choices.append(opponent_choice)

        ### Chalk u
CORRUPTED 183
     player_choices.append(opponent_choice)
        pass
       p wins ayer_choices.append(opponent_choice)

        ### Chalk u
--------
ACTUAL 184
p wins and losses
        for p_idx1, player in enumerate(players[1:], 1):
            (win, draw, void) = evaluateGame(my_choic
CORRUPTED 184
p wins and losses
        for p_idx1, player in enumerate(players[1:], 1):
            (win, draw, void) = evaluateGame(my_choic
--------
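
The dd loop above compares a few 128-byte blocks by eye. The same comparison over the whole file can be sketched in Python, assuming both versions have already been read into bytes objects. The function name differing_blocks is made up for illustration.

```python
def differing_blocks(data_a, data_b, block_size=128):
    """Return the indices of block_size-byte blocks whose contents differ."""
    longest = max(len(data_a), len(data_b))
    n_blocks = (longest + block_size - 1) // block_size
    return [i for i in range(n_blocks)
            if data_a[i * block_size:(i + 1) * block_size]
            != data_b[i * block_size:(i + 1) * block_size]]
```

Given the dd output above, the result for these two files would be expected to include block 183. Calling it again with a smaller block_size narrows down where the damage starts and ends.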

Post by kevinjwalters »

Here's a comparison of the broken part of the file showing how the real file on the left maps to CircuitPython's view of code.py on CIRCUITPY:
[Image: actual-vs-corrupted-mapping-bs8-23408-23560.png. Comparison mapping 8-byte chunks with the same content between the actual and corrupted files]

Post by kevinjwalters »

To complete the checks, I just tried eject, then the reset button, then a power cycle with USB disconnected. I checked the file each time by totalling the bytes, and it remains unchanged in its corrupted state. chkdsk from the host says everything is clean.

Post by kevinjwalters »

"File Save is Corrupting code.py" sounds like the same issue, but on a later version of Windows and far more frequent than mine.
