
MagTag Infrequent Failures

Please be positive and constructive with your questions and comments.

MagTag Infrequent Failures

by snaxter on Sun Jan 03, 2021 5:23 pm

I'm having problems with infrequent crashes on the MagTag that are similar to the problems I have with the MatrixPortal (described here: viewtopic.php?f=60&t=172275&p=844238#p844238). I've tried this with several versions of CircuitPython and now have the board running 6.1.0-beta.3 with libraries from *-bundle-6.x-mpy-20210101.

The code pulls data from the Purple Air and Weather Underground websites. It runs for a while (frequently hours) and then crashes in a fetch_data() call. I have a try/except clause around the fetch_data() call, but for some reason it only catches the error occasionally.

Here is the error:
Retrieving data...Traceback (most recent call last):
File "code.py", line 126, in <module>
File "code.py", line 122, in <module>
File "adafruit_portalbase/network.py", line 453, in fetch_data
File "adafruit_portalbase/network.py", line 425, in fetch
File "adafruit_requests.py", line 612, in get
File "adafruit_requests.py", line 576, in request
File "adafruit_requests.py", line 436, in _get_socket
OSError: 0


This is coming from this section of the code (full code attached):
           print("***inside")
           fetch_success = False

           try:
               inside = NETWORK.fetch_data(
                   url=Inside_SOURCE,
                   json_path=(PM2_Location,TEMP_Location,HUMI_Location),)
               fetch_success = True
           except RuntimeError as e:
               print("Some error occurred, retrying! -", e)
               continue


I have seen the try/except actually catch the error on some iterations. For example, this will pop up every now and then:
***inside
Retrieving data...Some error occurred, retrying! - Sending request failed
Free Memory: 1,946,496
Iteration: 13


Any thoughts on ways that I can track this down?

I'll also file this as an issue as siddacious suggested in the original post.
Attachment: code.py (5.94 KiB)

Re: MagTag Infrequent Failures

by dastels on Sun Jan 03, 2021 6:29 pm

When dealing with networking, you can expect regular errors; that's what exception handling is for. In this case, it looks like you should also be catching OSError, doing the same report-and-continue that you do for RuntimeError.
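The reason the original handler only caught some failures is that OSError and RuntimeError are separate branches of Python's exception hierarchy, so `except RuntimeError` never matches an OSError. A minimal sketch in plain Python, using a hypothetical `flaky_fetch()` as a stand-in for `NETWORK.fetch_data()`, shows the difference between catching one type and catching a tuple of both:

```python
# Hypothetical stand-in for NETWORK.fetch_data(); it always fails the way
# the MagTag traceback does, with OSError: 0.
def flaky_fetch():
    raise OSError(0)


def fetch_catching_runtime_only():
    try:
        return flaky_fetch()
    except RuntimeError as e:
        # Never reached: OSError is not a subclass of RuntimeError.
        return "caught: {}".format(e)


def fetch_catching_both():
    try:
        return flaky_fetch()
    except (RuntimeError, OSError) as e:
        # A tuple matches if the exception is an instance of any listed type.
        return "caught {}: {}".format(type(e).__name__, e)


print(issubclass(OSError, RuntimeError))  # False
print(fetch_catching_both())              # caught OSError: 0
try:
    fetch_catching_runtime_only()
except OSError:
    print("OSError escaped the RuntimeError-only handler")
```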

Dave


Re: MagTag Infrequent Failures

by snaxter on Sun Jan 03, 2021 11:24 pm

Dave,

Thank you for the suggestion; that makes sense. I changed that section of the code so that it now looks like this:

           try:
               inside = NETWORK.fetch_data(
                   url=Inside_SOURCE,
                   json_path=(PM2_Location,TEMP_Location,HUMI_Location),)
               fetch_success = True
           except (RuntimeError, OSError) as e:
               print("Some error occurred, retrying! -", e)
               continue


And, as you would expect, it no longer crashes.

However, all subsequent calls to fetch_data() throw the OSError. I can see the loop iterating, and the call to time.localtime() continues to function properly and the time advances. However, with the persistent OSError, the display never gets updated with new data.

Do I need to reset the entire board on an OSError?


Re: MagTag Infrequent Failures

by dastels on Mon Jan 04, 2021 10:07 am

You shouldn't need to reset. It sounds like the network layer is getting stuck in a bad state due to the first OSError, which it never recovers from. I'm digging into the code now to see what's happening.

Dave


Re: MagTag Infrequent Failures

by dastels on Mon Jan 04, 2021 10:50 am

The OSError... does it happen at the same place every time?

Dave


Re: MagTag Infrequent Failures

by dastels on Mon Jan 04, 2021 11:23 am

Maybe add these prints for more information:
print(type(e))
print(e.args)
print(e)
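If it helps to keep those three diagnostics on a single log line (for example, next to the iteration count), a small helper can build the string. `describe_error` is a hypothetical name, not part of any Adafruit library; it is shown in plain Python:

```python
def describe_error(e):
    """Return the exception's class name, its args tuple and its str() on one line."""
    return "{} args={!r} str={!r}".format(type(e).__name__, e.args, str(e))


try:
    raise OSError(0)  # same shape as the error in the MagTag traceback
except (RuntimeError, OSError) as e:
    print(describe_error(e))  # OSError args=(0,) str='0'
```

An `args` value of `(0,)` means the exception carries only the number 0 and no descriptive message, which matches the output reported later in this thread.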


Re: MagTag Infrequent Failures

by snaxter on Mon Jan 04, 2021 1:06 pm

Hi Dave,

Yes, it looks like the OSError is happening at the same place every time. I've included a snippet below with the output of the print statements you suggested. In this particular case, it ran for just over 30 minutes without any errors and then got stuck once the OSError occurred.

Free Memory: 1,946,528
Iteration: 33
Time: 2021 - 1 - 4 8 : 57
***inside
Retrieving data...Error occurred, retrying! -
<class 'OSError'>
(0,)
0
Free Memory: 1,946,528
Iteration: 34
Time: 2021 - 1 - 4 8 : 58
***inside
Retrieving data...Error occurred, retrying! -
<class 'OSError'>
(0,)
0
Free Memory: 1,946,528
Iteration: 35
Time: 2021 - 1 - 4 8 : 59
***inside
Retrieving data...Error occurred, retrying! -
<class 'OSError'>
(0,)
0


Re: MagTag Infrequent Failures

by dastels on Tue Jan 05, 2021 2:23 pm

I would look at shutting down and restarting the connection. It does seem to get stuck.
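A minimal sketch of that idea in plain Python, so the retry logic can be checked off-board. `fetch_with_reconnect`, `fake_fetch` and `fake_reconnect` are hypothetical names. On the MagTag, `fetch` would wrap the `NETWORK.fetch_data(...)` call. What `reconnect` should actually do is an assumption: re-joining the access point (for example with `wifi.radio.connect(ssid, password)`) is one candidate, but whether it clears the stuck state seen here is untested.

```python
import time


def fetch_with_reconnect(fetch, reconnect, attempts=3, delay=0.0):
    """Call fetch(); after a network error, call reconnect() and try again.

    `fetch` and `reconnect` are caller-supplied, zero-argument callables.
    Returns fetch()'s result, or re-raises the last error once every
    attempt has failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except (RuntimeError, OSError) as e:
            last_error = e
            print("Attempt", attempt + 1, "failed:", repr(e))
            if attempt < attempts - 1:
                # Rebuild the connection before the next try instead of
                # reusing whatever state produced the error.
                reconnect()
                time.sleep(delay)
    raise last_error


# Off-board demo with fake callables: two failures, then success.
calls = {"fetch": 0, "reconnect": 0}


def fake_fetch():
    calls["fetch"] += 1
    if calls["fetch"] < 3:
        raise OSError(0)
    return {"pm2": 12}


def fake_reconnect():
    calls["reconnect"] += 1


print(fetch_with_reconnect(fake_fetch, fake_reconnect))  # {'pm2': 12}
```

The reconnect happens only between attempts, so a fetch that succeeds the first time never pays for it, and a fetch that keeps failing still surfaces its last error to the caller.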

Dave


Re: MagTag Infrequent Failures

by snaxter on Tue Jan 05, 2021 6:21 pm

Thanks for the suggestion. I looked through the code for an easy way of shutting down and restarting.

On the MagTag, should I do an exit_and_deep_sleep() for a short period of time? I'd appreciate any other suggestions.


Re: MagTag Infrequent Failures

by tannewt on Tue Jan 05, 2021 6:26 pm

exit_and_deep_sleep should work. There is also `microcontroller.reset()`: https://circuitpython.readthedocs.io/en ... ller.reset


Re: MagTag Infrequent Failures

by snaxter on Sat Jan 09, 2021 12:03 am

I tried implementing a three-strike counter that forces an exit or reset after 3 consecutive errors without a successful connection. exit_and_deep_sleep() did not always restart successfully, so I tried microcontroller.reset(). That is generating a crash into the HardFault_Handler. While I'm still struggling with the try/except clauses, I think this is a fundamental HW fault that I can't catch with an except (?).
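A minimal sketch of the three-strike idea in plain Python, so the counting logic can be checked off-board. `StrikeCounter` is a hypothetical name. The action taken when it trips (`microcontroller.reset()` as tannewt suggests, or the MagTag's `exit_and_deep_sleep()`) is left as a comment, since those calls only exist on the board:

```python
class StrikeCounter:
    """Count consecutive failures and report when a limit is reached.

    A success clears the count, so only an unbroken run of failures
    (three by default) trips the counter.
    """

    def __init__(self, limit=3):
        self.limit = limit
        self.failures = 0

    def success(self):
        self.failures = 0

    def failure(self):
        """Record a failure; return True once `limit` failures occur in a row."""
        self.failures += 1
        return self.failures >= self.limit


strikes = StrikeCounter(limit=3)
outcomes = [False, False, True, False, False, False]  # True = fetch succeeded
tripped_at = []
for i, ok in enumerate(outcomes):
    if ok:
        strikes.success()
    elif strikes.failure():
        # On the board, this is where the reset or deep sleep would go.
        tripped_at.append(i)

print(tripped_at)  # [5]
```

The success in the middle of the sequence resets the count, so the counter only trips on the third failure of the final unbroken run.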

Is there a chance that I have a flakey board?
Attachments: code.py (7.37 KiB), magtag crash.jpg (474.04 KiB)

Re: MagTag Infrequent Failures

by tannewt on Mon Jan 11, 2021 6:38 pm

snaxter wrote:That is generating a crash into the HardFault_Handler.

This seems like a bug. Please file an issue: https://github.com/adafruit/circuitpython/issues/new

snaxter wrote:I think this is a fundamental HW fault that I can't catch with an except (?).

You should always be able to catch an exception. That doesn't mean the underlying code is OK, though. It's possible there is a lower-level bug.

snaxter wrote:Is there a chance that I have a flakey board?


This is very unlikely. Networking is inherently unreliable and has many layers of software to overcome that. Unfortunately, getting the software right is really hard because everyone's network is a little different.
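One common application-level way to cope with an unreliable network, which the Adafruit libraries are not confirmed to do on your behalf, is to wait progressively longer between retries (exponential backoff) so a struggling access point isn't hammered with requests. A sketch of the delay schedule; the `base`, `factor` and `cap` values are arbitrary assumptions:

```python
def backoff_delays(attempts, base=1.0, factor=2.0, cap=60.0):
    """Return the wait in seconds before each retry: base, base*factor, ..., never above cap."""
    delays = []
    delay = base
    for _ in range(attempts):
        delays.append(min(delay, cap))
        delay *= factor
    return delays


print(backoff_delays(8))  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

In a fetch loop, you would `time.sleep()` for the next delay in the list after each failure and start over from the beginning of the list after a success.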


Re: MagTag Infrequent Failures

by ryanconley on Fri Jan 15, 2021 4:37 am

I seem to be having the same issues (though mine are frequent) running the MagTag Google Calendar Event Display (https://learn.adafruit.com/magtag-googl ... nt-display). Even the short "Getting The Date & Time" test code is generating errors 70% of the time:

(WiFi scan results removed - I live in an RF swamp)
My IP address is 192.168.1.128
Ping google.com: 0.045000 ms
Fetching text from https://io.adafruit.com/api/v2/****/integrations/time/strftime?x-aio-key=**************&fmt=%25Y-%25m-%25d+%25H%3A%25M%3A%25S.%25L+%25j+%25u+%25z+%25Z
Traceback (most recent call last):
  File "code.py", line 49, in <module>
  File "adafruit_requests.py", line 612, in get
  File "adafruit_requests.py", line 576, in request
  File "adafruit_requests.py", line 436, in _get_socket
OSError: 0


It occasionally works, but I have yet to get the full final Google Calendar code running. Is this an issue with CircuitPython 6.1.0-rc.1? :-(

