BBB (2G) seizing up

elfpen
 
Posts: 12
Joined: Mon Feb 11, 2013 1:12 pm

BBB (2G) seizing up

Post by elfpen »

I have a weather station running the wview software.
The BBB (2G) is installed in a weatherproof box along with a solar charge controller,
a battery and a DC-DC voltage converter.

After about a year of uptime, the BBB is now giving me problems.
It will run for a day or two then seize up. I go out and take the cover
off of the box and see that LEDs are still lit. The one closest to the
Ethernet connector is solid. The "heartbeat" is not beating.

I'm trying to figure out a way to get to the system status at failure time.
I think I may have to replace the BBB and bring that one in to the workbench
where I can try to replicate the situation.

Is it a flash write-cycle problem? That is my current suspicion.
I'm thinking a flash-based system has a hard limit on how long it can run.

Is there a self-healing filesystem or other means around this?

Any ideas?

elfpen
 
Posts: 12
Joined: Mon Feb 11, 2013 1:12 pm

Re: BBB (2G) seizing up

Post by elfpen »

I forgot to add that the communication is via WiFi.

Also, here's a picture (if the forum will allow it).

I notice that the LED that is lit is the CPU LED. Problem is, I don't remember
whether I took this picture before or after I rebooted the thing. I think it was
before, but I'm not sure. (If I get into the same situation again I will
note that more carefully.)

wstation.jpg

adafruit_support_mike
 
Posts: 67485
Joined: Thu Feb 11, 2010 2:51 pm

Re: BBB (2G) seizing up

Post by adafruit_support_mike »

Flash does have a limited lifespan, but anything that could trash the filesystem badly enough to hang the CPU would also make it impossible to boot the machine again.

The problem with complex systems is that there are too many things that can possibly go wrong. You have to spend a lot of time narrowing and eliminating options before you start finding enough clues to lead to a solution.

One thing to check/eliminate is the power supply. Brownouts can do all sorts of crazy things to a machine.

You can also dig through the files in /var/log to see whether anything was complaining just before the system locked up.
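
A rough sketch of that log check, assuming the image keeps plain-text logs under /var/log (on a systemd-based image most messages may live in the journal instead). The function name last_log_lines and its arguments are made up for illustration:

```shell
# Hypothetical helper: print the last N lines of every regular file in a
# log directory, so the entries written just before a lockup are easy to scan.
# Defaults (/var/log, 20 lines) are assumptions, not something from this thread.
last_log_lines() {
    dir=${1:-/var/log}
    n=${2:-20}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # skip subdirectories and empty globs
        printf '==> %s <==\n' "$f"       # header so you know which file follows
        tail -n "$n" "$f"                # binary logs such as wtmp will look like noise
    done
}
```

Run it after the next reboot, for example as `last_log_lines /var/log 30`, and compare the final timestamps with the time the machine stopped responding.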

elfpen
 
Posts: 12
Joined: Mon Feb 11, 2013 1:12 pm

Re: BBB (2G) seizing up

Post by elfpen »

I have two PuTTY windows open to it, one running "dmesg -w" and the other "top".

I am hoping that if the communication quits, I will be able to see the state immediately before the failure.

elfpen
 
Posts: 12
Joined: Mon Feb 11, 2013 1:12 pm

Re: BBB (2G) seizing up

Post by elfpen »

Well, it seized up again after 2 days, 7.5 hours and change.

The dmesg -w window did not show any further messages after the boot and network config stuff.
The "top" window did not show any problem with performance, memory, swapping, load average,
nothing unusual that I can see.

Supply voltage is shown on my weather station page and it was 13.9 volts
which is the input into my DC-DC converter. If the converter is working
that should be plenty to produce the needed 5 VDC for the BBB.

I am using a static IP address on the BBB wifi. The list of attached
devices on the router shows that the weather station is no longer talking
to the router. (this is about 3 hours since the choke according
to the timestamp on the "top" display)

In the morning I will open up the box and check the LEDs on the BBB
and get a picture before I disturb anything.
I will take my voltmeter and test the DC-DC converter output level.

adafruit_support_mike
 
Posts: 67485
Joined: Thu Feb 11, 2010 2:51 pm

Re: BBB (2G) seizing up

Post by adafruit_support_mike »

Here's something else to try:

    ( while true ; do touch try ; sleep 1 ; done ) &

Entering that on the command line will start a continuously-running process that updates the timestamp on a file named 'try' once per second. You can get the exact time with:

    ls -l --time-style=full-iso

Rebooting the system will kill that process, so the timestamp on 'try' will tell you the last second before reboot when the command executed.

As a large-scale debugging tool, it will tell you whether the whole OS locked up, or just the network connection went dead. If the OS did lock up, it will tell you the time of failure to within 1 second.

WRT the power, having an independent voltage monitor would be handy. It doesn't take much of a dip to freeze a system, and a BBB can't record its own power failure.
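
If you would rather keep a history than a single timestamp, the same idea can be sketched as a small function that appends one line per tick. The name heartbeat, its arguments and the default file path are all hypothetical, and the sync call is an assumption about what it takes to get the last line onto storage before a hard lockup:

```shell
# Hypothetical variant of the 'touch' loop: append a timestamp per interval.
# Usage: heartbeat FILE [COUNT] [INTERVAL]   (COUNT 0 = run until killed)
heartbeat() {
    file=${1:-/tmp/heartbeat.log}    # illustrative default path
    count=${2:-0}
    interval=${3:-1}
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        date '+%Y-%m-%d %H:%M:%S' >> "$file"
        sync                         # flush buffers; costs extra flash writes
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Starting it with something like `heartbeat /home/root/heartbeat.log 0 10 &` writes one line every 10 seconds. The last line in the file after a reboot gives the approximate time of the freeze, and a longer interval keeps the extra flash writes down.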

elfpen
 
Posts: 12
Joined: Mon Feb 11, 2013 1:12 pm

Re: BBB (2G) seizing up

Post by elfpen »

Ok, back from my trip up the ladder to the weather station BBB.

The picture I posted above shows the same thing I saw this morning: the CPU LED is
solid and there is no heartbeat. The power LED is on. Power into my DC-DC converter
was over 13 volts, and out of the converter to the BBB was 5.1 volts.

elfpen
 
Posts: 12
Joined: Mon Feb 11, 2013 1:12 pm

Re: BBB (2G) seizing up

Post by elfpen »

Didn't see your reply before I posted mine, sorry.

I like the "touch" idea!

I don't know enough about the purpose of the
CPU LED and how it works. If that is on solid, did the OS crash or
is there a hardware fault?

Going to start off by doing a full update of the Angstrom
OS (now that their site is back up). I may do another run-to-failure
to see if I can gather any more data, then shift to avoidance by
putting a daily reboot in cron.

elfpen
 
Posts: 12
Joined: Mon Feb 11, 2013 1:12 pm

Re: BBB (2G) seizing up

Post by elfpen »

Googling on "beaglebone black LED USR2 solid", I see that such a thing has been known to happen.

elfpen
 
Posts: 12
Joined: Mon Feb 11, 2013 1:12 pm

Re: BBB (2G) seizing up

Post by elfpen »

Reading up about journalctl... pretty slick!

I see that there is a pattern in the journal. I get something like this:

Feb 27 02:02:20 beaglebone kernel: ath: phy0: Unable to remove station entry for: 00:24:b2:91:25:e2
Feb 27 02:02:26 beaglebone kernel: ------------[ cut here ]------------
Feb 27 02:02:26 beaglebone kernel: WARNING: at drivers/usb/musb/musb_host.c:125 musb_h_tx_flush_fifo+0x35/0x5c()
Feb 27 02:02:26 beaglebone kernel: Could not flush host TX2 fifo: csr: 2003


then a long stack trace making reference to things like "ieee80211" and "ath9k_htc..."

I will have a few more journal entries and then the halt.

So, I think there is some kind of problem related to the driver for the wifi dongle that I am using.
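
For anyone wanting to pull the same pattern out of their own journal, here is a minimal sketch. The function name wifi_usb_events is made up, and the search patterns are just the driver names that show up in my trace above, so adjust them for a different dongle:

```shell
# Hypothetical filter: read log text on stdin and keep only the lines that
# mention the Wi-Fi driver (ath/ath9k, ieee80211) or the USB host driver (musb).
wifi_usb_events() {
    grep -E 'ath9k|ath:|musb|ieee80211'
}
```

Something like `journalctl -k -b -1 | wifi_usb_events` should then show the kernel messages from the boot that ended in the hang, assuming the journal is kept across reboots and your journalctl version accepts a boot offset with -b.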

adafruit_support_mike
 
Posts: 67485
Joined: Thu Feb 11, 2010 2:51 pm

Re: BBB (2G) seizing up

Post by adafruit_support_mike »

That sounds like a good lead.

For the heck of it, try adding a cron job that calls `ifdown` to drop the connection and then `ifup` to bring it back every so often, once every 4-6 hours, for instance. It's kind of a brute-force solution to connection problems, but sometimes that works.
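
A minimal sketch of what that job could run, assuming the Wi-Fi interface is called wlan0 and is configured for `ifup`/`ifdown` (check with `ifconfig`). The function name bounce_iface and the IFDOWN, IFUP and BOUNCE_PAUSE variables are invented here so the script can be dry-run without touching the network:

```shell
# Hypothetical helper: take an interface down, pause, then bring it back up.
# IFDOWN / IFUP can be overridden (e.g. with 'echo') for a dry run.
bounce_iface() {
    iface=${1:-wlan0}                 # wlan0 is an assumption
    ${IFDOWN:-ifdown} "$iface"
    sleep "${BOUNCE_PAUSE:-5}"        # give the driver a moment to settle
    ${IFUP:-ifup} "$iface"            # runs even if ifdown reported an error
}
```

Saved as a script that ends with a call to `bounce_iface "$@"` (the path /usr/local/bin/bounce-iface.sh below is only an example), a crontab line such as `0 */6 * * * /usr/local/bin/bounce-iface.sh wlan0` would run it every six hours.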

elfpen
 
Posts: 12
Joined: Mon Feb 11, 2013 1:12 pm

Re: BBB (2G) seizing up

Post by elfpen »

I added the ifdown/ifup cron job to run once daily.
I also ran opkg update and opkg upgrade.

The system has been up for 4 days now, a great improvement!

adafruit_support_mike
 
Posts: 67485
Joined: Thu Feb 11, 2010 2:51 pm

Re: BBB (2G) seizing up

Post by adafruit_support_mike »

Glad to hear it's working for you.
