subscribe request flood diagnosis

Moderators: adafruit_support_bill, adafruit

Forum rules
If you're posting code, please make sure your code does not include your Adafruit IO Active Key or WiFi network credentials.
mharsch
 
Posts: 2
Joined: Mon Mar 13, 2017 11:29 am

subscribe request flood diagnosis

Post by mharsch »

I'm using Node-RED mqtt nodes and am hitting an account ban when a machine initializes. I'm trying to narrow down the exact error rate that is triggering the ban. The first error of the sequence that shows up on the username/errors endpoint is:

nodered_1cc1b63c001b1487 174.231.85.6 SUBSCRIBE mharsch/feeds/walnutcreeknew.mixer4-vfd-reset-signal-on request flood, disconnecting

Is this enough to determine which of the following documented limits is triggering the ban?

1.) > 260 publish actions per minute
2.) > 100 subscribe requests per minute
3.) > 10 FAILED subscribe requests per minute
4.) > 10 FAILED publish requests per minute
5.) "enough" messages after passing the rate limit
6.) > 20 login attempts per minute
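
For reference, the errors and throttle topics can be watched live over MQTT while reproducing the failure. Here is a minimal Python sketch using paho-mqtt (placeholder credentials, not my actual setup):

Code: Select all

import paho.mqtt.client as mqtt  # assumes the paho-mqtt 1.x API

IO_USER = "your_io_username"  # placeholder
IO_KEY = "your_io_key"        # placeholder -- never post your real key

def on_connect(client, userdata, flags, rc):
    client.subscribe(f"{IO_USER}/errors")    # per-user error messages
    client.subscribe(f"{IO_USER}/throttle")  # per-user throttle warnings

def on_message(client, userdata, msg):
    print(msg.topic, msg.payload.decode())

client = mqtt.Client()
client.username_pw_set(IO_USER, IO_KEY)
client.on_connect = on_connect
client.on_message = on_message
client.connect("io.adafruit.com", 1883)
client.loop_forever()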

T_Mo
 
Posts: 1423
Joined: Thu Mar 15, 2018 7:10 pm

Re: subscribe request flood diagnosis

Post by T_Mo »

(Community member)
How many feeds are you communicating with?

kd0ycl
 
Posts: 7
Joined: Thu Jul 14, 2022 5:46 pm

Re: subscribe request flood diagnosis

Post by kd0ycl »

T_Mo wrote: Thu Jan 16, 2025 6:46 pm (Community member)
How many feeds are you communicating with?

For the machine that is triggering the ban: 112 total, 45 of which are input nodes that subscribe at startup.

T_Mo
 
Posts: 1423
Joined: Thu Mar 15, 2018 7:10 pm

Re: subscribe request flood diagnosis

Post by T_Mo »

That's a lot.

mharsch
 
Posts: 2
Joined: Mon Mar 13, 2017 11:29 am

Re: subscribe request flood diagnosis

Post by mharsch »

To clarify (per advice from danh on Discord): I'm an AIO+ subscriber with the 260 data/min boost.

I'm pretty sure that I'm not exceeding the data (publish) rate, but whichever rate is getting triggered causes a cascading failure mode that affects the whole account (many more devices). This issue is causing quite a bit of pain for me, as every experiment / failure takes down my whole fleet for several minutes (up to 1 hr).

I'm hoping to isolate the trigger of the temporary bans and work to fix the root cause. In the meantime, it would be very nice if I could get a bump in subscriptions/minute allowance to ease the pain.

tyeth
 
Posts: 119
Joined: Sat Jun 28, 2014 8:48 pm

Re: subscribe request flood diagnosis

Post by tyeth »

Hey, there could be a couple of things going on here. To help track it down, it's first worth asking whether NodeRED is disconnecting and reconnecting between each data request (mine did). There is an authentication anti-hammer limit which may be the root cause.

To answer that first question, you can look at the IO overview page at https://io.adafruit.com/overview (or click the Adafruit logo from any IO page); there you will see the connection and disconnect events (only successful connects, not attempts) and the data points as they arrive.
Be aware the page reflows its layout once enough data arrives.

Secondly, there is a new method in the Adafruit IO CircuitPython library that will fetch your current rate limit information:
https://docs.circuitpython.org/projects ... _rate_info

If you're not using CircuitPython, or wish to access the info another way, the API endpoint is https://io.adafruit.com/api/v2/{{io_user}}/throttle; you'll need to replace {{io_user}} with your username and send your IO key in an HTTP header, as in the docs: https://io.adafruit.com/api/docs/#get-user-info
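
A minimal sketch of that request in Python using the requests library (the environment variable names are just placeholders; keep your real key out of anything you post):

Code: Select all

import os
import requests

# Placeholders: set these env vars however you like; never hard-code
# or post your real Adafruit IO key.
IO_USER = os.environ["IO_USERNAME"]
IO_KEY = os.environ["IO_KEY"]

# GET the per-user throttle info, passing the key via the X-AIO-Key header.
resp = requests.get(
    f"https://io.adafruit.com/api/v2/{IO_USER}/throttle",
    headers={"X-AIO-Key": IO_KEY},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())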

Expect to see data like this:

Code: Select all

{
    "data_rate_limit": 180,
    "active_data_rate": 74,
    "authentication_rate": 0,
    "subscribe_authorization_rate": 0,
    "publish_authorization_rate": 0,
    "hourly_ban_rate": 0,
    "mqtt_ban_error_message": null,
    "active_sms_rate": 0
}

I've never seen some of those numbers change, so don't be surprised if the returned values don't match what you expect. (The API endpoint may need fixing, or may not represent what you think.)

Regarding your setup, how many other clients are connecting/connected to publish/subscribe to the >100 feeds? I will probably attempt to recreate a similar setup to diagnose the issue if there is no easier way.


Also, just to manage your future expectations regarding support: we (the IO team) check support emails (raised via the feedback button on the website, or sent directly) a few times per week, basically daily, and the forums a bit more randomly (in my case, at least; Brent's really good at checking the forums).

Usually someone not on the IO team spots anything worth escalating from the forums or Discord and lets us know (if we haven't already joined the conversation).

My Discord checking is even more sporadic (and voluntary), and the server is so busy that I have all channels muted except #help-with-wippersnapper-and-adafruitIO, preferring to browse the others when I actively have time to help.
There's nothing wrong with asking for help in a general chat channel, but it may get lost in the noise; better to go for a subject-specific help channel.

kd0ycl
 
Posts: 7
Joined: Thu Jul 14, 2022 5:46 pm

Re: subscribe request flood diagnosis

Post by kd0ycl »

Thanks for engaging with this issue. I'm able to reproduce now using another AIO+ account, so we should be able to make quicker progress without the consequences of triggering bans on my main account.

To answer your first question: I'm not seeing disconnect/connect events for each data request. I'm seeing the normal disconnect/connect messages when NodeRED "Deploys", which is expected. Here are 2 deploy events 4 minutes apart:

Code: Select all

2025/01/17 07:32:38AM nodered_bf727cae2fb5e23a connected
2025/01/17 07:32:37AM nodered_feb58b9943a9d13f disconnected 98.43.106.225
2025/01/17 07:28:22AM nodered_feb58b9943a9d13f connected
2025/01/17 07:28:22AM nodered_5c609c69f064666d disconnected 98.43.106.225
2025/01/17 07:28:13AM io-browser-ccb9de04

Here's the signature failure sequence (subscribe request flood), followed by a temporary ban:

Code: Select all

2025/01/17 07:36AM io-browser-0e85ebc0 98.43.106.225 SUBSCRIBE to topic `kd0ycl/clients` rejected, user is temporarily blocked
2025/01/17 07:36AM enforcement limit reached, your account is banned for 60 seconds
2025/01/17 07:36AM enforcement limit reached, your account is banned for 30 seconds
2025/01/17 07:36AM nodered_ff4f9a94691d40f8 98.43.106.225 SUBSCRIBE kd0ycl/f/f45 request flood, disconnecting

The NodeRED config that reproduces the issue here has 60 total mqtt nodes across 3 flows. Half are mqtt in and half are mqtt out. Each node points to a different feed.

jwcooper
 
Posts: 1036
Joined: Tue May 01, 2012 9:08 pm

Re: subscribe request flood diagnosis

Post by jwcooper »

This is a really interesting issue and took a bit of debugging to find the root cause.

The core of the issue is that our system is treating your current node-red setup as a denial of service attack, because node-red fires every single subscribe (and probably publish?) request instantly on connection.

We can handle this capacity easily once you actually make it through the pub/subs, but because they are coming in nearly instantly, we consider this an attack on our system.

We are looking at a longer term fix here, but because this is happening so high in our stack (basically before we even look at if your account has the ability to go this fast, which you do) we have to be careful as any changes could be costly in terms of stability in the case of attacks.

I'm not too versed in node-red, but a 'quick' fix on your end would be applying the smallest of delays in the setup of the subscribes and publishes. It doesn't have to be seconds; even 2-5 ms between each pub/sub would do it.

Is this possible to do in node-red?
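
Outside of Node-RED, here is a minimal sketch of that idea in Python using paho-mqtt (assuming the 1.x API; the username, key, and feed names are placeholders):

Code: Select all

import time
import paho.mqtt.client as mqtt  # assumes the paho-mqtt 1.x API

IO_USER = "your_io_username"  # placeholder
IO_KEY = "your_io_key"        # placeholder -- never post your real key
FEEDS = [f"feed-{i}" for i in range(47)]  # stand-in feed names

client = mqtt.Client()
client.username_pw_set(IO_USER, IO_KEY)
client.connect("io.adafruit.com", 1883)
client.loop_start()  # service network traffic on a background thread

# Space out the SUBSCRIBE packets instead of firing them all at once.
for feed in FEEDS:
    client.subscribe(f"{IO_USER}/feeds/{feed}")
    time.sleep(0.005)  # ~5 ms between requests

# Keep running so the subscriptions stay live.
while True:
    time.sleep(60)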

kd0ycl
 
Posts: 7
Joined: Thu Jul 14, 2022 5:46 pm

Re: subscribe request flood diagnosis

Post by kd0ycl »

Correction: as you can see from the attachment, the number of nodes needed to reach failure is 27, not 47.

Ok, I've simplified the failing test down to the following: a single flow with 47 mqtt in nodes (each pointing to a unique feed). If you remove just one (down to 46) it will deploy fine. Adding one more for a total of 47 triggers a ban for several minutes.

There's a related issue here that makes debugging quite frustrating: the ban affects the user's ability to get updates from username/errors and username/throttle feeds.
Attachments
fail.PNG
Last edited by kd0ycl on Fri Jan 17, 2025 1:02 pm, edited 1 time in total.

jwcooper
 
Posts: 1036
Joined: Tue May 01, 2012 9:08 pm

Re: subscribe request flood diagnosis

Post by jwcooper »

Yea, that makes sense. We can process N requests very quickly, and based on a user's latency, that N is likely variable.

We're still looking at modifying this, but a short delay should get you past it.

Can you chain the requests? So one fires right after another instead of all at once on connect?
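
One way to picture the chaining, sketched in Python with paho-mqtt (placeholder names; not Node-RED): each SUBSCRIBE is sent from the previous one's acknowledgement callback.

Code: Select all

import paho.mqtt.client as mqtt  # assumes the paho-mqtt 1.x API

IO_USER = "your_io_username"  # placeholder
IO_KEY = "your_io_key"        # placeholder
FEEDS = iter(f"feed-{i}" for i in range(47))  # stand-in feed names
# NB: this iterator is one-shot; a real client would rebuild it on reconnect.

def subscribe_next(client):
    feed = next(FEEDS, None)
    if feed is not None:
        client.subscribe(f"{IO_USER}/feeds/{feed}")

def on_connect(client, userdata, flags, rc):
    subscribe_next(client)  # start the chain with the first feed

def on_subscribe(client, userdata, mid, granted_qos):
    subscribe_next(client)  # previous SUBSCRIBE was acked; send the next

client = mqtt.Client()
client.username_pw_set(IO_USER, IO_KEY)
client.on_connect = on_connect
client.on_subscribe = on_subscribe
client.connect("io.adafruit.com", 1883)
client.loop_forever()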

kd0ycl
 
Posts: 7
Joined: Thu Jul 14, 2022 5:46 pm

Re: subscribe request flood diagnosis

Post by kd0ycl »

jwcooper wrote: Fri Jan 17, 2025 12:29 pm I'm not too versed in node-red, but a 'quick' fix on your end would just be applying the smallest of a delay in the setup of the subscribes and publishes. It doesn't have to be seconds, like even 2-5ms between each pub/sub would do it.

Is this possible to do in node-red?

I asked this question on the NodeRED forum here:
https://discourse.nodered.org/t/throttl ... rtup/94660

The suggestion of using the mqtt-in nodes in Dynamic mode does allow for delaying subscriptions, and I'm experimenting with that now; however, we're still triggering these bans on the production account despite spacing out subscriptions into groups separated by several seconds.

I'll see if I can get a dynamic mode workaround demo to fail using my test account.

jwcooper
 
Posts: 1036
Joined: Tue May 01, 2012 9:08 pm

Re: subscribe request flood diagnosis

Post by jwcooper »

kd0ycl wrote: however we're still triggering these bans on the production account despite spacing out subscriptions into groups separated by several seconds.

Publishes could impact this as well. There are two separate flood queues that we use: one for publish, and one for subscribe.

Also, we are still looking at a more permanent solution here. These are just suggestions to work around the issue in the meantime.

kd0ycl
 
Posts: 7
Joined: Thu Jul 14, 2022 5:46 pm

Re: subscribe request flood diagnosis

Post by kd0ycl »

I did a test with just mqtt-out nodes. No connected flows, so no publish messages should be sent. I got up to 100 configured mqtt-out nodes with no trigger. Adding 26 mqtt-in nodes also worked, but adding the 27th caused the trigger.

tyeth
 
Posts: 119
Joined: Sat Jun 28, 2014 8:48 pm

Re: subscribe request flood diagnosis

Post by tyeth »

Just to add the same information here for future readers...

MQTT supports wildcards, so you can have a single subscription for multiple topics that way.

In other words you may also be able to change your multiple MQTT subscription topics to instead use a single subscription with wildcards. 

i.e. a single subscribe to mharsch/f/# covers all feeds.
(# is the multi-level wildcard in MQTT; when testing, I subscribe to tyeth/# for example, which picks up everything.)
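
As a sketch in Python with paho-mqtt (placeholder credentials), the whole set of per-feed subscriptions then collapses to a single one:

Code: Select all

import paho.mqtt.client as mqtt  # assumes the paho-mqtt 1.x API

IO_USER = "your_io_username"  # placeholder
IO_KEY = "your_io_key"        # placeholder

def on_connect(client, userdata, flags, rc):
    client.subscribe(f"{IO_USER}/f/#")  # one SUBSCRIBE covering every feed

def on_message(client, userdata, msg):
    # Every feed's data arrives here; route on msg.topic as needed.
    print(msg.topic, msg.payload.decode())

client = mqtt.Client()
client.username_pw_set(IO_USER, IO_KEY)
client.on_connect = on_connect
client.on_message = on_message
client.connect("io.adafruit.com", 1883)
client.loop_forever()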

kd0ycl
 
Posts: 7
Joined: Thu Jul 14, 2022 5:46 pm

Re: subscribe request flood diagnosis

Post by kd0ycl »

I've identified another variant of this issue (NodeRED mqtt-in nodes triggering subscribe flood) that can't be mitigated by delaying subscribe actions at startup (using mqtt-in dynamic mode + delay).

We're hitting a case where something (other than deploy) is causing the NodeRED broker connection to disconnect, then reconnect. We think we're triggering this by interacting with an IO dashboard. Whatever the trigger is, the behavior of the NodeRED automatic reconnect is fatal, presumably because it resubscribes to anything that was subscribed at the time of disconnect. So, despite avoiding the subscribe flood at deploy time, we're still vulnerable to the account-ban failure mode if anything causes a broker disconnect (on a system with 27 mqtt-in nodes). I have a simple NodeRED flow on my test account that can simulate this case and reproduce the issue at will.
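
For what it's worth, outside Node-RED the reconnect case can be handled the same way as the startup case, by doing the (re)subscription from the connect callback with small gaps; a Python/paho-mqtt sketch with placeholder names:

Code: Select all

import threading
import time
import paho.mqtt.client as mqtt  # assumes the paho-mqtt 1.x API

IO_USER = "your_io_username"  # placeholder
IO_KEY = "your_io_key"        # placeholder
FEEDS = [f"feed-{i}" for i in range(27)]  # stand-in feed names

def resubscribe_slowly(client):
    # Re-issue the SUBSCRIBEs with a small gap so a reconnect
    # doesn't replay them all in one burst.
    for feed in FEEDS:
        client.subscribe(f"{IO_USER}/feeds/{feed}")
        time.sleep(0.005)

def on_connect(client, userdata, flags, rc):
    # Runs on every (re)connect; stagger off the network thread
    # so the MQTT loop keeps servicing traffic.
    threading.Thread(target=resubscribe_slowly, args=(client,), daemon=True).start()

client = mqtt.Client()
client.username_pw_set(IO_USER, IO_KEY)
client.on_connect = on_connect
client.connect("io.adafruit.com", 1883)
client.loop_forever()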

I'm going to experiment this morning using wildcards to reduce the number of mqtt-in nodes as suggested by tyeth and I'll report back here.
