net-snmp meltdown on high-density hosts

When running a host with a very high number of network interfaces and/or IP addresses, the default implementation of net-snmp starts to break down on multiple levels, causing all sorts of nasty behavior such as 100% CPU usage and a daemon that no longer responds to queries.

I first started noticing this problem when my custom router was running around 500 network interfaces. The first symptom was relatively high CPU usage for the snmpd process. Understandably, it was dismissed as a general side effect of the scale and the number of queries on the host. However, by the time we reached 1000 interfaces, it started to become a bit worrying.

Looking deeper into it, it seems that the same issue will happen with any large number of objects. For example, it was reported here with a high IP address count, and here with a high number of routes. In both cases, the solution was rather drastic: hard-coded bypasses in the former, completely disabling the var_route module in the latter.

As the number kept increasing and approached 3000, CPU usage was constantly at 100% (one full core), and finally snmpd melted down completely and stopped responding to queries.

That’s when I decided to dedicate some time to looking into this, so I forked the main net-snmp repo here to see what could be done.

In the following sections, I will mainly focus on a large number of network interfaces, since that is my use case.

Reproducing the condition

Let’s start off with a simple repro setup. First, create 3000 tap interfaces:

for((i=1;i<=3000;i+=1)); do ip tuntap add dev tap$i mode tap; done

Then start snmpd and watch it spiral into 100% CPU and stop responding to queries entirely. In fact, you’ll probably need to kill -9 it.

How does net-snmp load interfaces?

After looking into the code and adding a bunch of perf counters, it turns out that the problem has multiple layers. But first, here is how net-snmp handles network interfaces:

  1. Start a timer that fires every IFTABLE_CACHE_TIMEOUT (3 seconds) to refresh network interface state.
  2. When the timer fires, load the entire network interface information into a new container.
  3. Diff the new container against the one we already have to find deltas.
  4. Repeat.
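
To make that cycle concrete, here is a minimal, compilable sketch. Everything in it is an illustrative stand-in rather than net-snmp's actual internals, except IFTABLE_CACHE_TIMEOUT, the agent's 3-second default mentioned above:

#include <stdio.h>
#include <unistd.h>

#define IFTABLE_CACHE_TIMEOUT 3 /* seconds */

typedef struct { int n_entries; } container; /* stand-in container */

static void load_all_interfaces(container *c) { c->n_entries = 3000; }

static void diff_and_apply(container *cur, container *fresh)
{
    /* walk both snapshots, emit adds/removes/changes */
    (void)cur; (void)fresh;
}

int main(void)
{
    container cur = { 0 }, fresh = { 0 };

    for (;;) {
        sleep(IFTABLE_CACHE_TIMEOUT); /* 1. timer fires every 3 seconds */
        load_all_interfaces(&fresh);  /* 2. full reload from scratch    */
        diff_and_apply(&cur, &fresh); /* 3. diff old vs. new for deltas */
        cur = fresh;                  /* 4. repeat                      */
        printf("refreshed %d entries\n", cur.n_entries);
    }
}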

At first glance, polling is obviously problematic. But still, we are talking about 3000 entries on a modern CPU.

I’ve measured the time it takes to do a single refresh: ~16 seconds!

The meltdown

Here is what causes the daemon to melt down:

  • Every time a refresh is performed, it reloads all interface information (polling) from scratch.
  • Since the refresh timer is set to 3 seconds, by the time one 16-second refresh finishes, the next one is already overdue and starts immediately, in a continuous tight loop.
  • While performing refresh(es), the daemon does not respond to queries.

While increasing IFTABLE_CACHE_TIMEOUT would slightly mitigate the problem, it is far from a solution. First, you lose freshness of the readings on the agent. Second, during each 16-second refresh, the daemon would still be unresponsive.

CPU heat map

Looking deeper into the code, there seem to be several hot areas in the function that loads all interface information:

/*
 *
 * @retval 0 success
 * @retval -1 no container specified
 * @retval -2 could not open /proc/net/dev
 * @retval -3 could not create entry (probably malloc)
 */
int
netsnmp_arch_interface_container_load(netsnmp_container* container,
                                      u_int load_flags)

The function uses the proc filesystem to obtain a list of all network interfaces, then loops over them, gathering multiple pieces of information for each. Here are the hottest code paths:

  • Obtaining ipv4/ipv6 information:
        /*
         * set address type flags.
         * the only way I know of to check an interface for
         * ip version is to look for ip addresses. If anyone
         * knows a better way, put it here!
         */
#ifdef NETSNMP_ENABLE_IPV6
        _arch_interface_has_ipv6(if_index, &flags, addr_container);
#endif
        netsnmp_access_interface_ioctl_has_ipv4(fd, ifstart, 0, &flags);
  • Inserting interface entries in the container:
        /*
         * add to container
         */
        CONTAINER_INSERT(container, entry);

Here is how the 16 seconds break down:
[heatmap.png: CPU time breakdown of a single interface refresh]

interface_ioctl_has_ipv4

Here is the function signature:

/**
 * check an interface for ipv4 addresses
 *
 * @param sd      : open socket descriptor
 * @param if_name : optional name. takes precedent over if_index.
 * @param if_index: optional if index. only used if no if_name specified
 * @param flags   :
 *
 * @retval < 0 : error
 * @retval   0 : no ip v4 addresses
 * @retval   1 : 1 or more ip v4 addresses
 */
int
netsnmp_access_interface_ioctl_has_ipv4(int sd, const char *if_name,
                                        int if_index, u_int *flags)

The implementation starts by fetching all interfaces and their addresses into a struct ifconf, then loops over all of them looking for if_name. Once found, it sets NETSNMP_INTERFACE_FLAGS_HAS_IPV4 accordingly.

Obviously, calling this method once per interface with 3000 interfaces is not very efficient: the full list is fetched and scanned 3000 times, making the overall pass roughly O(N^2).
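
For illustration, here is a simplified sketch of what each such call pays, based on the description above (not the exact net-snmp source):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <unistd.h>

static int has_ipv4(int sd, const char *if_name)
{
    struct ifconf ifc;
    struct ifreq *ifr;
    int i, n, rc = 0;

    /* SIOCGIFCONF returns every interface that has an IPv4 address. */
    ifc.ifc_len = (int)(4096 * sizeof(struct ifreq));
    ifc.ifc_buf = malloc((size_t)ifc.ifc_len);
    if (ifc.ifc_buf == NULL || ioctl(sd, SIOCGIFCONF, &ifc) < 0) {
        free(ifc.ifc_buf);
        return -1;
    }

    /* Linear scan of the entire list just to answer one question. */
    n = ifc.ifc_len / (int)sizeof(struct ifreq);
    ifr = ifc.ifc_req;
    for (i = 0; i < n; i++) {
        if (strncmp(ifr[i].ifr_name, if_name, IFNAMSIZ) == 0) {
            rc = 1;
            break;
        }
    }
    free(ifc.ifc_buf);
    return rc; /* called once per interface => O(N^2) for the whole table */
}

int main(void)
{
    int sd = socket(AF_INET, SOCK_DGRAM, 0);

    printf("lo has ipv4: %d\n", has_ipv4(sd, "lo"));
    close(sd);
    return 0;
}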

CONTAINER_INSERT

This calls into a common net-snmp container implementation, binary_array, which is an array with one or more indexes. Unfortunately, the insert implementation becomes very inefficient as the array grows: it re-sorts the entire array on every insert, so the total load cost grows at close to O(N^2).

By default, the binary_array is created without the CONTAINER_KEY_ALLOW_DUPLICATES flag. This means that with each insert, the entire array is re-sorted so that it can be checked for duplicate keys.
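
As a toy illustration of the difference (not net-snmp code), compare re-sorting on every insert against appending and sorting once at the end:

#include <stdio.h>
#include <stdlib.h>

#define N 3000

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    static int arr[N];
    int i;

    /* Without ALLOW_DUPLICATES: every insert re-sorts the array so a
     * duplicate key can be searched for -- roughly N sorts in total. */
    for (i = 0; i < N; i++) {
        arr[i] = rand();
        qsort(arr, (size_t)i + 1, sizeof(int), cmp_int);
    }

    /* With ALLOW_DUPLICATES: inserts are appends; sort once at the end. */
    for (i = 0; i < N; i++)
        arr[i] = rand();
    qsort(arr, N, sizeof(int), cmp_int);

    printf("done\n");
    return 0;
}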

Mitigations/Workarounds

There are a few things we can do to mitigate the problem:

  • Set CONTAINER_KEY_ALLOW_DUPLICATES

For some reason, CONTAINER_KEY_ALLOW_DUPLICATES is not set for the interfaces container, even though, from the code, it does not seem possible to end up with duplicate keys while loading interfaces from scratch.

For example:

    /* set allow duplicates; this makes insert O(1) */
    netsnmp_binary_array_options_set(if_ctx->container, 1,
                                     CONTAINER_KEY_ALLOW_DUPLICATES);
  • Implement a better way to do what netsnmp_access_interface_ioctl_has_ipv4 does

The code comment quoted earlier says it all (“If anyone knows a better way, put it here!”). This function is a nuke for what it is used for: it fetches the entire interface list just to answer a question about a single interface.
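
One possible alternative, as a sketch: fetch all addresses once with getifaddrs(3) and answer the IPv4/IPv6 question for every interface from that single snapshot. This is an assumption about a viable replacement, not what net-snmp does today:

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <ifaddrs.h>

int main(void)
{
    struct ifaddrs *ifa_list, *ifa;

    if (getifaddrs(&ifa_list) < 0) {
        perror("getifaddrs");
        return 1;
    }

    /* One O(N) pass over all addresses; each entry carries the owning
     * interface name and the address family. */
    for (ifa = ifa_list; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr == NULL)
            continue;
        if (ifa->ifa_addr->sa_family == AF_INET)
            printf("%s: has IPv4\n", ifa->ifa_name);
        else if (ifa->ifa_addr->sa_family == AF_INET6)
            printf("%s: has IPv6\n", ifa->ifa_name);
    }

    freeifaddrs(ifa_list);
    return 0;
}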

Possible fix(es)

The best way to resolve these performance issues is twofold:

  • Stop polling and switch to the netlink kernel interface

Polling is bad. One way to increase efficiency is to have a background thread that listens to kernel netlink events and continuously maintains the full state of interfaces and routes. This way, the information is readily available when needed; all that remains is to update the stats.
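
Here is a minimal sketch of what such a listener could look like on Linux, using the standard rtnetlink multicast groups (a sketch of the approach, not a drop-in patch for net-snmp; error handling and attribute parsing trimmed for brevity):

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

int main(void)
{
    char buf[8192];
    struct sockaddr_nl sa;
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (fd < 0) {
        perror("socket");
        return 1;
    }

    memset(&sa, 0, sizeof(sa));
    sa.nl_family = AF_NETLINK;
    /* Subscribe to interface and IPv4/IPv6 address change notifications. */
    sa.nl_groups = RTMGRP_LINK | RTMGRP_IPV4_IFADDR | RTMGRP_IPV6_IFADDR;
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        perror("bind");
        return 1;
    }

    for (;;) {
        int len = (int)recv(fd, buf, sizeof(buf), 0);
        struct nlmsghdr *nh;

        if (len <= 0)
            break;
        /* Update the in-memory interface state instead of re-polling. */
        for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
             nh = NLMSG_NEXT(nh, len)) {
            switch (nh->nlmsg_type) {
            case RTM_NEWLINK: printf("link added/changed\n"); break;
            case RTM_DELLINK: printf("link removed\n");       break;
            case RTM_NEWADDR: printf("address added\n");      break;
            case RTM_DELADDR: printf("address removed\n");    break;
            }
        }
    }
    return 0;
}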

  • Implement a more efficient data structure that can scale to a large number of entries

I haven’t looked into this one yet, but I’m thinking a modern implementation of hashmaps and array lists should be possible.

I’m hoping that I’ll have time to work on this. You can follow my GitHub project for the latest updates.

 
