Chasing Network Performance
Every so often there comes a purchase which, looking back, makes us question what we were thinking / whether browsing eBay while drunk is a good idea. Sadly I am not immune to these, and in the pursuit of a more stable / more secure network environment I've instead taken myself down a very dark / very expensive rabbit hole that left me with more questions than answers. For the amusement of all, here is my cautionary tale...
In a network redesign I moved away from the standard BT Wifi / mesh discs (after many frustrations / poor performance / an inability to grasp how BT could be so bad) to Unifi tech (not for the first time), blanketing the house with WiFi 6E goodness in the hope of a more reliable connection. Speed wasn't the primary focus given that even with multiple internet connections I can barely hit 300 Mbps on a good day, but stability was. Too many times a WiFi call would drop when roaming the house, or devices would refuse to roam between access points at all.
To help improve the overall performance while also properly segmenting the network into zones I chose to go the virtual firewall route (given the specific layout I wanted), opting for a new system to host everything (while trying to be both power-efficient and quiet). What came next would be my first mistake...
One drunken search of eBay led me to an Atom-based Supermicro server, specifically the A1SAM-2750F. Featuring an 8-core Atom CPU (of the C2000 series), very low power usage, a compact 1U form factor, a cheap(ish) price, and near-silent operation (especially with some fan modifications), it seemed like a bargain. Before I knew it the server had been ordered and would be on my doorstep a few days later.
With the server now sat on my desk it was time to check it over and make sure everything worked. A quick clean of the various components combined with a RAM test and CPU stress test showed everything to be in working order, and things were looking good. Time for mistake number two...
While the system came with 32GB RAM, this wouldn't be enough for the quantity of firewall VMs that I wanted to run. No problem I thought, as this system takes DDR3 ECC memory and I have a large stack of it from earlier Supermicro servers. Unfortunately, Atom CPUs are picky about their memory and aren't compatible with what I have. OK I thought, it should be easy enough to get some compatible DIMMs to take the box to 64GB. Wrong... After checking online for the better part of 3 days, the only place I could get suitable memory (according to the Supermicro supported memory list for this system) was via eBay again, specifically a Chinese seller that had a single matched set. Not only was this memory very hard to find, it was also more expensive than the server itself.
In a surprise to nobody, another eBay order later and the server now had 64GB of ECC RAM. Now I had the capacity to run the virtual firewalls without artificially limiting them, and from here on it should have been easy to get everything operational (with only inter-VLAN firewall policies to worry about, right?). Oh boy, time for mistakes three and four...
While testing different hypervisors to see what works / which performs well I hit my first challenge: ESXi refused to install due to a lack of XSAVE support on the CPU. This caught me (and a lot of people) by surprise, but at the same time isn't that big of a deal and was somewhat expected given the age of the system and the fact that it's a low-power CPU by design (it doesn't have the same level of functionality as a desktop / enterprise CPU). Proxmox to the rescue, as that runs on almost anything, and it does. A few minutes later I had a running hypervisor and was ready to test the first virtual firewall.
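For anyone wanting to spot this before committing to an install, the CPU flags give it away up front. A quick sanity check from any Linux live environment (treat this as a rough guide rather than an exhaustive ESXi compatibility test):

```
# Check whether the CPU advertises XSAVE before attempting an ESXi install.
# The Atom C2750 does not, which is why newer ESXi releases bail out.
grep -qw xsave /proc/cpuinfo && echo "XSAVE present" || echo "No XSAVE support"
```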
The plan with the firewalls was to pass through a physical port to each VM given the system has eight of them (4 onboard, 4 add-on), ensuring good performance and increased stability (given the comments regarding virtual NICs from any hypervisor). Alas, this was not viable: in purchasing this server while drunk I hadn't spotted that the C2000 CPU series doesn't support VT-d. In short, no PCIe passthrough for me. I hadn't planned on virtual-switching everything, but it seemed I had no choice.
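If you want to confirm the same limitation on your own hardware, a couple of quick checks on the Proxmox host (or any Linux install) will show whether an IOMMU is actually available for passthrough, assuming intel_iommu=on is already on the kernel command line:

```
# Does the kernel see an Intel IOMMU (VT-d)? No DMAR entries means no passthrough.
dmesg | grep -e DMAR -e IOMMU
# Any IOMMU groups? An empty directory here means PCIe passthrough is off the table.
ls /sys/kernel/iommu_groups/
```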
A few virtual switches later (still testing two different hypervisors at this point) and I was ready to spin up a VM to see how things looked (ideally from a network perspective). In comes the next challenge (or headache as I prefer to call them). As a test I created a simple Ubuntu Server VM to see how the system behaved. In a word, poor... I have multiple micro/SFF PCs and usually installing Ubuntu Server (minimised) on them (even in a VM) takes less than 5 minutes. Trying it on the new server took a little over 10. Even with it being an Atom CPU I figured it couldn't be that bad, but multiple checks of the different components / verifying the cooling etc left me with the bitter conclusion that Atom CPUs really are as bottom-end as they are advertised.
As a last attempt I figured I would test how the system performed after disabling the Meltdown/Spectre mitigations to see how badly they were impacting things. In short, the difference was noticeable but it wasn't going to get the performance back to the levels I had hoped for (and I wasn't going to run with the mitigations disabled anyway). To add insult to injury, despite running only a single test VM at this point the system (under both hypervisors) was showing a high CPU ready value. To this day I still don't know why, given there was only a single VM on an 8-core system, so there should have been zero contention. No amount of BIOS or hypervisor settings seemed to remove it (only reduce it, which was likely coincidental).
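For reference, the mitigation test itself was nothing exotic. On a Debian/Ubuntu-style guest (or the Proxmox host) it's roughly the following, and very much a benchmarking-only change rather than something to leave enabled:

```
# Append mitigations=off to the kernel command line in /etc/default/grub, e.g.:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet mitigations=off"
sudo update-grub
sudo reboot
# After the reboot, confirm what is now disabled:
grep . /sys/devices/system/cpu/vulnerabilities/*
```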
Time for some actual firewall VMs and real performance testing (after all, this is what the system was purchased for). A few installs of OPNsense later and I had a working network, complete with multiple segregated zones with their own firewall rules. How nice it was to have IoT devices confined to their own tiny island once more. Time to load up iPerf on some external servers and see how things looked.
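The testing itself was nothing fancy: iperf3 running as a server on a box in one zone and a client in another, forcing the traffic through the firewall VM (addresses below are placeholders):

```
# On a host in one VLAN:
iperf3 -s
# From a client in a different VLAN, routed through the firewall VM:
iperf3 -c 10.20.0.5 -t 30 -P 4
```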
In a word, awful... Despite trying a mix of different virtual network adapters (again, using two different hypervisors), and tuning just about every setting I could find (even following the latest guidance for OPNsense on hardware offload etc) I couldn't get more than 350Mb/s throughput between subnets. CPU load would be pinned at 100% regardless of any optimisations, and even using a single VM with the maximum number of permitted CPU cores still didn't help.
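For anyone repeating this, the offload state is easy enough to inspect from the OPNsense shell (interface names will vary with the virtual NIC type, so vtnet0 below is just an example), with the actual toggles living under Interfaces > Settings in the web UI:

```
# Show which offload features the interface currently advertises:
ifconfig vtnet0 | grep options
```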
At this point saying I felt deflated is an understatement, given the amount of money that had been sunk into this (not to mention the hours), and the realisation that the performance was so terrible it would actually be on the threshold of impacting the multiple internet connections I have. Time to dig deep and think outside of the box.
Throughout all of this I had randomly purchased another Dell OptiPlex micro system (a bargain at a 2nd-hand store) which also had 64GB RAM but featured a Core i5-9500T CPU. The downside is that the system doesn't have any form of PCIe expansion to make it viable for networking duties, or does it? Removing the built-in WiFi adapter got me a mini-PCIe slot for a 1Gb Intel NIC (findable on Amazon), while moving the boot drive to SATA freed up the M.2 PCIe slot. For the latter I managed to find an Intel I210-V NIC that would fit, providing a 2.5Gb port as well. At this point I figured that while the Supermicro server might not be viable I could look to using the Dell instead and cut my losses. It had been a while since my last mistake; clearly I was starting to miss them...
With the parts fitted and some 'questionable' cabling the system was operational and looked good. Both hypervisors installed without issue and both could see the new network adapters (the onboard Realtek was turned off for my own sanity). I hadn't used the I210-V chipset before and as it turned out I was in for some unpleasant surprises. Consistent performance, no. Stability when disconnecting / connecting a cable, no. Ability to frustrate me and waste my time, yes.
To say I was running out of ideas to make all of this work is an understatement. Digging deep (including in the parts bin) reminded me I had an Intel X540 NIC that offered 2x10Gb Ethernet ports and solid performance (I had used it previously). While it wouldn't fit in the Dell, it would fit in the Supermicro and I figured why not give it a go. One replaced add-on card later and I had 20Gb/s of bandwidth to my core switch; now to see if the performance was any better than using the previous 8 ports.
In a word, no. As expected, OPNsense was still struggling to route traffic fast enough, leaving me with the same ~350Mb/s performance. To add insult to injury (again), the new adapter allowed SR-IOV to be enabled, which made me think there might be an actual solution to the performance issue, only to find that despite stating it could be enabled and going through the motions you still can't use it due to the lack of VT-d support. In hindsight I'd have been better off purchasing an old 8-core desktop and using that instead.
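For completeness, this is roughly what enabling SR-IOV on the X540 (ixgbe) looks like from a Linux host: the virtual functions appear, but without VT-d there is no way to hand them to a VM (the interface name is a placeholder):

```
# Create 4 virtual functions on the X540 port (eth2 is a placeholder name):
echo 4 > /sys/class/net/eth2/device/sriov_numvfs
# The VFs show up as PCIe devices, but can't be passed through without an IOMMU:
lspci | grep -i "virtual function"
```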
The one thing that had stuck out to me as odd through all of this was the OPNsense throughput. I've used it many times in the past and been able to reach Gigabit speeds without issue, so why the issue with this system? More forum reading, more virtual network adapters to test (again), more tuning options to test (again), and still the same performance. At this point I figured it was time to try a plain Linux VM configured for basic routing / firewalling to see how the performance compared.
In the first (and only) piece of good news I had encountered, routing/firewalling on Linux (with the same network setup) got me over 2Gb/s throughput, both sustained and consistent. While not as high as I would like, this was such a step in the right direction it actually gave me hope for the server. Multiple tests later confirmed that Linux was able to act as a firewall multiple times faster than OPNsense on this hardware. Not quite the approach I was planning to take, but one that was viable / would make the box usable. A new direction for the better :-)
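For the curious, the Linux test VM wasn't doing anything clever: just IP forwarding plus a small nftables ruleset along these lines (the interface names, zones and NAT setup are simplified placeholders rather than my actual rules):

```
# Enable routing between interfaces:
sysctl -w net.ipv4.ip_forward=1

# Minimal zone-to-zone forwarding policy:
nft flush ruleset
nft add table inet filter
nft 'add chain inet filter forward { type filter hook forward priority 0; policy drop; }'
nft add rule inet filter forward ct state established,related accept
nft add rule inet filter forward iifname "eth1" oifname "eth0" accept   # LAN -> WAN
nft add rule inet filter forward iifname "eth2" oifname "eth0" accept   # IoT -> WAN only

# NAT out of the WAN-facing interface:
nft add table ip nat
nft 'add chain ip nat postrouting { type nat hook postrouting priority 100; }'
nft add rule ip nat postrouting oifname "eth0" masquerade
```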
The one thing that didn't make sense (post-Linux) was the throughput between the 10Gb system and the 2.5Gb system. For testing purposes I put them both on the same network and shut everything else down / ensured there was no contention. Try as I might, I couldn't get past around 2.8Gb/s (when testing in bidirectional mode). Testing the M.2 slot with a fast SSD showed it could perform better than that, which unfortunately led me back to the quirks of the I210-V (especially in M.2 form). Like it or not, this network adapter wasn't going to perform better than that, though given it was destined to be a Docker host that is still plenty of throughput.
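The bidirectional testing was just iperf3 again. Note that the built-in bidirectional mode needs a reasonably recent iperf3 (3.7 or later); older versions mean two runs with -R instead (the address below is a placeholder for the 10Gb box):

```
# Single bidirectional run between the two systems:
iperf3 -c 10.0.0.10 -t 30 --bidir
```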
So how does all of this play out, you may ask... Well, the OPNsense firewalls are going to be replaced by Linux VMs with custom firewall logic. I'd hoped to avoid this, but given the poor performance and a strong desire to make use of the server it's the only real choice. Once everything is switched over the performance will be multi-Gigabit and should last for many years to come (at least my wallet hopes so).
As for the lessons in all of this, there are many... Drunk IT purchases are seldom a good idea, and reading the tech specs for systems/boards should be mandatory. Also, beware of 3rd-party add-on boards on Amazon; rarely do they meet expectations.