Human Infrastructure 385: Capturing System Calls, Hitting Ceilings, Unpacking 802.1, and More

THIS WEEK’S MUST-READ BLOGS 🤓

Ivan Pepelnjak never shies away from complexity, while also bringing clarity to the complex. - Drew

Mike is writing a series on building an EVPN overlay using a “hands on keyboard” approach. This post describes the requirements and walks through all the things that need to be configured in the underlay. He’s also created a GitHub repo where you can follow along with this project. - Drew

Forrest Brazeal says there’s a ten-year ceiling on a software engineer’s career. Not that you can’t be a software developer for more than 10 years, but once you hit that milestone, you’re also going to run into salary caps, career limitations, and lots of hungry junior devs willing to do your job for less money while also learning all the hot new languages. 

You should read the post to get his full argument, but I’m wondering if this ceiling also applies to network engineering? There’s a lot of murmuring in the field that there aren’t enough junior network engineers, hungry or otherwise, in the pipeline. Curious to know if readers think that Forrest’s view of software developers applies to network engineering. You can share thoughts here or here. - Drew

This is a good overview of the issues organizations face when trying to connect on-prem resources and public clouds, and connect applications that run in different public clouds. It also has a handy table that matches useful features to the services offered by AWS, Azure, and GCP. - Drew 

Pouriya Jamshidi offers the following tutorial.

“In this article, we will learn how to:

  • Generate random syslog messages using the logger utility

  • Modify packet captures using Tcprewrite

  • Replay packets (pcap/pcapng files) in a controlled way and locally using Tcpreplay with the help of tc and network namespaces on Linux machines

These will be the main takeaways from this article. Although there are some more details, which you can explore further.”

Why? Pouriya was developing a syslog proxy and needed a way to deliver test syslog messages. So Pouriya ran a packet capture to get a bunch of syslog messages, then used the techniques in the article to modify and replay them. Cool! - Ethan
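If you'd rather generate test messages from a script than from the logger utility, here's a minimal Python sketch of the same idea. It is not from Pouriya's article. The hostname, tag, message text, and UDP port 5514 are placeholders chosen for illustration, and the format is the classic BSD style (RFC 3164), where the priority value is facility × 8 + severity.

```python
import random
import socket
from datetime import datetime

FACILITY_USER = 1  # the "user" facility, which logger also uses by default
SEVERITIES = {"err": 3, "warning": 4, "notice": 5, "info": 6, "debug": 7}


def build_syslog_message(severity, text, facility=FACILITY_USER,
                         hostname="testhost", tag="loadgen", now=None):
    """Format one BSD-style (RFC 3164) syslog line.

    hostname and tag are made-up placeholder values.
    """
    pri = facility * 8 + severity
    now = now or datetime.now()
    # RFC 3164 timestamps use a space-padded day of month, e.g. "Jan  5"
    timestamp = f"{now.strftime('%b')} {now.day:2d} {now.strftime('%H:%M:%S')}"
    return f"<{pri}>{timestamp} {hostname} {tag}: {text}"


def send_random_messages(count, host="127.0.0.1", port=5514, seed=None):
    """Send `count` messages with random severities over UDP.

    Port 5514 is an assumption; point this at whatever your proxy listens on.
    """
    rng = random.Random(seed)
    sent = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for i in range(count):
            name, severity = rng.choice(sorted(SEVERITIES.items()))
            msg = build_syslog_message(severity, f"test message {i} ({name})")
            sock.sendto(msg.encode("utf-8"), (host, port))
            sent.append(msg)
    return sent
```

Run a packet capture on the loopback interface while calling send_random_messages(100) and you have a pcap to feed into the rewrite and replay steps the article covers.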

Lcamtuf digs into a claim that an Ethernet-to-USB dongle was an evil Chinese spying tool. Seems not. Although it’s not paranoia if they really are out to get you, sometimes perceived evil isn’t actually evil. - Ethan

MORE BLOGS

The SRE Report 2025 has been released! Download your copy today
Now in its 7th year, the SRE Report 2025 provides an in-depth look at the latest trends, insights, and data shaping reliability and resilience practices. Whether you're a site reliability engineer or simply interested in the field, this comprehensive report offers valuable takeaways for everyone.

Highlights include:

· Toil levels have risen for the first time in 5 years.

· 86% of organizations use 2-10 monitoring or observability tools.

· 53% agree that “Slow is the new down”—bad performance can be as damaging as outages.
Download your copy of the SRE Report today (no registration required).

TECH NEWS 📣

PowerSchool, a cloud service widely used by K-12 schools in the United States and around the world, has suffered a massive data breach. Affected data includes names, birth dates, Social Security numbers, and medical information about students and teachers. Ars Technica describes PowerSchool thusly: “Besides providing software for administration, grades, and other functions, PowerSchool stores personal data for students and teachers, with much of that data including Social Security numbers, medical information, and home addresses.” The article says more than 16,000 schools use PowerSchool, and that tens of millions of students and teachers could be affected. PowerSchool is offering two years of free credit monitoring and identity protection services to anyone affected. But no one wants that. We want these companies to do a better job of protecting sensitive data! - Drew

The latest version of the open-source OpenZFS offers expanded RAID and speedier deduplication capabilities. The Register says the latest version will be in Linux distros including Ubuntu and Proxmox, among others. - Drew

Got Juniper VPN gear or edge devices? You might want to look into this. - Drew 

Modern CPUs are stuck at about a 5GHz clock speed due to the “breakdown of Dennard scaling” (the power density problem) and the “so-called von Neumann bottleneck” (the speed limit moving data between memory and CPU). But what if you move to an all-optical circuit design? You can crank up the clock speed substantially. There’s also an arXiv PDF on this work. - Ethan

The law cited is the Communications Assistance for Law Enforcement Act (CALEA), which requires telcos to grant court-approved access to government agencies to facilitate crime investigation, while also keeping the bad guys out. In the midst of the Salt Typhoon brouhaha, CALEA is under scrutiny because Salt Typhoon used the investigative apparatus CALEA created to steal an obscene amount of information for the People’s Republic of China. According to The Register’s investigation, it appears that Salt Typhoon used the CALEA backdoors to get at not only telco datasets, but also federal government datasets including FBI data. This story just keeps getting uglier. - Ethan

Remember how P4 was open sourced last week? Here’s a chunky thread about it on Hacker News. A whole range of reactions. - Ethan

MORE NEWS

FOR THE LULZ 🤣

RESEARCH & RESOURCES 📒

Stratoshark - Wireshark Foundation 
https://stratoshark.org/ 

The Wireshark Foundation has released a new tool, called Stratoshark, that captures system calls on Linux devices. Just as you use Wireshark to capture packets to diagnose problems or investigate security issues, Stratoshark brings the same capabilities to individual Linux machines. Stratoshark is aimed at IT pros who need to understand how an application is behaving on a device, or to investigate a potential security incident, and system call capture and analysis are a great way to do this. Stratoshark is open source, and the link above will take you to the software as well as learning resources. - Drew

IP analysis aka reverse lookup service with an API. 7-day trial. Plans at $5, $20, and $100 per month. Site claims to give you all the domain names, subdomains, geolocation, and other metadata attached to an IP address. More info on their help page.

Have you tried this? Did you like/not like it? Let me know. - Ethan
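For comparison, here's roughly what you get for free from a plain reverse DNS (PTR) lookup using only the Python standard library. This is a sketch and not the service's API, which I haven't tried. A PTR lookup returns only the names the address owner has published, so it will miss most of the domains and subdomains a commercial service claims to find.

```python
import ipaddress
import socket


def ptr_name(ip):
    """Return the DNS name that a reverse (PTR) query for `ip` asks about."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip):
    """Return hostnames published for `ip`, or an empty list if none resolve.

    Needs a working resolver, so results depend on where you run it.
    """
    try:
        hostname, aliases, _ = socket.gethostbyaddr(ip)
    except (socket.herror, socket.gaierror):
        return []
    return [hostname, *aliases]
```

Calling ptr_name("192.0.2.10") shows the in-addr.arpa name being queried, and reverse_lookup() on one of your own public addresses gives you a baseline to compare against a paid service's results.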

Vincent’s public projects include the akvorado flow tool, an LLDP daemon, and the snimpy Python interactive SNMP query builder tool. - Ethan

MORE RESOURCES

  1. ServiceRadar (distributed network monitoring solution with tiny footprint) - mfreeman451 via GitHub

  2. HRUI (and Horaco, Sodola, XikeStor, AmpCom) Switch Terraform Provider - brennoo via GitHub

  3. Kotaemon (RAG UI for chatting with docs) - Cinnamon via GitHub

  4. PyViz (the Python visualization landscape) - PyViz.org

  5. Linux Network Programming Tutorial - nguyenchiemminhvu via GitHub

INDUSTRY BLOGS & VENDOR ANNOUNCEMENTS 💬 

Drut is building the data center infrastructure of the future. Aimed at HPC & AI computing, Drut is offering a composable fabric of hardware sliced by their software and interconnected dynamically via a photonic fabric. Imagine having a bunch of GPUs in a host you can allocate not only to workloads on the physical host the GPUs are in, but to any host that’s connected to Drut’s DX Fabric.

If you fully adopt the Drut model (you don’t have to…you can use your Ethernet or InfiniBand backend network and still use their software), you don’t use Ethernet or IB to interconnect GPUs and hosts. Instead, you skip the network middle layer and use a layer 1 mesh of fiber optics to dynamically connect the components directly to each other via an optical switch (think fancy patch panel and not an external PCIe switch), all governed by Drut’s software. You get a massive, fast, contentionless PCIe network that allows data center operators to compose whatever host & GPU combo a workload requires on the fly. No more GPUs sitting idle because they are imprisoned in a chassis.

Drut’s announcement of the 2500 series is actually describing two products: the Photonic Resource Unit (PRU) 2500 and the Fabric Interface Card (FIC) 2500. The PRU 2500 is a server chassis you can load with components you want to share & interconnect across the fabric, such as GPUs and FICs. The FIC 2500 is the show stealer, though. This monster is a full-height, half-length Gen 5 PCIe board with either 2 or 4 8×100 co-packaged optics (CPO) modules, meaning you’re getting either 16×100G or 32×100G fabric ports per card.

Don’t miss the co-packaged optics part of that. Drut believes the FIC 2500 is the very first commercial realization of CPO tech. You’ll be seeing more CPO coming to networking as it’s key to continuing to scale up speeds while keeping power demands under control. But for today, Drut’s got CPOs up, working, and ready for production.

It’s not often you see something genuinely different in the compute or networking space. Drut is different in interesting ways that the network designer part of my brain really likes. For certain compute scenarios, offloading critical, time-sensitive flows from the backend network onto a composable fabric is strongly appealing. - Ethan

Railway is a hosting offering I’m not yet familiar with. It looks feature-rich and aimed at folks who want to get their code deployed with the infrastructure out of their way. In this post, they talk through their methodology for getting off of cloud to host their offering, and onto their own metal. Not a bad piece to read if you’re thinking about repatriating. - Ethan

EnGenius is rolling out a new VPN router aimed at small businesses. The ESG320 is a cloud-managed router with two WAN ports that can be used simultaneously, with load-balancing features to distribute traffic across both links. It’s also got a stateful firewall and a TPM chip. - Drew

Cisco has announced AI Defense, a new set of capabilities that aim to do a lot of things, including help organizations keep track of AI applications being used by employees or developed internally, validate models for safety and security issues, and provide safeguards to ensure that sensitive data isn’t leaked into public AI tools or models. Cisco says AI Defense will be built into Security Cloud (which itself uses AI) and will use “Cisco’s extensive mesh of enforcement points to perform AI security at the network level…” Other than that, there’s not a lot of detail in the press release or blogs that Cisco released. AI Defense is expected to be available in March 2025, so more details are likely to be forthcoming. - Drew

MORE INDUSTRY NOISES

DYSTOPIA IRL 🐙

TOO MANY LINKS WOULD NEVER BE ENOUGH 🐳

LAST LAUGH 😆