Human Infrastructure 422: Data Centers Are Eating the Planet, Why RFCs Matter, OSPF Router ID Myths, and More

TAKE THE PACKET PUSHERS SALARY SURVEY

We’ve put together a salary survey to understand the current market for network engineering and IT skills. We’ve got more than 280 responses so far. We’d love to cross the 300 mark (or more), so if you haven’t taken the survey yet, we’d appreciate it if you took a few minutes to fill it out. If you have taken the survey, maybe tell a colleague or two about it. We aren’t collecting any personal information, so all responses are anonymous.

After we close the survey, we’ll publish the results on Packet Pushers so everyone can see what we’ve learned. Here’s the link to the survey. Thanks in advance!

THIS WEEK’S MUST-READ BLOGS 🤓

The physical scale of modern AI data center builds is genuinely hard to get your head around. This piece tries to help you get a sense of it, including satellite pictures that show AI data centers after they’ve been stood up on previously empty swaths of land. For example…

From the piece. “OpenAI’s Stargate project in 2024 vs. 2025”

Astonishingly massive AI data centers aren’t just a fad. I think we’re gonna live here, at least until the investor money runs out. - Ethan

Ivan Pepelnjak walks through some history of how various false opinions about OSPF router IDs came to be. There are reasons some people think a router ID is an IP address, for instance. But the RFCs don’t lie. Ivan brings us back to RFC reality by unwinding the crooked path that got us off the proper way. - Ethan
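To make the router ID point concrete, here is a minimal Python sketch, not taken from Ivan's post. RFC 2328 defines the OSPF router ID as a 32-bit number; the dotted-quad form is only a conventional way of printing it. The function names are hypothetical, and the standard library's `ipaddress` module is borrowed purely as a formatter.

```python
import ipaddress


def router_id_to_dotted(router_id: int) -> str:
    """Render a 32-bit OSPF router ID in the familiar dotted-quad notation.

    Hypothetical helper for illustration only. IPv4Address is used as a
    convenient formatter; the router ID does not have to be an address
    configured on any interface.
    """
    if not 0 <= router_id <= 0xFFFFFFFF:
        raise ValueError("an OSPF router ID must fit in 32 bits")
    return str(ipaddress.IPv4Address(router_id))


def dotted_to_router_id(text: str) -> int:
    """Parse dotted-quad notation back into the underlying 32-bit number."""
    return int(ipaddress.IPv4Address(text))


if __name__ == "__main__":
    # Looks odd as an "IP address", but it is just the number 1.
    print(router_id_to_dotted(1))
    # Looks like an interface address, but OSPF only sees an integer.
    print(dotted_to_router_id("10.0.0.1"))
```

Both values are the same kind of thing to OSPF: an integer that has to be unique within the routing domain, however it happens to be displayed.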

This is a very good introduction to what RFCs are, how they came to be, and why they matter. Author Mohammed also notes the all-important keywords populating RFCs including “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” that are, in my opinion, often to blame for vendor interoperability problems.

This is a great one to read if you’d like a refresher on RFC fundamentals, or have some junior colleagues who could use a primer. - Ethan

Tim draws a clear parallel between the early days of cloud hype and the current AI frenzy. And just as we learned in the cloud days, once those AI bills start coming due and you have to have a solid business reason to justify the expense, organizations will figure out what they should actually be using AI for. 

He writes, “Companies won’t waste thousands of tokens writing soulless blog posts, or producing vapid documentation when tokens start costing what they will need to cost to justify the resource expenditure and without VC money subsidizing the entire economy. In the same way companies examine their workloads and ROI vs cost calculations to decide what to run on-prem and what should be run in a cloud, that same level of scrutiny and nuance will descend upon the AI proof of concepts and toys that dominate our LinkedIn feeds.”

I think this is good analysis, and perhaps an opportunity to get a head start on figuring out feasible AI use cases. - Drew

Pat is writing an excellent series on his transition from being a front-line network engineer to managing an engineering team. As part of that manager role, he has to figure out how to guide his team in making sensible use of AI tools. This post has a ton of suggestions for how to incorporate AI into network engineer workflows, things to avoid, and suggested tools. Whether you’re leading a team or just looking for reasonable guidance on how to get the most out of AI for your own job, this is a post worth bookmarking for repeated reference. - Drew 

MORE BLOGS

Your Foundation for Enterprise AI Is Here

Enterprise AI demands more than cloud alone. Equinix Distributed AI delivers the secure, globally consistent infrastructure you need to train where compute is abundant, process sensitive data where it's stored and run inference at the edge, where data is generated.

AI infrastructure without trade-offs: Deploy confidently in liquid-cooled data centers purpose built for high-density AI, delivering power, performance and reliability.

Low-latency edge-to-cloud connectivity: Connect to 225+ cloud on-ramps and thousands of partners via private edge links for seamless data flow.

Predictable costs, neutral ecosystem: Optimize AI spend with transparent pricing and no hidden fees. Use a vendor-neutral ecosystem to avoid lock-in.

Explore how Distributed AI can simplify your architecture, surface hidden efficiencies and ensure a scalable roadmap.

TECH NEWS 📣

This reasonably short piece is an overview of some new terminology you’ll run into as networking for AI workloads continues to evolve. The problem driving all these new groups and terms is that Ethernet is the local area network transport we’ve got (because economics, standards, interoperability and inertia we’ll take to our graves), but not the one AI needs. That is, AI clusters performing training and inferencing work need a lossless fabric with ultra low, predictable latency. That ain’t Ethernet as we know it. Ethernet is a best-effort transport until you start bolting on aftermarket parts. And so it is that we’ve got yet another Ethernet group that’s formed.

“AMD, Arista, ARM, Broadcom, Cisco, HPE Networking, Marvell, Meta, Microsoft, Nvidia, OpenAI and Oracle have joined the new Ethernet for Scale-Up Networking (ESUN) initiative, which promises to advance the networking technology to handle scale-up connectivity across accelerated AI infrastructure. ESUN was formed by the nonprofit Open Compute Project.”

How is ESUN distinct from the Ultra Ethernet Consortium’s efforts? I can’t articulate that yet, but I’m gathering there are different layers at work here. UEC, ESUN, and Scale-Up Ethernet Transport (SUE-T) all seem to have roles to play in the AI data center fabric of the presumably near future.

I just hope we don’t re-invent too many wheels along the way. It feels like we solved some of the same problems back when we were deploying converged Ethernet. We had to make storage traffic feel special lest the storage array collapse in a smoking pile because a frame got dropped. Many protocols were bolted on to help out Ethernet to this end, and some of them (like PFC) indeed show up in the AI data center world. But this new Ethernet for AI looks increasingly foreign to my eyes. - Ethan

Steven Vaughan-Nichols reports that the OpenInfra Summit Europe was not yet another hype-inflating series of talks about AI. The summit’s focus was instead on digital sovereignty. To that end, several European organizations are moving to open source platforms to gain independence from Microsoft, American cloud service providers, and so on.

Although…perhaps independence conveys the wrong idea.

From the article. OpenInfra Foundation general manager Thierry Carrez thinks the better word for what Europe wants is resilience, not isolation from the US: "What we're really looking for is resilience. What we want for our countries, for our companies, for ourselves, is resilience. Resilience in the face of unforeseen events in a fast-changing world. Open source," he concluded, "allows us to be sovereign without being isolated." - Ethan

This is a story from my neck of the woods, but I’m sharing it because it’s part of a larger conversation about enormous data center build-outs. Part of that conversation includes whether the electrical capacity actually exists to power those data centers, and the risks of consumer rate payers being stuck with higher electricity prices to cover the costs of capacity increases demanded by data centers. It’s a complicated issue that involves regional grid operators, state and federal regulators, local politicians, consumer advocates, and AI and hyperscale companies.

I don’t know what the right answer is. I do know I don’t want to be stuck with higher prices if a hyperscaler gets a grid operator to spend billions of dollars to build out capacity, and then that hyperscaler never actually builds the data center because, say, the AI bubble bursts. Consumers should not be left holding the bag. - Drew

A US District Court judge has issued a permanent injunction against spyware maker NSO Group that forbids NSO from using its Pegasus spyware to target Meta’s WhatsApp messaging application. The injunction stems from a 2019 lawsuit in which Meta sued NSO Group for allegedly infecting around 1,400 WhatsApp users. 

While this is a victory for Meta, the ruling doesn’t extend to any NSO Group customer outside the United States, including foreign governments. It’s therefore not clear to me whether this leaves legal wiggle room for NSO and foreign governments to keep operating against WhatsApp. The injunction also doesn’t cover any other Meta product, including Instagram, Threads, or Facebook. Presumably those apps are still wide open for NSO Group to exploit. - Drew

MORE NEWS

FOR THE LULZ 🤣

Shared by Suresh Vina via LinkedIn

RESEARCH & RESOURCES 📒

New functionality for netlab includes using wildcards or regex for group- or ASN/RR members and support for containerized Cisco 8000v.

Lots of improvements to the graphing system as well as other minor improvements and various new features supported on a variety of devices. A few breaking changes for fairly niche situations, and many bug & documentation fixes. - Ethan

From the README. “Ripgrep 15 is a new major version release of ripgrep that mostly has bug fixes, some minor performance improvements and minor new features.

In case you haven't heard of it before, ripgrep is a line-oriented search tool that recursively searches the current directory for a regex pattern. By default, ripgrep will respect gitignore rules and automatically skip hidden files/directories and binary files.”

Ripgrep is also known for being blazing fast. If you’ve got a lot of text to search through, ripgrep should be in your toolbox. - Ethan
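For readers who haven't used it, the behavior the README describes can be sketched in a few lines of Python. This is a toy illustration, not how ripgrep is implemented: the function name `search_tree` is hypothetical, the binary check is a simple assumed heuristic (skip any file containing a NUL byte), and gitignore handling is left out entirely.

```python
import os
import re
from pathlib import Path


def search_tree(root, pattern):
    """Recursively search text files under root for lines matching pattern.

    Returns a list of (path, line_number, line) tuples. Hidden files and
    directories (names starting with ".") and files that look binary are
    skipped. Unlike ripgrep, gitignore rules are NOT consulted.
    """
    regex = re.compile(pattern)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            if name.startswith("."):
                continue
            path = Path(dirpath) / name
            try:
                data = path.read_bytes()
            except OSError:
                continue  # unreadable file: skip it rather than abort
            if b"\x00" in data:
                continue  # assumed heuristic: a NUL byte means binary
            text = data.decode("utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if regex.search(line):
                    hits.append((str(path), lineno, line))
    return hits


if __name__ == "__main__":
    # Search the current directory, as ripgrep does by default.
    for path, lineno, line in search_tree(".", r"router-id"):
        print(f"{path}:{lineno}:{line}")
```

A sketch like this reads every file fully into memory on a single thread, so it shows the behavior only and says nothing about ripgrep's speed.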

This book looks incredibly practical to me. Likely worth a read if you’re serious about building a network automation platform that uses lots of Python.

“Welcome to Talk Python in Production, a hands-on guide for Python developers determined to master real-world deployment and infrastructure management. Have you ever felt locked into pricey cloud services or struggled with overly complex DevOps configurations? This book's stack-native approach offers a refreshing alternative.

You'll learn to containerize Python apps, secure them with NGINX, tap into CDNs for global performance, and manage everything on a single, powerful server, without sacrificing reliability.” - Ethan

MORE RESOURCES

NANOG 95

Join us in Arlington, TX (Oct. 27-29) for NANOG 95 — the premier conference for network operators. Student discounts available. Register now!

I’ll see you there… - Ethan

INDUSTRY BLOGS & VENDOR ANNOUNCEMENTS 💬 

Ansible 2.6 is a “big deal” upgrade from what we understand, and not just a minor release with some bug fixes and a few new features. This post is a good place to start researching what you’re up against depending on your current Ansible environment. - Ethan

Cisco is highlighting three capabilities for better operations and troubleshooting of wireless networks. They include Artificial Intelligence Radio Resource Management (AI-RRM) for real-time radio frequency optimization, the ability to launch a packet capture automatically if an issue arises on the WLAN, and the ability to use Security Group Tags tied to centralized policies to enforce microsegmentation and access control. These features are available on a variety of Aironet and Catalyst APs. In particular, the APs that support AI-RRM are listed here. - Drew

MORE INDUSTRY NOISES

DYSTOPIA IRL 🐙

TOO MANY LINKS WOULD NEVER BE ENOUGH 🐳

LAST LAUGH 😆

Sweet file ‘o mine

Another gem from Kaj via the Packet Pushers Slack group.