Human Infrastructure 395: Dave Täht's Legacy, DevOps Was A Bad Idea, Model Context Protocol Starter Kit

Ethan Banks
April 03, 2025

THIS WEEK’S MUST-READ BLOGS 🤓

In loving memory of Dave Täht - LibreQoS
https://libreqos.io/2025/04/01/in-loving-memory-of-dave/

I am incredibly sad to report that Dave Täht has died. Dave’s contributions to networking improved the Internet experience for millions of people. Please read the article for an excellent tribute to Dave’s work around bufferbloat, latency, and quality of experience.

Dave was a guest on the Heavy Networking podcast back in February 2023, talking about the LibreQoS project.

Here’s a decently-sized thread on Hacker News celebrating Dave’s work.

And here’s a GitHub pull request collecting comments about Dave if you care to contribute.

We’ll miss you, Dave. Your absence is felt. Thank you for everything. May it be that someone else picks up the guitar you’ve put down. - Ethan

The Complexity of High-Speed Ethernet Auto-Negotiation (AN) and Link Training (LT) (2024) - Ethernet Alliance
https://ethernetalliance.org/blog/2024/07/15/the-complexity-of-high-speed-ethernet-auto-negotiation-an-and-link-training-lt/

A friend of mine told me about Ethernet link training over electrical cable, something I was not previously aware of. Nor was I aware that auto-negotiation gets tougher as the link speeds climb over electrical cable.

“Transmitting 100Gbps over even a few meters of electrical cable presents significant signal integrity challenges. Therefore, advanced link equalization and Forward Error Correction (FEC) are critical elements to ensure the appropriate signal quality.

Auto-Negotiation (AN) and Link Training (LT) are two of the essential processes required to establish the characteristics of the link partners including supported link speed(s), FEC enablement, and tuning of transmitter (Tx) equalizers. The purpose of these processes is to ensure the signal integrity of a link is adequate before real payload traffic is transmitted at full line speed.

Whereas AN works at a low link speed rate, LT requires full wire speed to optimize the transmissions. The two link partners communicate via the LT protocol to tune their Tx equalizers to achieve the best possible Bit Error Rate (BER) within the specified time frame. At 100Gbps data rates, FEC will improve BER, reduce lost or retransmitted packets, and generally optimize the link transactions.”

The problem? Vendor interoperability. Despite 802.3 standards governing AN and LT, the standards aren’t specific enough to guarantee both sides are going to perform AN and LT in the same way. Sigh. - Ethan
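To put the BER talk in the quote above into rough numbers, here's a back-of-the-envelope sketch in Python. The BER values are illustrative assumptions of mine, not figures from the Ethernet Alliance article or the 802.3 standards, and the function name is made up for this example.

```python
def seconds_between_errors(line_rate_bps: float, ber: float) -> float:
    """Average seconds between bit errors at a given line rate and bit error rate."""
    errors_per_second = line_rate_bps * ber
    return 1.0 / errors_per_second


LINE_RATE_BPS = 100e9  # 100 Gbps, the rate discussed in the quote

# Hypothetical BER values, chosen only to show the scale of the problem.
for ber in (1e-5, 1e-8, 1e-12):
    gap = seconds_between_errors(LINE_RATE_BPS, ber)
    print(f"BER {ber:.0e}: one bit error every {gap:.3g} s on average")
```

At 100 Gbps even a tiny error rate adds up to a steady stream of bad bits, which is why the quote treats equalization and FEC as critical rather than optional.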

A very theoretical scenario, DNS edition - Jan-Piet Mens
https://jpmens.net/2025/03/27/theoretical-scenario-dns-edition/

Jan-Piet writes, “I’ve been asked a few times over the course of the same amount of days, what would happen if the powers that be began deleting top-level domains (TLDs) from the DNS system, and whether there is something we (e.g. Asians, Africans, Europeans, Canadians, South Americans, Australians, etc.) could do about it.

A simple question with an incredible boatload of ramifications, but I’ll try to answer the question from a solely technical point of view.”

What follows is well worth your time if you’d like to know more about the DNS CLI utility dig and how to read its output correctly to answer the article’s scenario. - Ethan
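If reading dig output is new to you, each record in the answer section is laid out as name, TTL, class, type, and record data. Here's a minimal Python sketch that splits one such line into those fields. The sample line is a made-up record for a reserved example domain, not output from Jan-Piet's article, and the names ResourceRecord and parse_answer_line are hypothetical.

```python
from typing import NamedTuple


class ResourceRecord(NamedTuple):
    name: str
    ttl: int
    rr_class: str
    rr_type: str
    rdata: str


def parse_answer_line(line: str) -> ResourceRecord:
    """Split one dig answer-section line into its five fields."""
    # maxsplit=4 keeps the record data intact even when it contains spaces
    # (as TXT and SOA records do).
    name, ttl, rr_class, rr_type, rdata = line.split(None, 4)
    return ResourceRecord(name, int(ttl), rr_class, rr_type, rdata.strip())


# Hypothetical sample line, using reserved example domains.
sample = "example.com.\t\t3600\tIN\tNS\tns1.example.net."
record = parse_answer_line(sample)
print(record.rr_type, record.rdata)
```

The sketch assumes it is handed a record line only. Lines in dig output that begin with a semicolon are comments and section headers, so a real parser would skip those first.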

In retrospect, DevOps was a bad idea. - Rethinking Software
https://rethinkingsoftware.substack.com/p/in-retrospect-devops-was-a-bad-idea

Adam Ard shares, “Before DevOps, developers would write software and hand it off to an operations team, who then had to figure out how to get it running in production. This didn’t work very well. Eventually, developers who cared about deployment started getting involved in making sure their code made a smooth transition to production. That was a huge improvement. And that’s where we should have stopped.”

As the article proceeds, Adam makes a pretty good case for that premise, I must say. - Ethan

MORE BLOGS

If you get the chance, always run extra network fiber cabling - Chris Siebenmann
Take This On-Call Rotation and Shove It - Scott Smitelli
Things that go wrong with disk IO - Phil Eaton
The Reality of Working in Tech: We're Not Hired to Write Code (2023) - Ibrahim Diallo
The case against local LLMs - Vivek Haldar
A load of old… (the post-quantum cryptography scare might be crap) - APNIC
Talking To Your Mailserver Is Not as Hard as You Think! (2024, IMAP discussion) - Michi’s Blog
Spammers are more consistent at making SPF, DKIM, and DMARC correct than are legitimate senders - @grumpybozo (Bill Cole of SpamAssassin) via toad.social on Mastodon

TECH NEWS 📣

Oracle Cloud security SNAFU latest: IT giant accused of pedantry as evidence scrubbed - The Register
https://www.theregister.com/2025/03/31/oracle_reported_breaches/

OCI is struggling with the optics on this one. Lots of criticism from lots of sources while OCI plays the “What do words mean?” game. For example…

“Infosec expert Kevin Beaumont also chided Oracle for trying to duck responsibility for the alleged Oracle Cloud breach, noting the firm appears to be splitting hairs by drawing a distinction between Oracle Cloud and Oracle Cloud Classic.

That is to say, the US super-corp claims Oracle Cloud was not infiltrated, though that leaves the door open to Oracle Cloud Classic being the specific product that was compromised. A distinction without a difference: Part of Oracle's public cloud offering was broken into, according to rose87168 and others.”

No matter. Six million customer data records seem to have been stolen. If my data is among the six million, I don’t care if it was “classic” or not. You got breached, Oracle. Just like everyone else. There’s almost no shame in it at this point. Just tell us what you’re gonna do about it, and we’ll all move on. - Ethan

Nvidia punts silicon photonic switches to keep GPUs fed with data - The Register
https://www.theregister.com/2025/03/18/nvidia_punts_silicon_photonic_switches/

Nvidia “unveiled new Spectrum-X and Quantum-X switches at its GTC shindig in San Jose, California, today, claiming these will enable "AI factories" with millions of GPUs connected on-site, while drastically reducing energy consumption and operational costs for operators.

A key part of this comes from co-packaged optics – the integration of the optical and silicon components onto a single packaged substrate. In network switches, this typically means doing away with pluggable transceiver modules that house the optics and digital signal processor (DSP), and integrating these alongside the switch ASIC instead.

Benefits of CPO are understood to be greatly reduced power consumption, higher bandwidth and lower latency, mainly because of fewer DSPs and the removal of lengthy copper circuitry tracks.” - Ethan

MORE NEWS

Mozilla launching "Thundermail" email service to take on Gmail, Microsoft 365 - TechRadar Pro
Akamai becomes the official distributor of the Linux kernel - Open Source Watch

FOR THE LULZ 🤣

Thx Kaj Niemi for sharing in our community Slack…😊

RESEARCH & RESOURCES 📒

Holo v0.7 Released — What’s New and What’s Next? - Renato Westphal via Medium
https://itnext.io/holo-v0-7-released-whats-new-and-what-s-next-aabfcdd455a1

Renato reports, “Today marks the release of Holo v0.7, an MIT-licensed open-source routing protocol suite written in Rust. This update brings a more mature IS-IS implementation, introducing new features and numerous bug fixes. Significant progress has also been made on the BIER front, and our contributor, Nicolas, was even kind enough to prepare a containerlab topology for testing the Holo control plane with a custom BIER data plane. Additionally, we have several other improvements, including VRRP version 3 support, contributed by Paul.”

Renato goes on to discuss where Holo is headed next. - Ethan

Headscale (FOSS Tailscale control server) - juanfont via GitHub
https://github.com/juanfont/headscale

From the README. “Headscale aims to implement a self-hosted, open source alternative to the Tailscale control server. Headscale's goal is to provide self-hosters and hobbyists with an open-source server they can use for their projects and labs. It implements a narrow scope, a single Tailscale network (tailnet), suitable for a personal use, or a small open-source organisation.” In other words, people like many of us reading this. 👍 - Ethan

textcase (Python text case conversion library) - zobweyt via GitHub
https://github.com/zobweyt/textcase

I have always found text manipulation tools incredibly helpful, either for sanitizing input or formatting output. Lots of features here. From the README.

“Text case conversion: Convert strings between various text cases (e.g., snake_case, kebab-case, camelCase, etc.).
Extensible Design: Easily extend the library with custom cases and boundaries.
Acronym Handling: Properly detects and formats acronyms in strings (as in HTTPRequest).
Non-ASCII Support: Handles non-ASCII characters seamlessly (no inferences on the input language itself is made).
100% Test Coverage: Comprehensive tests ensure reliability and correctness.
Well-Documented: Clean documentation with usage examples for easy understanding.
Performant: Efficient implementation without the use of regular expressions.
Zero Dependencies: The library has no external dependencies, making it lightweight and easy to integrate.” - Ethan
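For a feel of what regex-free case conversion with acronym handling involves, here's a small sketch in plain Python. To be clear, this is not textcase's API or its implementation. The names split_words, to_snake, and to_camel are hypothetical, and the boundary rules are a simplified assumption that only covers separators, lower-to-upper transitions, and the end of an uppercase run.

```python
def split_words(text: str) -> list[str]:
    """Split text into words at separators and case boundaries, without regex."""
    words: list[str] = []
    current = ""
    for i, ch in enumerate(text):
        if ch in "_- ":
            # Explicit separators always end the current word.
            if current:
                words.append(current)
            current = ""
            continue
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        lower_to_upper = prev.islower() and ch.isupper()
        # In "HTTPRequest" the R starts a new word: it follows an uppercase
        # letter and is itself followed by a lowercase letter.
        acronym_end = prev.isupper() and ch.isupper() and nxt.islower()
        if current and (lower_to_upper or acronym_end):
            words.append(current)
            current = ""
        current += ch
    if current:
        words.append(current)
    return words


def to_snake(text: str) -> str:
    """Join the lowercased words with underscores."""
    return "_".join(word.lower() for word in split_words(text))


def to_camel(text: str) -> str:
    """Lowercase the first word and capitalize the rest."""
    words = [word.lower() for word in split_words(text)]
    if not words:
        return ""
    return words[0] + "".join(word.capitalize() for word in words[1:])


print(to_snake("HTTPRequest"))
print(to_camel("my-kebab-case"))
```

The single pass over the characters is the point of the exercise. A library like textcase presumably handles many more boundary types and edge cases than this, so treat the sketch as an illustration of the idea only.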

MORE RESOURCES

dish (socket monitoring / health check service) - thevxn via GitHub
Large Language Models are Unreliable for Cyber Threat Intelligence (academic PDF) - arXiv

SPECIAL RESOURCE SECTION ON MODEL CONTEXT PROTOCOL (MCP)

This week, MCP was in my face on every feed, it seemed like. It’s interesting. It’s got its use cases. It might become an inescapable part of our looming agentic AI future. But I also get the sense that MCP is not fully baked. Anyway, here’s a range of material I collected with a variety of perspectives in case you’re curious or need to give the business people in your life an overview. - Ethan

How Model Context Protocol works. MCP Explained - Quickchat AI
MCP: What It Is and Why It Matters - Addy Osmani via Substack
Model Context Protocol (MCP) - Welcome to the Future of AI Automation - John Capobianco via YouTube
MCP: Flash in the Pan or Future Standard? (debate format) - LangChain
Notes On MCP (skeptical POV) - Tao of Mac
Why MCP Is Mostly Bulls—t (even skeptical-er) - Lycee AI
Connect your AI to any app with Zapier MCP (what could possibly go wrong?) - Zapier Beta

INDUSTRY BLOGS & VENDOR ANNOUNCEMENTS 💬

Share your machines with other users - Tailscale Docs
https://tailscale.com/kb/1084/sharing

“You can share access to specific machines with people outside your Tailscale network (known as a tailnet) without exposing them to the public internet. Sharing gives the recipient access to only the shared machine in your tailnet, and nothing else.” - Ethan

Announcing Cloud Pathfinder: Network GPS for Infrastructure Teams - Kentik Blog
https://www.kentik.com/blog/announcing-cloud-pathfinder-network-gps-for-infrastructure-teams/

Network observability vendor Kentik has announced Cloud Pathfinder.

“Kentik’s Cloud Pathfinder is the quickest way to understand the exact route between two cloud endpoints, including all the resources involved in each hop, like gateways and their attachments, and VPCs or VNets. Provided two instances, IPs, elastic network interface, or subnet identifiers, Cloud Pathfinder automatically crawls the path between them, giving turn-by-turn guidance through complex cloud network topologies if a route exists. If there is no route, or if an Access Control List (ACL) or Security Group blocks connectivity, Cloud Pathfinder indicates those rules and their metadata to take the guesswork out of a fix. Whether or not connectivity is possible, Kentik AI instantly provides a human-readable analysis and recommends a step-by-step resolution.”

Feels similar to what Catchpoint and Cisco ThousandEyes can do. Cool tech. - Ethan

From concept to code: AGNTCY’S Internet of Agents is now on GitHub - Cisco Outshift Blog
https://outshift.cisco.com/blog/agntcy-internet-of-agents-is-on-github

Cisco is leaning hard into the AGNTCY ecosystem, building on last month’s announcement. “On March 6, in partnership with LangChain and Galileo, we announced the AGNTCY, and today we’re dropping the initial code ⬇️💻 that you can get your hands on for discovering, composing and deploying multi-agent software (more on that later).”

The blog goes on to detail Cisco’s notion of the “agentic application lifecycle,” summarized with steps of Discover, Compose, Deploy, and Evaluate. Tools and techniques at each step are itemized. Good to give this one a quick read even if you can’t imagine ever using this tech in anger. What I understand from Cisco is that some of their future AI initiatives will leverage this framework. - Ethan

MORE INDUSTRY NOISES

Exploring the effects of jumbo frames (2022) - retinadata blog
Key management in Azure - Microsoft Learn
LLM Limitations: Why Can’t You Query Your Enterprise Knowledge with Just an LLM? - Memgraph

TOO MANY LINKS WOULD NEVER BE ENOUGH 🐳

Zoom bias: The social costs of having a 'tinny' sound during video conferences - Phys.Org
The Great Automatic Grammatizator (Roald Dahl short story, 1954, PDF) - d-a-v-e.org
What High Performers Know About Doing Hard Things - The Caring Techie Newsletter
Why I Maintain a 17 Year Old Thinkpad - Pilled Texts
Yann LeCun, Pioneer of AI, Thinks Today's LLM's Are Nearly Obsolete (beefy article) - Newsweek

LAST LAUGH 😆

Another winner shared by Kaj!