
Human Infrastructure 414: TCP Debugs, Fixing BGP Security Problems, and More

THIS WEEK’S MUST-READ BLOGS 🤓

There are so many good quotes I could pull from this post that I was at risk of copy-pasting the whole thing. The author looks at the money being spent on AI data centers and does some back-of-the-envelope calculations about the revenue required to get a return on that spending and to cover depreciation. The result?

“There just isn’t enough revenue and there never can be enough revenue. The world just doesn’t have the ability to pay for this much AI. It isn’t about making the product better or charging more for the product. There just isn’t enough revenue to cover the current capex spend.”

This is not an anti-AI screed. The author says he uses AI tools himself and likes them, and anticipates that AI as a technology will survive and thrive. What’s coming is a serious correction for investors and corporations shoveling billions and billions into this AI buildout. - Drew

This is an excellent troubleshooting/bug hunt saga that also inspired the author to create an open-source tool called go-tcpinfo, an observability tool for TCP. It’s a fun origin story (and I bet a lot of open source projects have similar origins). - Drew
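If you've never poked at the kernel data a tool like go-tcpinfo surfaces, here's a minimal Python sketch (Linux-only, my own illustration rather than anything from the project) that reads a few fields of the kernel's struct tcp_info for a live connection. The field offsets follow the layout in linux/tcp.h and are worth verifying against your kernel headers; example.com is just a placeholder host.

```python
import socket
import struct

# Linux-only sketch: read a few fields of struct tcp_info for a live
# TCP connection. The offsets assume the layout in linux/tcp.h.
def tcp_info_snapshot(sock: socket.socket) -> dict:
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104)
    # First 8 bytes are u8 fields: state, ca_state, retransmits,
    # probes, backoff, options, window-scale bits, rate flags.
    state, _ca, retransmits, *_rest = struct.unpack_from("8B", raw, 0)
    # Next come u32 fields: rto, ato, snd_mss, rcv_mss.
    rto_us, _ato, snd_mss, rcv_mss = struct.unpack_from("4I", raw, 8)
    return {"state": state, "retransmits": retransmits,
            "rto_us": rto_us, "snd_mss": snd_mss, "rcv_mss": rcv_mss}

if __name__ == "__main__":
    s = socket.create_connection(("example.com", 80), timeout=5)  # placeholder host
    print(tcp_info_snapshot(s))
    s.close()
```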

This six-part series offers the following entries.

As I skimmed the series, I found the articles written in plain, practical English and paired with simple, whimsical diagrams. - Ethan

Bruce Davie surveys the technologies in play to help secure BGP, and reviews how well deployments are going. Or not going. Bruce touches on…

  • Route Origin Validation (ROV)

  • Resource Public Key Infrastructure (RPKI)

  • BGPsec

  • AS Provider Authorization (ASPA)

This piece was the result of research Bruce is doing for the next book in the Systems Approach series he’s been co-authoring with Larry Peterson for many years now. Worth a read, as Bruce summarizes each of these technologies and deployment challenges succinctly. - Ethan
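To make ROV concrete, here's a toy Python sketch of the three validation states RFC 6811 defines: valid when a covering ROA matches the origin AS within its maxLength, invalid when ROAs cover the prefix but none match, and not-found when no ROA covers it at all. The ROAs below are invented for illustration.

```python
import ipaddress

# Toy Route Origin Validation per RFC 6811. The ROAs here are
# invented (prefix, max_length, origin_asn) tuples for illustration.
ROAS = [
    (ipaddress.ip_network("192.0.2.0/24"), 24, 64500),
    (ipaddress.ip_network("198.51.100.0/22"), 24, 64501),
]

def rov_state(prefix: str, origin_asn: int) -> str:
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa_net, max_len, roa_asn in ROAS:
        # A ROA "covers" the route if the route falls inside its prefix.
        if route.subnet_of(roa_net):
            covered = True
            # A match needs the right origin AS and a length <= maxLength.
            if origin_asn == roa_asn and route.prefixlen <= max_len:
                return "valid"
    return "invalid" if covered else "not-found"

print(rov_state("192.0.2.0/24", 64500))    # valid
print(rov_state("192.0.2.0/24", 64999))    # invalid (wrong origin)
print(rov_state("203.0.113.0/24", 64500))  # not-found (no covering ROA)
```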

MORE BLOGS

SuzieQ: Deep Up-to-Date Actionable Insights About Your Network

SuzieQ, from Stardust Systems, is a high-performance, agentless, multi-vendor application that helps you make sense of your network. Check whether the MTU is consistent across your entire network, whether STP is configured correctly, or where an endpoint is; validate that an OS upgrade went as expected; and much more. All without writing a single line of code. Because SuzieQ can gather data as frequently as every minute, you’re always working with up-to-date information, transforming your work across automation, troubleshooting, validation, and more.

One user said that before SuzieQ, none of the dozens of tools they had could answer really fundamental questions about their network. 

Schedule a demo to see why Gartner recognized us as a Cool Vendor and how we can empower and de-stress every member of your network infrastructure team while confidently providing a solid foundation for your business to thrive.

TECH NEWS 📣

Head explode emoji - Drew

The headline here is a bit disingenuous. You might infer that there’s an Orwellian pipeline directly between the chatbot and the cops. What’s really happening is that OpenAI has shared some details about its new policies for dealing with users who express thoughts of self-harm or harm to others to the chatbot.

In the case of harm to others, if the bot detects users who might be planning to hurt people, that conversation gets sent to human reviewers. If the reviewers decide there’s a potential for imminent harm, they can contact law enforcement.

I’m conflicted about this. On the one hand, it makes sense that if ChatGPT is conversing with a user who sounds like they’re about to go on a shooting spree, maybe it’s good for a few people to take a look and decide whether to call the police.

On the other hand, it raises lots of questions. Who are the people assessing these chats? Do they have any qualifications in psychology, counseling, or risk assessment? Is this for English-only chats? Will this work get farmed out to poorly trained, poorly compensated contractors? The only thing the blog says is that the reviewers will be “a small team trained on our usage policies.”

You can imagine this policy going badly in all sorts of ways. What if a user is just goofing around with the chatbot to see what happens, and then armed police show up and someone actually gets hurt or killed? What if this becomes a new vehicle for swatting? And what if state or federal entities decide “Hey, your chatbot seems like it might be good at crime prediction; let’s build an Orwellian pipeline to all its conversations.”

However this plays out, it’s clear that AI companies are running massive, real-time experiments on the human psyche. We did this already with social media and a lot of the results—the erosion of a shared reality; the shredding of truth; the lightning-fast spread of misinformation, propaganda, and hate; the ability to connect, organize, and activate people with reprehensible ideas—haven’t been great. Are Sam Altman and his ilk going to do any better? - Drew

In some non-AI-related news (or maybe still related, depending on where these wafers end up), chip giant TSMC will reportedly begin volume production of 2-nanometer chips in the second half of this year. TechSoda reports that orders are already coming in, even at the record-high price of $30,000 per wafer. TechSoda also reports that TSMC is getting rid of Chinese-made equipment for its 2-nm fabs to avoid being penalized by proposed US legislation that would bar any company receiving US subsidies or tax incentives (which TSMC is using to help build plants in the US) from purchasing tools or equipment from countries of concern (i.e., China). - Drew

The core problem driving co-packaged optics (CPO) is the signal loss inherent in pluggable-module designs, along with the power required (and heat generated) to compensate for that loss.

“Data signals in such designs leave the ASIC, travel across the board and connectors, and only then are converted to light. That method produces severe electrical loss, up to roughly 22 decibels on 200 Gb/s channels, which requires compensation that uses complex processing and increases per-port power consumption to 30W (which in turn calls for additional cooling and creates a point of potential failure), which becomes almost unbearable as the scale of AI deployments grow, according to Nvidia.”

CPO helps enormously. There’s plenty more detail, plus some diagrams, if you click through. - Ethan
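For intuition on that 22 dB figure, remember decibels are logarithmic. A quick back-of-the-envelope in Python (the ~4 dB figure for a CPO path below is my own illustrative assumption, not a number from the article):

```python
# Decibels are logarithmic: 22 dB of electrical loss attenuates
# signal power by a factor of 10^(22/10), roughly 158x.
def db_to_ratio(db: float) -> float:
    return 10 ** (db / 10)

print(f"22 dB loss = {db_to_ratio(22):.0f}x power reduction")
# Hypothetical lower-loss CPO path for contrast (~4 dB is an
# assumption for illustration, not a figure from the article).
print(f" 4 dB loss = {db_to_ratio(4):.1f}x power reduction")
```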

MORE NEWS

FOR THE LULZ 🤣

RESEARCH & RESOURCES 📒

From the abstract: “Here we lay out the position that small language models (SLMs) are sufficiently powerful, inherently more suitable, and necessarily more economical for many invocations in agentic systems, and are therefore the future of agentic AI. Our argumentation is grounded in the current level of capabilities exhibited by SLMs, the common architectures of agentic systems, and the economy of LM deployment. We further argue that in situations where general-purpose conversational abilities are essential, heterogeneous agentic systems (i.e., agents invoking multiple different models) are the natural choice. We discuss the potential barriers for the adoption of SLMs in agentic systems and outline a general LLM-to-SLM agent conversion algorithm.” - Drew

From the home page: “Modern terminal HTTP/TCP latency monitoring tool with real-time visualization. Think httping meets modern CLI design with rich terminal UI, phase timing, and advanced analytics.

Status: Feature-complete MVP with HTTP/TCP support, phase timing, outlier detection, and comprehensive monitoring capabilities.” - Ethan
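If “phase timing” is a new idea, here's a minimal Python sketch (my own, not the tool's code) that splits a probe into DNS resolution and TCP handshake phases, the same decomposition httping-style tools report. The host is a placeholder, and the real tool adds TLS, first-byte timing, and more.

```python
import socket
import time

# Time DNS resolution and the TCP handshake as separate phases,
# the same split httping-style tools report per probe.
def probe(host: str, port: int = 443) -> dict:
    t0 = time.perf_counter()
    fam, stype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    t1 = time.perf_counter()
    with socket.socket(fam, stype, proto) as s:
        s.settimeout(5)
        s.connect(addr)
        t2 = time.perf_counter()
    return {"dns_ms": (t1 - t0) * 1e3, "tcp_connect_ms": (t2 - t1) * 1e3}

if __name__ == "__main__":
    print(probe("example.com"))  # placeholder host
```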

MORE RESOURCES

  • AI Is Slowing Down (repo of stories documenting AI’s slowing progress) - Peter Gostev

INDUSTRY BLOGS & VENDOR ANNOUNCEMENTS 💬 

This post aims to help network engineers using Python to strike a balance between “tooling perfection” and what actually fits a day-to-day workflow. The post draws from multiple automation engineers’ input about what they “are actually doing to manage Python environments, dependencies, and tooling in their network automation projects.” The post covers practical solutions, the tool dependency problem, use case recommendations, and more. - Drew
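On that theme, one low-ceremony pattern (my own sketch, not something from the post) is a fail-fast preflight script that refuses to run automation outside the project's virtual environment or without its pinned dependencies installed. The package names are placeholders for whatever your repo actually pins.

```python
import importlib.util
import sys

# Hypothetical fail-fast preflight for a network automation repo:
# confirm we're inside a virtualenv and that pinned dependencies are
# importable. Module names below are placeholders for your project.
REQUIRED = ["netmiko", "jinja2", "yaml"]

def in_virtualenv() -> bool:
    # base_prefix differing from prefix is the standard venv tell.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)

def main() -> None:
    if not in_virtualenv():
        sys.exit("Refusing to run outside a virtual environment.")
    missing = [m for m in REQUIRED if importlib.util.find_spec(m) is None]
    if missing:
        sys.exit(f"Missing packages: {', '.join(missing)} "
                 "(try: pip install -r requirements.txt)")
    print("Environment looks sane.")

if __name__ == "__main__":
    main()
```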

Google says many of its Gemini AI models are now available on Google Distributed Cloud (GDC), which is air-gapped on-prem infrastructure and software managed by Google. Options include GDC infrastructure with Nvidia Hopper and Blackwell GPUs. GDC is designed to support security or sovereign data requirements that would prevent workloads from being run in public clouds. - Drew

EnGenius has announced an affordable Wi-Fi 7 AP. How affordable? EnGenius says the MSRP is $129 per AP. The company says the AP can support up to 400 connected devices simultaneously and covers up to 1,000 square feet. It supports aggregate speeds of up to 5 Gbps and is built on the Qualcomm Networking Pro 1220 chipset. - Drew

System Initiative is an automation company co-founded by Adam Jacob of Chef fame. System Initiative is about automation driven by AI. Adam sets up his post with an example prompt: “We need to update the load balancers to be more aggressive with their health checks.”

Adam then explains what System Initiative does from there. “Our custom AI Agent is responsible for taking your prompt, feeding it to the underlying generative AI model, and then using its knowledge of how AWS works, combined with expert knowledge of System Initiative, to make a plan for what needs to be done. It starts by exploring your System Initiative Workspace, searching for relevant infrastructure - like Load Balancers, Listeners, and Target Groups. If you don’t have any, it will dynamically discover them from your AWS account. Then it will analyze the load balancer configuration, determine a more aggressive health check, and propose it to you. You’ll review it to make sure you like the changes, then apply it.”

I like where this is going. I think a key differentiator is found in the phrase “combined with expert knowledge.” I interpret that to mean that System Initiative isn’t just training an AI model on data and hoping it does the right thing most of the time. Rather, expert knowledge is being used as a guardrail that should generate a predictable result. That’s my theory, anyway. I’m keen to hear what you think of System Initiative if you take it for a spin. - Ethan

MORE INDUSTRY NOISES

DYSTOPIA IRL 🐙

TOO MANY LINKS WOULD NEVER BE ENOUGH 🐳

LAST LAUGH 😆