Human Infrastructure 364: EIGRP Fundamentals, A History of IaC, Temperature Thresholds, and More

THIS WEEK’S MUST-READ BLOGS 🤓

Looking for a crash course on EIGRP fundamentals? This is it. Perfect for a review of how EIGRP computes best path if you’ve already learned the protocol, and useful to share with someone learning EIGRP for the first time. - Ethan

In this blog post, you get a behind-the-scenes look at an OpenBSD dev adding DHCPv6-PD functionality. What’s PD? It’s IPv6 DHCP prefix delegation. RFC 8415 Dynamic Host Configuration Protocol for IPv6 (DHCPv6) describes PD in section 6.3.

“The prefix delegation mechanism, originally described in [RFC3633], is another stateful mode of operation and was originally intended for simple delegation of prefixes from a delegating router (DHCP server) to requesting routers (DHCP clients). It is appropriate for situations in which the delegating router (1) does not have knowledge about the topology of the networks to which the requesting router is attached and (2) does not require other information aside from the identity of the requesting router to choose a prefix for delegation.”

The post walks through the programming challenges involved in implementing PD and how they were solved. - Ethan
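
To make prefix delegation a little more concrete, here is a minimal Python sketch of what a requesting router might do once it has been handed a delegated prefix: carve it into /64s for its downstream interfaces. The /56 comes from the 2001:db8::/32 documentation range, and the function name `carve_subnets` is hypothetical. It illustrates the idea only and is not taken from the OpenBSD implementation described in the post.

```python
import ipaddress
from itertools import islice


def carve_subnets(delegated, count, new_prefix=64):
    """Return the first `count` subnets of length `new_prefix` from a delegated prefix.

    `delegated` is a string such as "2001:db8:1200::/56" (documentation range,
    used here purely as an example).
    """
    network = ipaddress.IPv6Network(delegated)
    if new_prefix < network.prefixlen:
        raise ValueError("requested subnets are larger than the delegated prefix")
    # Number of subnets of the requested size that fit in the delegation.
    available = 2 ** (new_prefix - network.prefixlen)
    if count > available:
        raise ValueError(f"only {available} /{new_prefix} subnets fit in {network}")
    # subnets() is a generator, so take only what we need.
    return list(islice(network.subnets(new_prefix=new_prefix), count))


if __name__ == "__main__":
    # Hypothetical delegation: one /56, three downstream interfaces.
    for subnet in carve_subnets("2001:db8:1200::/56", 3):
        print(subnet)
```

A /56 holds 256 /64s, which is why it is a common delegation size for a site with several internal networks.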

Remotely access your lab using Cloudflare Tunnels, even if you’re behind CGNAT. Nadeem Siddique shows you how. - Ethan

Adam Ruka walks through the history of IaC over the past several years, ending with the next evolution he believes is coming: Infrastructure FROM code. In this model, infrastructure requirements are derived from the application code. He cites the TypeScript library Eventual and general-purpose programming language Wing as examples of this new approach to infrastructure instantiation. I gotta say, looking at his simple code examples…that’s a LOT of abstraction. What could possibly go wrong? On the other hand, maybe it’s the infrastructure abstraction we’ve always wanted. On yet another hand, maybe it’s the infrastructure abstraction we’ve always feared. - Ethan

Bryan Ward goes deep on finding the best way to measure temperature thresholds for telecom gear, specifically Juniper EX and QFX equipment. The post was spurred by a series of alarms warning Bryan that equipment was exceeding temperature thresholds he’d set.

Bryan writes, “The shame is entirely on me. I had been adding some rules to our production monitoring system yesterday, and configured the temperature thresholds based on the values published in Juniper’s datasheets (40 and 45°C, depending on the model). The problem is that the sensors inside the equipment do not measure ambient temperature – they’re located in the chassis in places that, even during normal operation in a properly cooled environment, will regularly exceed those limits.” He then digs into how to determine the most appropriate thresholds for sensors and equipment to avoid false alarms. - Drew
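
As a rough illustration of the general problem (and not necessarily the approach Bryan settles on), one way to avoid datasheet-driven false alarms is to base alert thresholds on what each internal sensor actually reads during normal operation. The Python sketch below assumes you have already collected per-sensor samples in Celsius. The sensor names, sample values, and margins are all hypothetical.

```python
from statistics import mean


def suggest_thresholds(readings, warn_margin=5.0, crit_margin=10.0):
    """Suggest per-sensor alert thresholds from baseline readings.

    `readings` maps a sensor name to a list of Celsius samples taken during
    normal operation. The margins are arbitrary illustrative values, not
    vendor guidance.
    """
    thresholds = {}
    for sensor, samples in readings.items():
        if not samples:
            raise ValueError(f"no baseline samples for sensor {sensor!r}")
        peak = max(samples)
        thresholds[sensor] = {
            "baseline_avg": round(mean(samples), 1),
            "warning": peak + warn_margin,    # alert a bit above the normal peak
            "critical": peak + crit_margin,   # escalate further above it
        }
    return thresholds


if __name__ == "__main__":
    # Hypothetical baseline: internal sensors sit well above a 40-45°C ambient spec.
    baseline = {
        "cpu": [50.0, 55.0, 60.0],
        "exhaust": [41.0, 43.0, 44.0],
    }
    for sensor, values in suggest_thresholds(baseline).items():
        print(sensor, values)
```

Any threshold derived this way should still sit below whatever limits the vendor publishes for the individual sensors, since those reflect what the hardware can actually tolerate.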

AutoCon2 is coming up fast on November 18-22 in Denver, Colorado, and we want to share some key dates:

Conference Registration is open NOW!

  • You can get super early bird pricing of only $299 until August 28

  • Hotel registration is open now - grab a room SOON!

Workshop Registration Is Open!

  • We're going to have a great slate of workshop options covering a range of topics in network automation and orchestration

  • Note that it's a separate event conveniently preceding AC2

The Full AC2 Conference Agenda will be published by September 9

NAF is a watering hole - a place where we can have harmonious collaboration in network automation: the practice of network automation, orchestration, observability, AI tooling, education, process and standards, and more. Come hear what your peers are doing in their networks (on the stage and in the hallways), what solution providers are bringing to the table, what's happening with open source, and all things network automation. AutoCon is THE Forum for Network Automation. See you in Denver!

TECH NEWS 📣

After the outage the world won’t forget quickly (or will it?), CrowdStrike stock took a hit. CRWD has fallen from about $392 a share on 1-July-2024 to $235 on 8-August-2024 as I write this. And that’s a recovering price. It’s been as low as $195 in the last few days. What’s an investor to do? Sue. Don’t assume that it’s just Wall Street fat cats behind the suit. “The class action lawsuit submitted by the Plymouth County Retirement Association in the U.S. District Court of Austin, Texas, seeks compensatory damages for these losses.”

Right. A retirement association invested in CRWD got caught in the fallout of the CrowdStrike/Microsoft debacle. On what grounds is the PCRA suing? “The class action alleges that stockholders were defrauded by CrowdStrike's knowingly false statements about the quality of its products and procedures.”

It will be interesting to see how this suit pans out. Holding a software maker legally liable for the quality of its products (not exactly the basis of the suit, but an intrinsic aspect) has not been especially common. Most of us in IT assume code quality is hot garbage, and have learned to be late adopters when we have a choice. A ruling in the PCRA’s favor could have repercussions for the practice of software development and deployment, and even IT infrastructure changes that result in an outage that impacts the general public. - Ethan

Conservatives tend to fight on reflex against government interference in business practices. In this case, the business practice is that of ISPs being able to charge different rates for different tiers of service, or to limit throughput for certain kinds of traffic across their infrastructure. The FCC is against that, attempting to bring net neutrality back into play.

I suspect most of us like the concept of net neutrality in principle, although the practice of it can be onerous. OTT providers such as streaming content services use a disproportionate amount of the pipes. What does neutrality even mean when a few elephants are stepping all over the mice? The solution, as always, is going to be found in common sense compromise, although the FCC and perhaps eventually the US Supreme Court will have to slug it out about net neutrality first. - Ethan

Storage specification NVMe has been updated to version 2.1. Of potential interest to networks, the 2.1 spec includes “a network boot mechanism for NVMe over Fabrics (NVMe-oF™) and support for NVMe over Fabrics zoning.” - Ethan

RESEARCH & RESOURCES 📒

A Canadian government agency has written a detailed report on an outage that affected Rogers Communications, a major telco and ISP, back in 2022. The outage lasted more than 24 hours and disrupted mobile, wireless, and wired Internet service for 12 million customers. The cause was configuration errors made on distribution routers during an upgrade to the core network. The report details not only the causes, but also the remediation steps that Rogers took to recover, as well as changes the company made to improve reliability and resiliency. The report also offers recommendations from the government agency to Rogers specifically and telcos in general. I’m sharing this link because the past few weeks have taught us that we should all be thinking about reliability and resiliency in our networks, and there may be ideas in here you could use. - Drew

If you’re interested in IoT and perhaps the Long Range Wide Area Network (LoRaWAN) communications protocol, these PDFs are for you. Ungated. Just hit the download links to get them. - Ethan

INDUSTRY BLOGS & VENDOR ANNOUNCEMENTS 💬 

Gluware has released version 5.4 of its network automation platform. New features include a redesigned UI for network discovery, plus a new discovery option for subnet-based searches. The release also now lets you run Ansible playbooks right inside the Gluware platform. It also streamlines sync configuration with ServiceNow’s Configuration Management Database to ensure that your sources of truth are synchronized. - Drew

Imbue shared a detailed post about the infrastructure it built to train an AI model with 70 billion parameters. It’s a fascinating read that covers, in depth, everything from the network fabric (InfiniBand) to the server hardware and GPUs, plus all the testing (so much testing!) that had to happen before they even started a job. Some basics: this post describes a cluster of 511 hosts, each with 8 GPUs. As mentioned, they used InfiniBand as the network fabric for the GPUs, but this fabric also connected to two Ethernet networks (one for sending the training data, and another for configuration and management). If your org is thinking about a DIY AI model training project, you’ll want to read this. - Drew

Azure is rolling out a public preview of the Auxiliary Logs plan in its Azure Monitor service. The idea is to provide a single service that lets companies store logs in different tiers for different use cases. For instance, logs required for compliance purposes can go into a cheaper tier, while logs used in troubleshooting or investigation can be stored in a tier that supports more frequent access among concurrent users (at a higher price, naturally). Options include Auxiliary Logs, Basic Logs, and Analytics Logs. Regardless of tier, Microsoft says the logs can be stored for up to 12 years and queried and searched. The URL above describes the new options, and has links to documentation and pricing. - Drew

Microsoft has discovered a vulnerability in ESXi hypervisors that is being targeted by ransomware gangs. The vulnerability lets attackers get full admin permissions on domain-joined ESXi hypervisors. Using these permissions, attackers can encrypt file systems and demand a ransom payment to decrypt them.

Broadcom has released a patch to fix this vulnerability. Microsoft says its researchers have seen attackers exploit this vulnerability in the wild, so you may want to get on this if you haven’t patched already. - Drew
