Human Infrastructure 362: Outages, Conferences, and a Book Review
THIS WEEK’S MUST-READ BLOGS 🤓
Navigating Infrastructure Outages: Battle Scars and Lessons Learned - Andree’s Musings
http://toonk.io/navigating-infrastructure-outages-battle-scars-and-lessons-learned/
Andree Toonk reflects on the nationwide outage suffered by Canadian ISP Rogers. This isn’t an analysis of that outage, but rather a jumping off point to consider what anyone in infrastructure operations should be doing to minimize misery when all the things break.
If the Rogers outage interests you, there’s an assessment available here. - Ethan
No More Blue Fridays - Brendan Gregg’s Blog
http://www.brendangregg.com/blog//2024-07-22/no-more-blue-fridays.html
Intel Fellow Brendan Gregg leverages the CrowdStrike + MS Windows kernel disaster to explain how eBPF is going to make such a scenario far less likely in the future. - Ethan
The Case For Conferences In 2024 - Good Tech Things
https://newsletter.goodtechthings.com/p/the-case-for-conferences-in-2024
Forrest Brazeal considers live events in the post-pandemic world. Are they worth it for attendees? How do employers with budgets think about these events? How can you tell a good event from a bad one? If you really, really want to go to an event, how can you justify it to the person who’ll authorize and fund the trip?
One of my takeaways from Forrest’s POV is to avoid tech events where the speakers are celebrities…and that’s it. The good events will “bring in people with battle scars who are sharing their failures as well as their successes.” This has been my experience with the NetworkAutomation.forum’s AutoCon events. Hands-on folks sharing what went well and not so well in their network automation…all of which you can watch here. - Ethan
Making Segment Routing user-friendly - Routing Craft
https://routingcraft.net/making-segment-routing-user-friendly/
Dmytro Shypovalov points out that MPLS got complicated to implement over time. State is a problem. Vendor interoperability is another problem. Challenges with RSVP are yet another. He cites more in this well-written and illustrated post. From there, he explains why segment routing should be an improvement over traditional MPLS implementations, but isn’t what it could be because of the way vendors have delivered controllers.
“An SR-TE controller is just a router with some extra functionality like processing BGP-LS and calculating policies with CSPF. It should be even easier than your average MPLS-TE implementation, since there is no need for LSP signaling.
Yet what you see in actual controller implementations is some disgusting bloatware that needs a supercomputer to run and does all kinds of things like network monitoring, automation, netflow collecting and OSS/BSS functionality. Which is great but who asked for any of those on a routing platform?”
He makes a few more points about the foibles of modern SR controllers, then explains the ideal SR controller from his perspective. It’s even got a name—Traffic Dictator. Traffic Dictator “is a routing platform, with configuration resembling a router so every network engineer familiar with SR and BGP can intuitively figure out how to use it.” Links to download, read docs, and grab a whitepaper in the article. He’s got a pre-built Containerlab setup, too. - Ethan
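For readers who haven't met CSPF, the idea in the quote above fits in a few lines: prune every link that fails the constraint, then run an ordinary shortest-path search on what remains. This is a minimal sketch for illustration only, not code from Traffic Dictator or any vendor controller. The topology, the link tuple layout, and the single bandwidth constraint are all assumptions made up for the example.

```python
import heapq


def cspf(links, source, target, min_bandwidth):
    """Return (cost, path) for the cheapest path whose links all offer at
    least min_bandwidth, or None if no such path exists."""
    # Step 1: prune links that violate the constraint.
    graph = {}
    for a, b, cost, bandwidth in links:
        if bandwidth >= min_bandwidth:
            graph.setdefault(a, []).append((b, cost))
            graph.setdefault(b, []).append((a, cost))

    # Step 2: plain Dijkstra over the pruned topology.
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None


# Hypothetical topology: (node_a, node_b, igp_cost, available_gbps)
LINKS = [
    ("R1", "R2", 10, 100),
    ("R2", "R4", 10, 10),
    ("R1", "R3", 20, 100),
    ("R3", "R4", 20, 100),
]

if __name__ == "__main__":
    print(cspf(LINKS, "R1", "R4", min_bandwidth=5))
    print(cspf(LINKS, "R1", "R4", min_bandwidth=50))
```

With a 5 Gbps requirement the cheap path through R2 wins. Raise the requirement to 50 Gbps and the R2-R4 link is pruned, so the search falls back to the longer path through R3. A real controller would learn the topology from BGP-LS and handle many more constraint types.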
“Machine Learning for Network and Cloud Engineers” – Javier Antich - Ryburn.org
https://ryburn.org/2024/06/23/machine-learning-for-network-and-cloud-engineers-javier-antich/
This is a review of a recent book aimed at network and cloud engineers who need to get a handle on machine learning. The review says the book provides a detailed theoretical background on machine learning, but also offers practical applications relevant to network and cloud pros. It also explores essential issues such as data quality, models, and the ethics of ML and AI. The review concludes that the book is “a valuable resource for any network or cloud engineer looking to stay ahead of the curve. It provides a comprehensive yet practical introduction to ML models, equipping readers with the knowledge and skills to leverage its power for network automation.” - Drew
I love the premise of this blog series. Josh looks back at jobs he did in the early days of his career, explains the solution he came up with at the time, and then describes how he might have done it differently today. It’s a useful learning exercise. In this installment, Josh reviews a project to convert a management system for a wireless deployment in a retail environment that had to provide guest Wi-Fi access. - Drew
Palo Alto Networks: A Leader in single-vendor SASE for the second time.
Palo Alto Networks has been named a Leader for the second year in a row in the 2024 Gartner® Magic Quadrant™ for Single-Vendor SASE. Rated highest on Ability to Execute and furthest on Completeness of Vision. Get more details at https://start.paloaltonetworks.com/gartner-sase-mq-2024.html
TECH NEWS 📣
Exclusive: Google-backed software developer GitLab explores sale, sources say - Reuters
https://www.reuters.com/markets/deals/google-backed-software-developer-gitlab-explores-sale-sources-say-2024-07-17/
GitLab, a GitHub competitor that I’ve heard only positive things about, might be up for sale. Potential buyers need access to plenty of capital, as GitLab is reported to be worth about $8B. Datadog is rumored to be interested, but would not go on the record to confirm. Look for an announcement in coming weeks, and hold onto your butts if you’re a GitLab shareholder. - Ethan
Cloudflare reports almost 7% of internet traffic is malicious - ZDNet
https://www.zdnet.com/article/cloudflare-reports-almost-7-percent-of-internet-traffic-is-malicious/
Based on my email, I’d have assumed 70%…but anyway…6.8% of the Internet traffic Cloudflare is reporting on is nasty. Vulnerabilities are being exploited more quickly, often within minutes. There are more zero days in the wild. DDoS attacks continue to grow.
Perhaps most depressing of all? “Finally, about 38% of all HTTP requests processed by Cloudflare are classified as automated bot traffic. Some bots are good and perform a needed service, such as customer service chatbots, or are authorized search engine crawlers. However, as many as 93% of bots are potentially bad.” Sigh. Our boring dystopia marches on. - Ethan
New Fiber Optics Tech Smashes Data Rate Record - IEEE Spectrum
https://spectrum.ieee.org/fiber-optic-cable-record
402Tbps. Yup. Researchers think 600Tbps is the absolute maximum science would be able to squeeze out of the commercial grade fiber they were using for the test. But commercial grade fiber! That means the stuff already under the ocean might be able to carry these massive amounts of data. - Ethan
CrowdStrike blames a test software bug for that giant global mess it made - The Register
https://www.theregister.com/2024/07/24/crowdstrike_validator_failure/
CrowdStrike released a preliminary review that says a bug in its testing software caused the testing software to miss a bug in a software update that would go on to crash Windows machines around the world. Apparently when it comes to software, you just can’t win. - Drew
FOR THE LULZ 🤣
RESEARCH & RESOURCES 📒
Networking Fundamentals Series - Ed Harmoush via YouTube
https://www.youtube.com/playlist?list=PLIFyRwBY_4bRLmKfP1KnZA6rZbRHtxmXi
Networking instructor Ed Harmoush has been publishing a series about networking fundamentals to YouTube. There are 15 videos in the series so far, most about 10-15 minutes long. Well worth your time to watch or share with a colleague. - Ethan
cake-autorate - The Bufferbloat Community
https://www.bufferbloat.net/projects/bloat/wiki/cake-autorate/
https://github.com/lynxthecat/cake-autorate/tree/master
Cake-autorate keeps CAKE up to date with real-time bandwidth availability. If you’re thinking that there’s no point because you have a constant bandwidth available, you’re right. “Cake-autorate is intended for variable bandwidth connections such as LTE, Starlink, and cable modems and is not generally required for use on connections that have a stable, fixed bandwidth.”
Don’t know what CAKE is? I recommend you start by reading about CoDel and going from there. - Ethan
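To make the idea of tracking a variable link concrete, here is a rough sketch of the kind of control loop an autorate tool could run: back off when measured delay climbs, probe upward when the link is busy and delay is fine, and drift back to a baseline when idle. This is not cake-autorate's actual algorithm or configuration. The function name, thresholds, and step sizes are all hypothetical and chosen only for illustration.

```python
def next_shaper_rate(
    current_kbps,
    achieved_kbps,
    delay_ms,
    min_kbps=5000,            # hypothetical floor
    base_kbps=20000,          # hypothetical idle baseline
    max_kbps=80000,           # hypothetical ceiling
    delay_threshold_ms=30.0,  # hypothetical bufferbloat threshold
    high_load=0.75,           # hypothetical "link is busy" ratio
):
    """Return the next shaper rate in kbps for one control-loop tick."""
    if delay_ms > delay_threshold_ms:
        # Delay is rising, so the link probably has less capacity
        # than the shaper assumes: back off.
        proposed = current_kbps * 0.9
    elif achieved_kbps / current_kbps >= high_load:
        # Link is busy and delay is fine: probe for more bandwidth.
        proposed = current_kbps * 1.05
    else:
        # Link is mostly idle: drift back toward the baseline.
        proposed = current_kbps + (base_kbps - current_kbps) * 0.1
    # Keep the result inside the configured floor and ceiling.
    return int(max(min_kbps, min(max_kbps, proposed)))


if __name__ == "__main__":
    print(next_shaper_rate(40000, 39000, delay_ms=60.0))  # delay spike
    print(next_shaper_rate(40000, 39000, delay_ms=10.0))  # busy, healthy
    print(next_shaper_rate(40000, 1000, delay_ms=10.0))   # idle
```

On a fixed-bandwidth connection the delay never spikes for capacity reasons, so a loop like this has nothing useful to do. That matches the project's own note that it isn't generally needed on stable links.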
AutoCon2 Registration Is Open! - Network Automation Forum
https://networkautomation.forum/autocon2
AutoCon2 is coming up fast on November 18-22 in Denver, Colorado and we want to let you know some key dates:
Conference Registration is open NOW!
You can get super early bird pricing of only $299 until August 28
Hotel registration is open now - grab a room SOON!
Call for Speakers closes July 31
We already have the most proposals for talks that we've ever had
Workshop Registration opens August 8
We're going to have a great slate of workshop options covering a range of topics in network automation and orchestration
Note that it's a separate event conveniently preceding AC2
The Full AC2 Conference Agenda will be published by September 9
NAF is a watering hole - a place where we can have harmonious collaboration in network automation: the practice of network automation, orchestration, observability, AI tooling, education, process and standards, and more. Come hear what your peers are doing in their networks (on the stage and in the hallways), what solution providers are bringing to the table, what's happening with open source, and all things network automation.
AutoCon is THE Forum for Network Automation. See you in Denver!
INDUSTRY BLOGS & VENDOR ANNOUNCEMENTS 💬
Preliminary Post Incident Review (PIR): Content Configuration Update Impacting the Falcon Sensor and the Windows Operating System (BSOD) - CrowdStrike
https://www.crowdstrike.com/falcon-content-update-remediation-and-guidance-hub/
CrowdStrike begins the painful, public process of explaining why what should have been a routine update resulted in what’s being called the largest IT outage in history. There is a substantial amount of technical detail here, if such things interest you. - Ethan
July 2024 Update on Instability Reports on Intel Core 13th and 14th Gen Desktop Processors - Intel Support Community
https://community.intel.com/t5/Processors/July-2024-Update-on-Instability-Reports-on-Intel-Core-13th-and/m-p/1617113#M74792
TL;DR. Voltage issues. Fixable with microcode. - Ethan
SonicWall Report Details Exponential Increase in Overall Cyberattacks; Reveals Potential Revenue Risk for Businesses - PR Newswire
https://blog.sonicwall.com/en-us/2024/07/sonicwall-2024-mid-year-cyber-threat-report-iot-madness-powershell-problems-and-more/
SonicWall has released a threat report looking at the first half of 2024. Highlights? From the company blog: “Business email compromise (BEC) attacks are on the rise, supply chain attacks and the risks associated with them are increasing and IoT malware is becoming more and more of an issue.”
I was interested to see that SonicWall has adjusted one of its metrics. Regarding firewalls, the company says it used to count every hit against a firewall, but given the volume of attacks it decided that wasn’t a very descriptive metric. Instead, the company is now counting “the number of hours a firewall is under attack rather than every single hit.” SonicWall compares it to weather reporting. Instead of counting and reporting every drop of rain, it’s telling you that it rained hard in the afternoon. SonicWall says this change “is more consistent, simplifies comparisons and data interpretations, and overall significantly improves the way we’re analyzing and reporting telemetry data.” You can download the full report here in exchange for contact details. - Drew
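The difference between the two metrics is easy to see with a toy example. The sketch below counts raw hits and then counts distinct clock hours containing at least one hit. SonicWall hasn't spelled out exactly how it buckets time, so the clock-hour bucketing here is an assumption, and the timestamps are invented.

```python
from datetime import datetime


def hours_under_attack(hit_timestamps):
    """Count distinct clock hours that contain at least one hit."""
    # Truncate each timestamp to the top of its hour, then deduplicate.
    buckets = {
        ts.replace(minute=0, second=0, microsecond=0) for ts in hit_timestamps
    }
    return len(buckets)


# Invented sample data: a burst in the early afternoon, one stray hit at night.
HITS = [
    datetime(2024, 7, 1, 13, 5),
    datetime(2024, 7, 1, 13, 6),
    datetime(2024, 7, 1, 13, 59),
    datetime(2024, 7, 1, 14, 0),
    datetime(2024, 7, 1, 22, 30),
]

if __name__ == "__main__":
    print("raw hits:", len(HITS))
    print("hours under attack:", hours_under_attack(HITS))
```

Five hits collapse into three attack hours. A flood of a million hits in the same three hours would still report three, which is the "it rained hard in the afternoon" effect SonicWall describes.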
TOO MANY LINKS WOULD NEVER BE ENOUGH 🐳
Playing With Domain-Specific LLMs - bohcay’s Substack
Is it hot enough for you yet? - Bryan Ward
Adding Arista Switch to CML - Daniel’s Networking Blog
Transforming IT Operations - The Rise of Infrastructure Automation Consulting - EverythingShouldBeVirtual
Intel - Is it an IPU or a DPU or what? - HowFunky
Pluralsight Problems - Ned In The Cloud
Understanding Secret Keys: A Simple Explanation - Practical Networking on YouTube
A Multifaceted Look at Starlink Performance - The Good, the Bad and the Ugly - RIPE Labs