SD-WAN in a Work from Anywhere World

SD-WAN vs VPN

Connecting remote users to corporate IT resources has never been trivial. The shift to remote work was already well underway for many before 2020, and the global COVID-19 pandemic accelerated it dramatically. The change happened so fast that many organizations struggled to get their users connected once working from home became a requirement. Remote access to an organization’s applications traditionally requires a virtual private network (VPN) connection of some kind to encrypt and secure access to the sensitive data users need to do their jobs. We witnessed firsthand what happened as our clients scaled up VPN capacity: firewalls crashing after exceeding capacity, mad scrambles to upgrade licensing for more users, software hastily installed on laptops, and cloud-based VPN termination added to expand capacity elastically. It was an eye-opening experience for many to find that their infrastructure simply was not prepared to scale up quickly when they needed it most.

So how can we accommodate this sort of thing better in the future? With a bit of hindsight and introspection on what we just went through, are we as an industry thinking about remote connectivity in the correct way? Managing and securing remote connectivity for users can be achieved in a number of ways, each with its own trade-offs. There are some very progressive and forward-looking models built with a “cloud first” mindset that do not require VPN connectivity. That said, I would say those are outliers in the corporate world today. Most organizations have a legacy application or some other need to leverage a VPN for connectivity into their IT resources. Let’s explore a couple of methods of using VPNs for corporate connectivity and the compromises for each.

Endpoint Agent/Client Software

Many folks leverage an approach that involves a software client or agent running on the user device, which establishes the VPN to “tunnel” a user’s traffic securely to and from the IT environment. Though very common and popular, is this the best way to connect users in a modern hybrid and multi-cloud environment?

The positive things about this approach:

  • Most organizations have a firewall and the functionality to connect users via VPN is often already baked in.
  • No additional hardware is required outside of the firewall and the endpoint itself.
  • Some VPN software agents inspect traffic on the device before it enters the tunnel and reaches the corporate network. This puts the security perimeter very close to the user.
  • This is a well understood and a very common deployment scenario.
  • User identity, and therefore access policy, is known and can be managed by nature of the user logging into the machine the agent is hosted on.

The negative things about this approach:

  • Certain devices can’t run endpoint software (iOS, Android, unsupported and older operating systems, etc) so using phones, tablets and older computers may not be an option.
  • Agents are putting the burden of inspecting, securing and reporting network & application usage on the user device which may consume compute resources taking away from the user experience.
  • Managing permissions and controls on the user device can be difficult and time consuming.
  • Hybrid, multi-cloud and SaaS network connectivity needs become complex to manage and secure.
  • Additional licensing costs may be required to add users.
  • Lack of network visibility without additional tools on the device.

So we get VPN connectivity included with components we may already have, but there are some reasons why it is not a one-size-fits-all model. Let’s contrast it with an alternate approach.

SD-WAN Network Appliance

Another approach for remote users to access IT resources is leveraging an actual network appliance to terminate the WAN connectivity and then connect the user device via Ethernet or Wifi. Many platforms have SD-WAN capabilities today, not to mention some security features baked in so let’s assume we are working with these modern edge appliance features for the sake of our argument.

The positive things about this approach:

The negative things about this approach:

  • Additional devices to install, manage and support
  • Additional hardware costs
  • Depending on the platform, additional licensing costs
  • Without agent, cannot validate state of end user device before attempting to connect
  • More planning and coordination with users to get network connected vs getting on Wifi/Ethernet and firing up a VPN client.

In conclusion, which is better?

So which is preferable? The age old “it depends” applies. In most cases, my design preference would be the SD-WAN network appliance. I may be biased as a network practitioner, but I predict we will find many moving to a network-based approach for work from anywhere. As computing capabilities evolve and can be supported in smaller packages, remote users will have a little “puck” sized appliance that will give them access to network resources.

My key reasons for this are:

  • Lack of requirements for a software agent allows for user device independence.
  • No need to manage software on user machines i.e. no dealing with OS permissions issues, no keeping agent software up to date, no user performance impact from agent, etc.
  • More network and application visibility/telemetry opportunities with network appliances that stream this info, not to mention the ability to easily run packet captures at the edge on the appliance.
  • Though you will have some additional costs to install and manage the hardware, there are great options to automate and orchestrate this control, not to mention things like zero touch provisioning to stand them up. It can be argued deployment can happen more rapidly.
  • In the future, the potential to install apps at the edge. Examples would be synthetic application monitoring and measurement platforms, application optimization, data synchronization, etc.
  • WAN & application optimization tools are typically baked in to correct problems like packet loss and jitter on the fly.
  • Routing, access control, content filtering and the other functions we typically depend on network devices for today are well understood and easier to manage on network appliances.

What do you think? Which approach seems better to you for remote connectivity, agent software or SD-WAN? Please comment here or on social media with your thoughts. As always, thanks for reading and I certainly would appreciate any input you may have!

Juniper Gets “Misty” at Networking Field Day 24

The Juniper Enterprise Portfolio

I had the honor of participating as a Networking Field Day 24 (NFD24) delegate in February 2021. First off I must say, this was an incredible experience for me. I’ve been watching NFD recordings for years to learn everything I can about vendor platforms before and even after working with them. If you haven’t seen any NFD videos before, I implore you to go check out the ones pertaining to vendors you are interested in. The whole presenter and delegate approach is unique and a great way to get answers to questions about platforms you might have asked yourself in a network vendor’s product presentation. It’s like being a fly on the wall while executives and product managers present their solutions. You get the story right from the people who crafted it. Seriously, be sure to check them out.

The first presentation of NFD24 was by Juniper Networks. I’ve been a Juniper customer in a past professional life so it was great to catch up on the latest they have to offer. It seems the biggest news is how Juniper is coalescing management of all of their enterprise networking products into Mist, which they acquired in March 2019, and “Mistifying” many of their traditional products like EX & SRX models. This seems to be a trend, with many networking vendors moving controllers and network visibility from on-premises into a cloud-based platform, which the Mist AI platform happens to be. The reasons for standardizing on Mist as Juniper’s common platform are many, but the key for network operators is having all the insights and visibility you need for your network in one place.

Juniper had a couple of other acquisitions that closed recently: 128 Technology for their innovative WAN tech and Apstra for their vendor-agnostic, intent-based automation platform for datacenter networks. The pace at which they have been able to integrate these companies was impressive and they shared more info on this during their presentation.

Let’s dive in to see all the cool things that Juniper had to share.

Marvis VNA

Juniper introduced Marvis Virtual Network Assistant, a feature on the Mist platform that enables insights through natural language queries against Mist AI. The AIOps space is heating up and Juniper’s Marvis offering is a very interesting entrant. At its core, Marvis provides a place to ask plain English questions about the infrastructure like “Where is MAC XX:XX:XX:XX:XX:XX?” or “What happened at 12:51a?” and returns relevant data for the operator based on the query. This is incredibly useful as the average network engineering team’s workload increases, and it gives even junior or intermediate technicians a powerful tool for retrieving answers quickly.

The New EX4400

Juniper launched their new EX4400 Campus Switch which looks pretty beastly. Of course it integrates with Mist and comes in a number of form factors and features mGig, PoE++, GBP and flow based telemetry support. What I found interesting is that it’s another campus switch that supports EVPN-VXLAN which as mentioned in a previous post from 2019, seems to be a trend in campus switches. If you’re in the market for high capacity campus switching, this certainly could be one to add to the list to check out.

128T Acquisition/Integration

Juniper acquired 128 Technology late last year and wasted no time integrating it with Mist. They claimed in their presentation that it only took them 6 weeks to ingest telemetry data from the 128T platform and make it usable in the Mist interface. Impressive! The real strength of the 128T solution is what’s called Session Smart Routing, which is essentially a session-based solution that eliminates the need for the bandwidth-costly overlays (aka tunnels) typically leveraged in SD-WAN. Check out this deep dive on how it works from The Packet Pushers. This certainly is a differentiator for Juniper in the WAN space and something I’ll be paying attention to as time goes on!
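
To put a rough number on why tunnel overhead matters, here is a small sketch that computes what share of each packet on the wire is encapsulation rather than the original packet. The 50-byte default is my own assumption for illustration (roughly what a VXLAN-style overlay over IPv4 adds); real SD-WAN tunnels vary with encryption and options, and none of these figures come from Juniper’s presentation.

```python
def overhead_share(packet_bytes: int, overhead_bytes: int = 50) -> float:
    """Fraction of bytes on the wire spent on encapsulation headers.

    packet_bytes is the size of the original (inner) packet.
    overhead_bytes is an assumed per-packet tunnel overhead.
    """
    if packet_bytes <= 0 or overhead_bytes < 0:
        raise ValueError("packet size must be positive and overhead non-negative")
    return overhead_bytes / (packet_bytes + overhead_bytes)


if __name__ == "__main__":
    # Hypothetical packet sizes: small voice-like packets vs. large transfers.
    for size in (200, 600, 1400):
        print(f"{size:>5} B packet -> {overhead_share(size):.1%} tunnel overhead")
```

The takeaway is that small packets, like the ones real-time voice tends to use, pay proportionally the most for being wrapped in a tunnel.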

Apstra Acquisition/Integration

Juniper also closed the Apstra acquisition earlier this year. I really anticipated Juniper locking the product down to only working with their own platforms but was delighted to hear they intend to keep Apstra open and multi-vendor. Their take was that it would be far harder to change it from multi-vendor to single vendor because being multi-vendor is baked into its core. I’m sure many existing Apstra customers are relieved to hear that, as it is an impressive platform that really makes managing large datacenter networking infrastructures easy. The demo the Apstra team showed off was deploying EVPN/VXLAN DCI between two data centers in seconds. We were coming to the end of the time allotted for their presentation but what I saw was very impressive and happened incredibly fast for all of the logic that was required underneath! For those managing large Cisco, Arista, Juniper, Cumulus or SONiC datacenter network infrastructures, Apstra could be a fantastic fit for you.

Conclusion

I’ll be honest with you, I’ve been worried about the long term strategy for Juniper in recent years. Though a stalwart vendor in the service provider space, their grip on the enterprise market seemed to be slipping for a while. I believe that has changed with these acquisitions that have reinvigorated Juniper’s ambitions in the enterprise. These moves were abrupt and extreme but necessary to compete, and I believe innovate, in the space. It will be very exciting to see where Juniper takes things in the coming months.

DISCLAIMER: I was fortunate to be invited to participate in Networking Field Day 24 as a delegate by Gestalt IT, who paid for snacks and sent me some cool swag from the participants. I did not receive any compensation to attend this event and I am under no obligation whatsoever to write any related content. The contents of these blog posts represent my personal opinions about the products and solutions presented during NFD.

Starlink: First Impressions

Starlink

So my Starlink kit is finally here! A number of folks have asked for first impressions so I’m going to break it down. Long story short, it’s a breeze to set up but operationally, definitely a beta service. Let’s explore.

The Unboxing

I was first struck by how efficiently everything is packaged. The first thing you see is the 3 panel pictogram of how to set this all up right on top. You know what, it pretty much was this easy. When you lift up this page and the top molded plastic panel that holds everything in place, you get to the goods. The form-fitting molded plastic on the top and bottom holds the kit in very snugly; it’s really well executed. In the box:

  • Starlink Dish with attached mast
  • Ground base to attach to the dish mast
  • Starlink Power over Ethernet (PoE) Injector – Model UTP-201S – Output towards dish maxes out at 90W (x2), output towards router maxes out at 17W. Total wattage this guy can produce is 180W.
  • Starlink Router – Model UTR-201 – PoE input 10W – Has built in 802.11a/b/g/n/ac Wifi over 2.4GHz & 5GHz and “AUX” 10/100/1000 Ethernet port.
  • Pre-connected cables, 100 foot black cable for dish to PoE injector, 6 foot white for PoE injector to router. The cables were already plugged in.

The Setup

With everything pre-cabled, assembly really comes down to:

  • Snap the dish and its attached mast into the base
  • Plug PoE brick into the wall
  • Connect the pre-terminated and weatherproofed black cable from the dish into its black color coded port on the PoE brick
  • Plug the white cable already hanging off the included router into its color coded port on the PoE brick.

From a physical standpoint, that’s all you really have to do! A lot of time was obviously spent on making this very easy to deploy. Mission accomplished. I literally got everything up and running within 10 minutes. Once plugged in, the dish points straight up at the sky and, with its built in motors, starts to tune itself to receive the strongest signal possible. If you want to see these motors and the guts of the dish, check out engineer Ken Keiter’s tear down. It’s quite impressive, I highly recommend checking it out. The dish iterated through a few positions and eventually settled on a position somewhere in the sky NNE of my house.

First thing to do after plugging everything in is to get the Starlink app on iOS or Android. All configuration, control and documentation is really within this app so it’s definitely a requirement. This process is relatively straightforward and is a lot like any other consumer IoT device you may have picked up recently.

The Starlink UTR-201 router comes with a default SSID which is printed right on it by the “AUX” port on the back of the unit.

The iOS/Android app connects to it over Wifi and adopts the router so now you can adjust some basic settings via the app. Not really much you can configure there other than the wireless SSID, more on that later.

The Operation

Here are some notes on what things look like after we get it all plugged in and up and running. I have to say it’s definitely “better than nothing” as they state. That said, there is room for improvement.

  • Once connected to the Starlink router via Wifi or wired via the AUX port, you will be DHCP’d a 192.168.1.x/24 address. This is not optional and there is no way to reconfigure different addressing or other DHCP options that I can find.
  • There is no management interface to the router and the options are very limited. There is a rather nice statistics dashboard you can see through the app or surfing in your browser to 192.168.100.1 when connected to it.
  • Your DNS server will be the router at 192.168.1.1 and the search domain is just “lan”.
  • There is no configuring port translations but the router is running Universal Plug n Play (UPnP, see below in the next section) so maybe there will be plans for that later?
  • The WAN interface on the router is behind Carrier Grade Network Address Translation (CG-NAT). More on this later, but it means port forwarding and having a public IP address for specific applications (things like old school IPsec VPNs or accessing your own server directly) are not currently possible.

How’s the latency? Pretty good actually.

jg-mbp:~ jason$ ping 4.2.2.2
PING 4.2.2.2 (4.2.2.2): 56 data bytes
64 bytes from 4.2.2.2: icmp_seq=0 ttl=58 time=37.909 ms
64 bytes from 4.2.2.2: icmp_seq=1 ttl=58 time=43.383 ms
64 bytes from 4.2.2.2: icmp_seq=2 ttl=58 time=40.946 ms
64 bytes from 4.2.2.2: icmp_seq=3 ttl=58 time=39.343 ms
64 bytes from 4.2.2.2: icmp_seq=4 ttl=58 time=37.811 ms
^C
--- 4.2.2.2 ping statistics ---
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 37.811/39.878/43.383/2.091 ms

This is in the same neighborhood as RTTs over my Spectrum connection, which is impressive! This is my Spectrum RTT to the same address.

jason@rtr01-jghome:~$ ping 4.2.2.2
PING 4.2.2.2 (4.2.2.2) 56(84) bytes of data.
64 bytes from 4.2.2.2: icmp_req=1 ttl=53 time=34.4 ms
64 bytes from 4.2.2.2: icmp_req=2 ttl=53 time=32.2 ms
64 bytes from 4.2.2.2: icmp_req=3 ttl=53 time=31.0 ms
64 bytes from 4.2.2.2: icmp_req=4 ttl=53 time=29.5 ms
64 bytes from 4.2.2.2: icmp_req=5 ttl=53 time=34.3 ms
^C
--- 4.2.2.2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
rtt min/avg/max/mdev = 29.513/32.314/34.454/1.910 ms
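
If you want to sanity check a summary line like the ones above, or compare runs you collected some other way, the math is simple enough to do yourself. This is a minimal sketch using the five Starlink RTT samples from the first ping; the function name is my own, and I am assuming the population standard deviation, which lines up with the stddev figure the macOS ping reported for those samples.

```python
import statistics


def summarize_rtts(rtts_ms):
    """Return (min, avg, max, stddev) for a list of round-trip times in ms."""
    if not rtts_ms:
        raise ValueError("need at least one RTT sample")
    return (
        min(rtts_ms),
        statistics.fmean(rtts_ms),
        max(rtts_ms),
        statistics.pstdev(rtts_ms),  # population standard deviation
    )


if __name__ == "__main__":
    # RTT samples copied from the Starlink ping output.
    starlink = [37.909, 43.383, 40.946, 39.343, 37.811]
    lo, avg, hi, dev = summarize_rtts(starlink)
    print(f"min/avg/max/stddev = {lo:.3f}/{avg:.3f}/{hi:.3f}/{dev:.3f} ms")
```

Five samples is a tiny window, so treat numbers like these as a quick look rather than a real characterization of the link.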

Next question, how much bandwidth are we getting? This is typically in the neighborhood of 70Mbps down / 10Mbps up. It’s good, but not great. I was most surprised at the upload bandwidth; I wasn’t expecting to get this much.

Now as far as stability goes, that’s all over the board. Here’s a My Traceroute (MTR) which trace routes the path then pings the hops repeatedly. I let it cycle through 100 times here.

Oof. That’s not pretty. Standard deviation is up there, there’s 3% packet loss all the way through and we are getting upwards of 300ms RTTs right out of the gate. More detail will be below in the next section after I plug it into my VMware SD-WAN appliance.

The Geekier Stuff

The previous sections were the basics that most people want to see. This section will be more of the fun details I observed while playing around.

One thing that I thought was interesting was the router’s hostname resolved via DNS off itself.

jg-mbp:~ jason$ host 192.168.1.1
1.1.168.192.in-addr.arpa domain name pointer OpenWrt.lan.

So it looks like it’s based on OpenWrt. To be honest, this is not uncommon and I know of many other commercial products based on it as well.

I tried to see if there is a web management interface on the router but no such luck. Here’s what a port scan looks like.

Starting Nmap 7.91 ( https://nmap.org ) at 2021-02-28 11:33 EST
Nmap scan report for 192.168.1.1
Host is up (0.24s latency).
Not shown: 994 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh
53/tcp   open  domain
80/tcp   open  http
5000/tcp open  upnp
9000/tcp open  cslistener
9001/tcp open  tor-orport

When you go to port 80 on it, it just redirects you to https://www.starlink.com. Boring!
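
If you don’t have Nmap handy, a plain TCP connect check gets you most of the way for a handful of ports. This is only a rough sketch: the function name is mine, it does a full connect rather than Nmap’s default SYN scan, and it can’t tell a closed port from a filtered one.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    router = "192.168.1.1"  # Starlink router LAN address from the scan
    for port in (22, 53, 80, 5000, 9000, 9001):
        state = "open" if tcp_port_open(router, port) else "closed/filtered"
        print(f"{port}/tcp {state}")
```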

The router is listening for DNS queries on port 53 and answering them pretty quickly. It appears to be proxying and caching DNS entries which certainly helps speed things up. It’s all about optimizing performance where you can when delivering internet from space and I think this was a smart way to go. Here’s a dig query against the router for a cached entry vs against an internet name server. 2ms vs 63ms is a big improvement!

jg-mbp:~ jason$ dig google.com @192.168.1.1

; <<>> DiG 9.10.6 <<>> google.com @192.168.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9561
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com.			IN	A

;; ANSWER SECTION:
google.com.		212	IN	A	142.250.64.110

;; Query time: 2 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Mon Mar 01 20:40:03 EST 2021
;; MSG SIZE  rcvd: 55

jg-mbp:~ jason$ dig google.com @8.8.8.8

; <<>> DiG 9.10.6 <<>> google.com @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41035
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;google.com.			IN	A

;; ANSWER SECTION:
google.com.		117	IN	A	142.250.64.110

;; Query time: 63 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Mon Mar 01 20:40:10 EST 2021
;; MSG SIZE  rcvd: 55
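
To illustrate why a caching proxy makes such a difference, here is a toy TTL cache sitting in front of a slow lookup. I have no visibility into how the Starlink router actually implements this, so the class, the fake upstream function and its 63 ms delay are purely hypothetical stand-ins that mirror the numbers above.

```python
import time
from typing import Callable, Dict, Tuple


class TtlCache:
    """Tiny TTL cache in front of a slower lookup function."""

    def __init__(self, resolve: Callable[[str], Tuple[str, float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve  # returns (answer, ttl_seconds)
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (answer, expiry)

    def lookup(self, name: str) -> str:
        now = self._clock()
        hit = self._entries.get(name)
        if hit is not None and hit[1] > now:
            return hit[0]  # answered locally, no upstream round trip
        answer, ttl = self._resolve(name)
        self._entries[name] = (answer, now + ttl)
        return answer


if __name__ == "__main__":
    def fake_upstream(name: str) -> Tuple[str, float]:
        # Hypothetical upstream resolver: sleep to simulate the round trip.
        time.sleep(0.063)
        return "142.250.64.110", 212.0

    cache = TtlCache(fake_upstream)
    for attempt in ("first lookup", "second lookup"):
        start = time.perf_counter()
        cache.lookup("google.com")
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{attempt}: {elapsed_ms:.1f} ms")
```

The second lookup never leaves the box, which is the same effect the 2ms answer from the router shows.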

TCP/22 aka SSH is open! But good luck getting in there. Their SSH server uses key-based instead of password-based authentication, which I have to say is refreshing! Definitely a step in the right direction when it comes to IoT device security.

jg-mbp:~ jason$ ssh admin@192.168.1.1
The authenticity of host '192.168.1.1 (192.168.1.1)' can't be established.
RSA key fingerprint is SHA256:owxzwYXb/xsrqqDmR1YkIaAIR6AS1t+iwE0mMvoymYM.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '192.168.1.1' (RSA) to the list of known hosts.
admin@192.168.1.1: Permission denied (publickey).

Universal Plug n Play (UPnP) is running on TCP/5000. Perhaps this is for a future application? If you think you know what this is for outside of the standard UPnP application, let me know. Seems weird to have it when there’s another layer of CG-NAT beyond it. Also curious about TCP/9000 and TCP/9001. The most common uses for these are PHP-FPM and The Onion Router (Tor) but I doubt that’s what they’re for. If anyone has ideas I’m all ears.

I wanted to see what happens when you bypass the Starlink router and plug the dish right into a different device instead. It turns out, this works! You get an RFC 6598 IP address, which comes from the shared address space (100.64.0.0/10) set aside for CG-NAT. Makes sense when you are working with very little IPv4 space and you need to conserve as much as you can.
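
A quick way to tell whether the WAN address you were handed sits behind CG-NAT is to check it against the RFC 6598 shared address space, 100.64.0.0/10. Here is a minimal sketch with Python’s standard ipaddress module; the function name and the sample addresses are made up for illustration.

```python
import ipaddress

# Shared address space reserved for carrier grade NAT (RFC 6598).
CGNAT_SHARED_SPACE = ipaddress.ip_network("100.64.0.0/10")


def is_cgnat_address(addr: str) -> bool:
    """Return True if addr falls inside 100.64.0.0/10."""
    return ipaddress.ip_address(addr) in CGNAT_SHARED_SPACE


if __name__ == "__main__":
    # Hypothetical examples: one inside the range, two outside.
    for candidate in ("100.64.12.34", "100.128.0.1", "192.168.1.1"):
        print(candidate, is_cgnat_address(candidate))
```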

While in the SD-WAN Platform, let’s check out how well it thinks Starlink would do for real time applications like voice compared to my Spectrum circuit.

Hmmm… have a little ways to go there it seems. There are a lot of instances of packet loss, jitter, high latency and just plain no connection.

How about IPv6? Turns out, not ready yet.

One thing I really love is the support section. It has some really great insights and commonly asked questions like this one in it.

So that’s it for now! As mentioned before, an amazingly simple setup experience and it really is a remarkable offering considering they are blazing new territory here. I think the services will only improve over time and you will see greater stability and performance with each software update/improvement SpaceX makes. That said, this is usable and much better than many of the alternatives those in the boonies suffer today. If you would like to read my thoughts on why I think LEO satellite internet access is more significant in rural areas than 5G, check that out here.

I’m going to write a follow up but wanted to get something out for folks to check out. Please do contact me for things you would like for me to try or tell you about! I would absolutely love to share my experiences and learn more by answering your questions.

Security Discussion on The WAN Manager Podcast

Had a great discussion with Greg Bryan, Senior Manager of Enterprise Research @ TeleGeography, a global connectivity analyst firm. Thanks to Greg for having me on!

Check it out here:

https://blog.telegeography.com/wan-manager-podcast-network-security-jason-gintert

You can also find it on Apple | Google | Stitcher | TuneIn | Podbean | RSS

Starlink Experiment Underway!

Ohio Starlink beta signups, check your inbox! I signed up for it last spring and have been anxiously anticipating the email inviting me in. Well today is the day and shortly, I’ll be enjoying my internet access from space. I’m super excited to test this out and add more resilience and capacity to my access. I’ll definitely post updates detailing the experiment.

The signup was very straightforward. An email arrived at 4:50p today inviting me to check availability. I clicked on the link and put in my email and physical address, which confirmed availability. I was then taken to a page where I put in my personal info and a credit card number and clicked submit. A confirmation email arrived for my payment with a link to sign into the account. I’m now able to log in to track the status of my order.

Looks like it’s going to be about 2-4 weeks before the kit arrives. I’ve downloaded the app which you can use to check for obstructions in the place where you would like to install the dish but it’s dark now so I’ll check that out tomorrow. There is also a step by step guided installation portion of the app that I’ll hold off on until the dish comes.

So I guess, now we wait! I’ll have more updates as they become available. Definitely excited to test out what I really feel will be the future of connectivity!

SDN’s Promise Lives On

I’m not sure if anyone else remembers the early days of OpenFlow and Software Defined Networking (SDN), but it was going to change networking for good. The idea was being able to embed programmable forwarding and filtering policy directly into your switches. You would then use a controller which spoke to the switches via the OpenFlow protocol to distribute network policy like you would find in firewalls & load balancers, right inside your switch. The problem was that OpenFlow had scalability issues, was tough to integrate with switch pipelines and just never saw the enterprise adoption that many expected. 10 years after the first drafts of OpenFlow, we are seeing some of the promise of SDN & OpenFlow’s key ideals back again for consumption in different forms.
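
For anyone who never got hands on with it, the core idea is a table of match/action rules that a controller installs into the switch. Below is a toy model of that idea in Python. It is not the OpenFlow protocol or any real controller API; the class names, field names and the choice to punt table misses to the controller are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FlowRule:
    priority: int
    match: Dict[str, object]  # header field -> required value
    action: str               # e.g. "drop" or "output:2"


class FlowTable:
    """Toy switch flow table: highest priority matching rule wins."""

    def __init__(self) -> None:
        self.rules: List[FlowRule] = []

    def install(self, rule: FlowRule) -> None:
        """Stand-in for a controller pushing a rule down to the switch."""
        self.rules.append(rule)
        self.rules.sort(key=lambda r: r.priority, reverse=True)

    def apply(self, packet: Dict[str, object]) -> str:
        for rule in self.rules:
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return rule.action
        return "send_to_controller"  # assumed table-miss behavior


if __name__ == "__main__":
    table = FlowTable()
    # Forward traffic for one host, but filter telnet to it like a firewall would.
    table.install(FlowRule(10, {"ip_dst": "10.0.0.5"}, "output:2"))
    table.install(FlowRule(100, {"ip_dst": "10.0.0.5", "tcp_dst": 23}, "drop"))
    print(table.apply({"ip_dst": "10.0.0.5", "tcp_dst": 23}))
    print(table.apply({"ip_dst": "10.0.0.5", "tcp_dst": 443}))
    print(table.apply({"ip_dst": "10.0.0.9", "tcp_dst": 443}))
```

Forwarding and filtering policy live in the same table here, which is the “firewall right inside your switch” promise in miniature.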

Out with the DumbNICs, In with the SmartNICs

Some big names in networking such as Broadcom, Mellanox & Xilinx are beginning to tout SmartNIC offerings which they have created at the behest of the cloud scaler crowd, the likes of Amazon, Microsoft and Google. These organizations are looking for ways to offload network processing from the general purpose CPUs which power their cloud offerings to specialized SmartNICs that take it on. A nice side effect of putting the packet mojo into the SmartNIC is forwarding and policy capabilities beyond what a standard NIC can accomplish, using Field Programmable Gate Arrays (FPGA) or a system on a chip (SoC) to provide extended functionality. This makes for some interesting distributed policy and monitoring capabilities such as telemetry, encryption, filtering, stateful IPS/IDS, load balancing and compression right at the host side edge of the network. This is tremendously powerful in that it presents a lot of options for large operators by distributing and accelerating workloads while preserving the compute infrastructure for what it should be doing: computing.

Comparison of NIC capabilities care of Mellanox

A “SmartToR” Edge

Late last year, Broadcom announced their Trident-based SmartToR offerings, which take functionality similar to what you would find in SmartNICs to the Top of Rack (ToR) switch, the network side edge of the datacenter. This takes some of those same SmartNIC features and pushes them down to the network edge before the packets get to the hosts. Potential applications involve traffic you wish to manipulate before it actually gets to the hosts. So you can introduce load balancers to distribute inbound sessions, firewalls to stop DDoS from melting the hosts, Network Address Translation (NAT) to translate public to private addresses, and penultimate decryption of traffic, all with switch Application Specific Integrated Circuits (ASICs) for blazing forwarding performance. Leveraging really fast switch ASICs to analyze and control traffic gives you orders of magnitude greater scale. Analysts believe that the SmartToR concept will be more suitable for enterprise than the SmartNIC approach because of the preference for a “bare metal” appliance in enterprise environments.
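
As a concrete example of one of those functions, distributing inbound sessions usually comes down to hashing each flow’s 5-tuple so that every packet in a session lands on the same host. This is a software sketch of that idea only; a switch ASIC would use a much cheaper hardware hash, and the function name and addresses below are hypothetical.

```python
import hashlib
from typing import Sequence, Tuple

# src_ip, src_port, dst_ip, dst_port, protocol
FiveTuple = Tuple[str, int, str, int, str]


def pick_backend(flow: FiveTuple, backends: Sequence[str]) -> str:
    """Map a flow to a backend so all packets of a session reach the same host."""
    if not backends:
        raise ValueError("no backends configured")
    key = "|".join(str(part) for part in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    hosts = ["10.1.0.11", "10.1.0.12", "10.1.0.13"]  # hypothetical servers
    flow = ("198.51.100.7", 50512, "203.0.113.10", 443, "tcp")
    print(pick_backend(flow, hosts))
    print(pick_backend(flow, hosts))  # same flow, same backend
```

Note that simple modulo hashing reshuffles most flows when the backend list changes, which is why production load balancers tend to use consistent hashing or connection tracking instead.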

In a forwarding battle of CPU vs SmartToR, I’m betting on the SmartToR

What’s the deal with P4?

So after OpenFlow, a consortium of network luminaries including some of the smart people responsible for OpenFlow decided to try something new. Programming Protocol-independent Packet Processors (or P4 for short) was created with a different type of network architecture in mind. Instead of speaking to fixed-function switches that have locked-in functionality baked right into their ASICs, which is what OpenFlow leverages, P4 was meant to be a programming language used on a programmable chip (aka PISA or Protocol Independent Switch Architecture). The idea was that instead of letting the switch chip makers dictate what features and functions were available, one could potentially write their own rules right into the switch for operation. Want to write a custom IPv4 pipeline to support a specific need in your backbone? Go for it. Want to write your own Internet Protocol, IPv[Your Name Here]? Knock yourself out. It was a fully extensible and programmable chip so the sky was the limit. Though powerful, this is inaccessible for the average organization and wielded more deftly in the hands of cloud scale companies and network vendors. P4 is being leveraged to write some cool network applications, but has limited application in the enterprise.
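
To give a feel for what “write your own protocol” means, here is the idea sketched in Python rather than P4: you declare a header layout, parse it, then run a match/action step on the parsed fields. The header format, field names and forwarding rule are all invented for illustration, and real P4 compiles to a hardware pipeline instead of running like this.

```python
import struct
from typing import Dict, Tuple

# Hypothetical "IPv[Your Name Here]" header, big-endian (network order):
# 1-byte version, 1-byte hop limit, 2-byte flow label,
# 4-byte source ID, 4-byte destination ID.
HEADER_FORMAT = "!BBHII"
HEADER_FIELDS = ("version", "hop_limit", "flow_label", "src_id", "dst_id")
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def parse_header(packet: bytes) -> Tuple[Dict[str, int], bytes]:
    """Parser stage: split a packet into header fields and payload."""
    if len(packet) < HEADER_SIZE:
        raise ValueError("packet shorter than header")
    values = struct.unpack(HEADER_FORMAT, packet[:HEADER_SIZE])
    return dict(zip(HEADER_FIELDS, values)), packet[HEADER_SIZE:]


def forward(header: Dict[str, int], port_count: int = 4) -> str:
    """Match/action stage: drop expired packets, else pick a port by destination."""
    if header["hop_limit"] == 0:
        return "drop"
    return f"port:{header['dst_id'] % port_count}"


if __name__ == "__main__":
    packet = struct.pack(HEADER_FORMAT, 1, 64, 7, 10, 42) + b"hello"
    header, payload = parse_header(packet)
    print(header, payload, forward(header))
```

Change the format string and field list and you have a different protocol, which is the kind of freedom a programmable chip is after.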

So is SDN Coming Back?

SDN never really went away. Many cloud-scalers and academic networks continued on with their SDN efforts with and without OpenFlow long after the rest of the industry. The “one size fits all” approach of OpenFlow just was not a fit for most of the market but many of the basic tenets appear to live on in different forms. From the looks of trends like SmartNICs, P4 and switches the likes of Broadcom’s SmartToR, the idea lives on that you can embed many of today’s disparate edge network functions right into the network itself. Taking what has traditionally been separate network appliances scattered throughout the network and embedding them inline presents many advantages with regard to capacity, visibility and control. As with anything, it takes a few tries to get things right but the promise of a fully programmable network within which you can directly embed key network functions is too great to go away.

Let me know what you think, feel free to comment or engage on LinkedIn or Twitter. To see what other things I’m up to, check out my Now page. Thanks for reading!

Starlink vs. 5G

Starlink > 5G

There have been a lot of interesting developments in the mobile/wireless connectivity world as of late. Despite being told for many years 5G will change our lives (seriously, for a really long time now), as it finally comes to market it seems there are other technologies that might steal a little bit of that 5G thunder. The more I read about SpaceX’s Starlink or the other low earth orbit (LEO) satellite services like OneWeb, Telesat & Amazon, the more they seem to have the potential to make a bigger impact than 5G. Low earth orbit satellite connectivity solutions appear to be solving what seem like more pressing remote and limited connectivity problems. Don’t get me wrong, 5G will likely be a great incremental step forward in the places where we already have 4G/LTE connectivity today but it really won’t do much to help those who are so far off the beaten path that they don’t have good access. Being subscribed to the Reddit group /r/starlink, you see some pretty amazing reviews from people who up until now haven’t had many options for connectivity. In particular, if you live in one of the remote parts of the world which Starlink is currently servicing, there is now some pretty amazing connectivity you never had before.

Living and Working in the Boonies

There are a lot of niceties to living in very rural areas for those that enjoy the country life. Large plots of land, lots of privacy, no hustle, nor bustle. That said, easy access to high speed internet is not a benefit you often enjoy in the sticks. If you are fortunate enough to have high speed access in very rural areas, options are limited to one or two overpriced providers that have a monopoly. These providers also have a lot of infrastructure costs to cover for relatively few addressable customers, which goes for remote residential and business customers alike. There usually is little in the way of good wireless 4G/LTE coverage for the same reasons as the wireline guys: it just doesn’t pay to put that kind of money into building the infrastructure and backhauling fiber from towers which will reach only a handful of homes and businesses. With that, there are huge swaths of extremely rural areas with little to no access at all that would potentially never make financial sense to reach with terrestrial options. For some, getting away from Internet access may be by design but for others it’s a never-ending disappointment of crappy, overpriced connectivity options. Low earth orbit satellite services can cover these areas very well and provide connectivity to areas which would never be served by terrestrial wireline or wireless carriers otherwise. There are countless people and organizations that can finally know the convenience of effective, low latency (~50ms) broadband access at 50-200Mbps speeds in these areas. But are the speeds and performance of low earth orbit access enough compared to the speeds of 5G?
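As an aside on that ~50ms figure, a quick back-of-the-envelope calculation shows why orbit height matters so much for latency. This is only a sketch: the altitudes are my assumptions for illustration (roughly 550 km for a Starlink-style LEO satellite and 35,786 km for a traditional geostationary one), the satellite is assumed to be directly overhead, and only the speed-of-light trip is counted, not routing, queuing or ground network hops.

```python
# Best-case propagation delay for a satellite hop (user -> satellite -> ground station).
# Altitudes are assumptions for illustration, not figures from this post.
SPEED_OF_LIGHT_KM_S = 299_792.458

LEO_ALTITUDE_KM = 550       # assumed Starlink-style low earth orbit
GEO_ALTITUDE_KM = 35_786    # geostationary orbit


def min_round_trip_ms(altitude_km: float) -> float:
    """Lower bound on round-trip time with the satellite directly overhead."""
    one_way_km = 2 * altitude_km      # up to the satellite, back down to the ground
    round_trip_km = 2 * one_way_km    # request plus reply
    return round_trip_km / SPEED_OF_LIGHT_KM_S * 1000


for name, altitude in [("LEO", LEO_ALTITUDE_KM), ("GEO", GEO_ALTITUDE_KM)]:
    print(f"{name}: at least {min_round_trip_ms(altitude):.1f} ms round trip")
```

Under these assumptions the physics alone costs a LEO link only a handful of milliseconds, while a geostationary link is stuck near half a second before any real network delay is added.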

How Much Bandwidth is Enough?

Maybe I’m getting too old to carry a geek card but I often wonder, how fast does Internet access really need to be? Sure, faster is always better but how much bandwidth does one need before there is no real discernible difference between a few hundred megabits per second and getting up into multi-gigabits per second? It’s kind of like going from HD resolutions at 1080p up to Ultra HD resolutions at 4K or even 8K. I personally can’t tell the difference on the size of TVs that I buy, which are around 50” or so. Another analogy might be in computing, such as the difference between a 3.3GHz six core and a 3.8GHz eight core processor. I understand there’s a difference but do the applications I use day in, day out really show a significant performance increase? Will multi-gigabit speeds really make a noticeable difference for me or the average user? For the enthusiast and those living on the cutting edge of technology, sure, they’ll bust out their benchmarking tools to compare and find ways to use all of that throughput. Most users like myself are perfectly content with around 100-250Mbps of bandwidth.
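To put some rough numbers on those diminishing returns, here is a small sketch that computes ideal download times at different line rates. The 5 GB file size is an arbitrary assumption of mine, and the math ignores protocol overhead, Wi-Fi limits and how fast the server on the other end can actually send.

```python
def transfer_seconds(size_gigabytes: float, speed_mbps: float) -> float:
    """Ideal transfer time for a file given in decimal gigabytes at a rate in megabits per second."""
    size_megabits = size_gigabytes * 1000 * 8   # GB -> MB -> megabits
    return size_megabits / speed_mbps


FILE_SIZE_GB = 5  # hypothetical download, e.g. a large software update

for speed in (25, 100, 250, 1000, 2500):
    print(f"{speed:>5} Mbps: {transfer_seconds(FILE_SIZE_GB, speed):7.1f} s")
```

With these inputs, moving from 25Mbps to 250Mbps saves 24 minutes on the download, while moving from 250Mbps to 2.5Gbps saves a little over two.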

What Connectivity Problems Need Solving?

Once Starlink and other low earth orbit satellite services like it really start chugging, they will solve connectivity issues for many of the underserved. Contrast that with 5G as an incremental performance increase for those who already have 4G/LTE access today, which is great but in my mind, less significant. Connecting the unconnected or “underconnected” with more bandwidth is far more interesting than just souping up existing connectivity that is pretty darn good as it is. I am certainly long on the promise low earth orbit access brings for the global connectivity landscape and think this will be hugely disruptive. I only wish I could buy stock in SpaceX to support and share in the success of their mission!

What do you think?

Is your Internet connectivity REALLY redundant?

Outages

The CenturyLink/Level3 internet outage on August 30th, 2020 got a lot of network engineers thinking about internet reachability and the ways things can go wrong. The way this particular failure played out was unique and definitely gave us all a lot to consider in the way of oddball failure scenarios. Problems started for CenturyLink/Level3 when a BGP Flowspec announcement came from a datacenter in Mississauga, Ontario in Canada. Flowspec, a security mechanism within BGP, is commonly used to filter large volumetric distributed denial of service (DDoS) attacks within a network. The root cause of this particular issue appears to be operator error, by way of a CenturyLink engineer being allowed to put a wildcard entry into a Flowspec command meant to block such an attack. This misformatted entry caused many more IP addresses than intended to be filtered, wreaking havoc on the CenturyLink/Level3 backbone within Autonomous System (AS) 3356. The rule’s filtering tore down BGP sessions across the backbone, causing instability and reachability issues throughout.

One very interesting bit about how things failed was what happened when other networks tried to shut down their BGP sessions to AS 3356. CenturyLink/Level3 didn’t stop propagating prefixes/IP address blocks even after the BGP sessions were shut down. This made the BGP speakers still connected to AS 3356 think it was a valid path to reach said prefixes/IP addresses when it no longer was. This traffic was then “blackholed” within the CenturyLink/Level3 backbone because there was no longer an exit point to reach the IP addresses. So not only could you not use the backbone during the disruption, the failure actually could have prevented those who proactively disconnected from CenturyLink/Level3 from being able to utilize alternate paths they might be connected to.
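To make the blackholing a bit more concrete, here is a deliberately simplified toy model of how a remote router could end up preferring the stale path. It is a sketch under heavy assumptions: the decision process is reduced to “shortest AS path wins” (real BGP looks at local preference and other attributes first), and every AS number other than 3356, as well as the prefix, is a made-up documentation value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple        # neighbor AS first, origin AS last
    deliverable: bool     # ground truth that the receiving router cannot see


def best_path(routes):
    """Toy BGP decision: the shortest AS path wins (real BGP has more steps)."""
    return min(routes, key=lambda route: len(route.as_path))


ORIGIN_AS = 64496      # hypothetical network that shut down its session to AS 3356
OTHER_TRANSIT = 64497  # hypothetical second upstream of the origin
MIDDLE_AS = 64498      # hypothetical network between the remote router and OTHER_TRANSIT

# AS 3356 keeps advertising the prefix even though the session to the origin is gone.
stale = Route("203.0.113.0/24", (3356, ORIGIN_AS), deliverable=False)
alternate = Route("203.0.113.0/24", (MIDDLE_AS, OTHER_TRANSIT, ORIGIN_AS), deliverable=True)

chosen = best_path([stale, alternate])
print("chosen AS path:", chosen.as_path)
print("traffic delivered:", chosen.deliverable)
```

In this toy scenario the working alternate path only wins once the stale advertisement is finally withdrawn, which is exactly what was not happening during the outage.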

So a question comes to mind for many network engineers examining the post-mortem of this event: How can I make sure my network is not affected if this happens again? There are a few things that come to mind as items to take into consideration:

SD-WAN – Now mainstream and very mature, SD-WAN is a fantastic way to overcome connectivity issues over the Internet. Because probes are sent periodically to measure path performance, the right SD-WAN solution could route around performance problems on a network. An SD-WAN overlay alone can’t resolve every issue but combined with some of the other recommendations here, certainly gives you greater resilience.

Autonomous System Diversity – When designing internet connectivity resilience, the goal is to make the links you have as independent from one another as you can. The autonomous system paths of the providers you select are important to examine to be sure they do not depend on one another for transit. A great tool to assist with this is CAIDA’s ASRank, which is helpful to see the relationships ASNs have with one another. Take a look at the ASNs of the providers you are considering to see their relationship to one another. In particular, you likely want to avoid the two ASes having a “customer” or “provider” relationship. Ideally, you’ll want them to be a “peer”. Unfortunately, that doesn’t 100% guarantee you won’t be affected by something like what happened on August 30th, when AS3356 was still advertising and blackholing prefixes. It will, however, get you about as close as you can get to the ASNs not depending on one another, and give you a better chance of survivability.

Three Connections or More – Many with redundant Internet connections assume two connections are enough. I would contend that having a third connection, even if it’s a backup only connection via 4G/5G over a wireless carrier, can save your bacon if the other two carriers are affected by the same outage.

IXPs, CXPs and Cloud Direct Connections – You may want to consider peering into one of the following:

  • Internet Exchange Point (IXP) – You’ll find IXPs all over the world as a means to inexpensively peer networks directly in a multilateral or bilateral peering arrangement. With multilateral peering, you connect to a route server with one BGP peering session then send and receive all routes with anyone connected to the route server. Bilateral peering is a direct BGP peering relationship with another entity on the exchange. These allow a network to connect directly to regional networks without the need for transit, saving money, reducing latency and improving overall performance. Quick plug: I work with the Ohio IX so if you’re in Ohio, I highly recommend checking them out.
  • Cloud Exchange Points (CXP) or Direct Cloud Connections – As the public cloud becomes more important to IT infrastructures, finding a way to stay directly connected to these resources becomes critical. Like connecting to an IXP, connecting to a CXP or Direct Cloud Connection to get to key cloud providers is another opportunity to not just improve redundancy but performance as well.
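The autonomous system diversity check from the list above is easy to turn into a small script once you have looked up the relationships. The sketch below is only that, a sketch: the relationship table is hypothetical data typed in by hand (it does not call CAIDA or any real API), and the AS numbers are documentation values rather than real providers.

```python
from enum import Enum


class Relationship(Enum):
    PEER = "peer"
    PROVIDER_CUSTOMER = "provider-customer"
    UNKNOWN = "unknown"


# Hypothetical relationship data keyed by an unordered pair of ASNs.
# In practice you would look these up by hand in a tool such as CAIDA's ASRank.
RELATIONSHIPS = {
    frozenset({64496, 64497}): Relationship.PROVIDER_CUSTOMER,
    frozenset({64496, 64498}): Relationship.PEER,
}


def relationship(asn_a: int, asn_b: int) -> Relationship:
    """Look up how two ASNs relate, regardless of argument order."""
    return RELATIONSHIPS.get(frozenset({asn_a, asn_b}), Relationship.UNKNOWN)


def assess_pair(asn_a: int, asn_b: int) -> str:
    """Rate a pair of candidate providers for independence."""
    rel = relationship(asn_a, asn_b)
    if rel is Relationship.PROVIDER_CUSTOMER:
        return "avoid: one provider depends on the other for transit"
    if rel is Relationship.PEER:
        return "preferred: the two providers peer with each other"
    return "unknown: check the relationship manually"


for pair in [(64496, 64497), (64498, 64496), (64497, 64498)]:
    print(pair, "->", assess_pair(*pair))
```

Keying the table on a frozenset means the lookup does not care which provider you list first, which matches how you would compare any two circuits you are considering.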

In closing, it’s difficult to plan for every type of network failure that can occur. This most recent CenturyLink/Level3 outage was one for the books, that’s for sure. All we can do as network engineers is learn from it and strive to build better networks from the lessons we take away.

Thanks for reading! If you’re an Ohio network engineer, be sure to check out a couple of organizations I’m involved with: (OH)NUG and Ohio IX. I might be a little biased but feel they are great resources right in our backyard!

What do YOU want from SD-WAN?

There have been a lot of discussions lately about the maturation of the SD-WAN market. Much of what I’ve read and heard is that SD-WAN has met its initial promise:

  • Improve performance over the WAN
  • Make better use of bandwidth across disparate circuits
  • Give us a central controller for management/visibility
  • In some cases, save a little money.
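For anyone newer to the topic, the first two items boil down to measuring each circuit and steering traffic accordingly. Here is a minimal sketch of that idea, assuming made-up thresholds, circuit names and probe results; it is not how any particular vendor implements path selection.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PathStats:
    name: str
    latency_ms: list      # recent probe round-trip times in milliseconds
    loss_pct: float       # share of probes lost in the measurement window


# Hypothetical policy thresholds for a latency-sensitive application.
MAX_LATENCY_MS = 150.0
MAX_LOSS_PCT = 1.0


def mean_latency(path: PathStats) -> float:
    """Average probe latency; a path with no replies counts as infinitely slow."""
    return mean(path.latency_ms) if path.latency_ms else float("inf")


def healthy(path: PathStats) -> bool:
    return mean_latency(path) <= MAX_LATENCY_MS and path.loss_pct <= MAX_LOSS_PCT


def pick_path(paths):
    """Prefer the fastest path that meets policy, else fall back to the least lossy one."""
    healthy_paths = [p for p in paths if healthy(p)]
    if healthy_paths:
        return min(healthy_paths, key=mean_latency)
    return min(paths, key=lambda p: (p.loss_pct, mean_latency(p)))


paths = [
    PathStats("broadband", [22.0, 25.0, 24.0], loss_pct=4.0),  # fast but lossy right now
    PathStats("mpls", [48.0, 51.0, 50.0], loss_pct=0.0),
    PathStats("lte", [85.0, 90.0, 95.0], loss_pct=0.5),
]
print("selected path:", pick_path(paths).name)
```

The broadband circuit has the best latency here but is skipped because its loss exceeds the policy, so traffic lands on the next fastest healthy path.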

So is that it? Are we done? The Network Collective podcast had a great conversation about the Future of SD-WAN. There have been some really great Packet Pushers podcast episodes on this topic lately too. These really got me thinking about what else we can really ask for from SD-WAN and here are some thoughts.

My Wish List

  • Deeper Security Integrations – There’s no doubt security considerations are top of mind right now. With how rapidly remote work was foisted upon all of us with COVID-19, many needed to figure things out fast. Gartner coined the term Secure Access Service Edge (SASE) for the amalgamation of network technologies that make for secure, borderless network access. The current state of the world will certainly make the adoption of SASE much more rapid. I would like to see many of the “pure play” SD-WAN providers adapt and add more native security features to their products. Another option is to create some tight partnerships with security vendors via “service chaining”. Also great would be Zero Trust Network Access (ZTNA) remote access features baked right into SD-WAN solutions. This means the SD-WAN controller should have visibility and policy control over remote access users.
  • Adaptive Multi-Cloud Topologies – COVID-19 has also emphasized the importance of the cloud in order to make sure remote users still have access to the applications and resources they need. A lot of organizations are finding their cloud providers are not “one size fits all” when it comes to certain applications. This makes for some complex network designs and integrations to make it all work together. Optimizing performance across clouds natively will need to be a part of the SD-WAN story moving forward. You are seeing these problems solved today by some interesting multi-cloud solutions like Alkira and Aviatrix. I firmly believe SD-WAN vendors are going to need to start building some of this themselves to deal with it.
  • Application Performance Visibility – AIOps is helping IT operations keep up with the rapid pace of change but we’ll start to need this level of smarts in the network, down to the user and application level. This will help network operators quickly identify network related application performance issues, using ML and AI to break them down in simple terms. With the introduction of these features, network engineers will be able to quickly see what is going on and more rapidly remediate said issues.

If you were an SD-WAN Product Manager…

These are a few of my ideas but I would like to hear from you. What do you want to see from SD-WAN? Let’s say you are the product manager, it’s up to you to add features that everybody needs but do not have today. Please comment or reply to the post on social to share your thoughts!

Ohio Internet Exchange (IX)

In March 2020, I volunteered to become a part of the Ohio Internet Exchange (aka Ohio IX) Technical Steering Committee. If you haven’t heard of the Ohio IX before, it is an Internet exchange (IX), also known as an Internet exchange point (IXP) or peering point. IX/IXPs allow Internet transport carriers, Internet service providers (ISPs), mobile and content providers, and other organizations with great connectivity needs to exchange Internet traffic inexpensively through a common switch fabric, usually on a settlement-free basis (i.e. no usage charges). Ohio IX reduces the portion of an organization’s Internet transit traffic that must be delivered via their upstream transit providers, thereby reducing the average per-bit delivery cost of their service. The increased number of paths learned through Ohio IX improves routing efficiency and fault-tolerance. By connecting to Ohio IX, members can peer with the route servers provided by Ohio IX, or with any other member, provided the members in question have reached a bilateral peering agreement. There are three types of connectivity options into or through the Ohio IX Fabric:

Multilateral Peering: Traffic exchanged directly between members over shared exchange fabric utilizing Ohio IX route servers

Bilateral Peering: Traffic exchanged directly between two members of the exchange over the shared exchange fabric

Direct Access: Connect directly into content you specify.

There are two levels of membership to the Ohio IX, associate ($500/yr) and senior ($1000/yr). Senior membership requires your own Autonomous System Number (ASN) and requires connecting to the exchange with either a 1G or 10G port. 40G and 100G ports are available as well but additional charges apply. Associate members can participate by connecting to the exchange but are not required to and they have limited voting rights.
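To illustrate how offloading traffic to settlement-free peering lowers the average delivery cost, here is a rough worked example. Only the $1000/yr senior membership fee comes from this post; the traffic volume, transit price and offload percentage are hypothetical, and the sketch ignores cross-connects, port upgrades and any other charges.

```python
SENIOR_MEMBERSHIP_PER_YEAR = 1000.0   # from the membership levels above


def monthly_cost(traffic_mbps, transit_price_per_mbps, offload_fraction=0.0, ix_fee_per_year=0.0):
    """Monthly spend when a fraction of traffic moves from paid transit to peering."""
    transit_mbps = traffic_mbps * (1 - offload_fraction)
    return transit_mbps * transit_price_per_mbps + ix_fee_per_year / 12


# Hypothetical inputs: 2 Gbps of billed traffic at $0.50 per Mbps per month,
# with 30% of it reachable over the exchange instead of transit.
TRAFFIC_MBPS = 2000
TRANSIT_PRICE = 0.50
OFFLOAD = 0.30

before = monthly_cost(TRAFFIC_MBPS, TRANSIT_PRICE)
after = monthly_cost(TRAFFIC_MBPS, TRANSIT_PRICE, OFFLOAD, SENIOR_MEMBERSHIP_PER_YEAR)

print(f"transit only: ${before:,.2f}/month (${before / TRAFFIC_MBPS:.3f} per Mbps)")
print(f"with peering: ${after:,.2f}/month (${after / TRAFFIC_MBPS:.3f} per Mbps)")
```

Plug in your own numbers; the point is simply that the membership fee is fixed while the transit savings scale with how much traffic you can move onto the exchange.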

The work I’ve been doing has been pretty interesting! I have been getting my hands dirty standing up some Linux hosts for management and monitoring, not to mention working with other technical peers to keep the network running. I am new so have a lot to learn and more work to do in order to feel like I’m pulling my weight, but it’s been a great experience thus far!

If you are interested in learning more about the Ohio IX, reach out to me on LinkedIn or Twitter (links for both at the right) or email info@ohioix.org.

I hope to see you as a new member soon!