SD-WAN in a Work from Anywhere World


Connecting remote users to corporate IT resources has never been trivial. The shift to remote work was already well underway for many before 2020, but the global COVID-19 pandemic accelerated the transition dramatically. The change happened so fast that many organizations struggled to get their users connected once working from home became a requirement. Remote access to an organization’s applications has traditionally required some form of virtual private network (VPN) connection so users can encrypt and secure access to the sensitive data they need to do their jobs. We witnessed firsthand our clients scaling up to meet this VPN demand: firewalls crashing after exceeding capacity, mad scrambles to upgrade licensing for more users, software hastily installed on laptops, and cloud-based VPN termination options added to elastically expand capacity. It was an eye-opening experience for many that their infrastructure simply was not prepared to scale up quickly when they needed it most.

So how can we accommodate this sort of demand better in the future? With a bit of hindsight and introspection on what we just went through, are we as an industry thinking about remote connectivity in the correct way? Managing and securing remote connectivity for users can be achieved a number of ways, each of which has its own trade-offs. There are some very progressive, forward-looking models built with a “cloud first” mindset that do not require VPN connectivity at all. That said, I would say those are outliers in the corporate world today. Most organizations have a legacy application or some other need that requires a VPN for connectivity into their IT resources. Let’s explore a couple of methods of using VPNs for corporate connectivity and the compromises of each.

Endpoint Agent/Client Software

Many folks leverage an approach that involves a software client or agent running on the user device, which establishes the VPN to “tunnel” the user’s traffic securely to and from the IT environment. Though very common and popular, is this the best way to connect users in a modern hybrid and multi-cloud environment?

The positive things about this approach:

  • Most organizations have a firewall and the functionality to connect users via VPN is often already baked in.
  • No additional hardware is required outside of the firewall and the endpoint itself.
  • Some VPN software agents inspect traffic before it enters the tunnel, so it is vetted before it ever reaches the corporate network. This puts the security perimeter very close to the user.
  • This is a well understood and very common deployment scenario.
  • User identity, and therefore access policy, is known and can be managed by nature of the user logging into the machine the agent is hosted on.

The negative things about this approach:

  • Certain devices can’t run endpoint software (iOS, Android, unsupported and older operating systems, etc.), so using phones, tablets and older computers may not be an option.
  • Agents put the burden of inspecting, securing and reporting network & application usage on the user device, which may consume compute resources and detract from the user experience.
  • Managing permissions and controls on the user device can be difficult and time consuming.
  • Hybrid, multi-cloud and SaaS network connectivity needs become complex to manage and secure.
  • Additional licensing costs may be required to add users.
  • Lack of network visibility without additional tools on the device.

So we get VPN connectivity included with components we may already have, but there are reasons why it is not a one-size-fits-all model. Let’s contrast it with an alternate approach.

SD-WAN Network Appliance

Another approach for remote users to access IT resources is leveraging an actual network appliance to terminate the WAN connectivity, with the user device connecting via Ethernet or Wifi. Many platforms have SD-WAN capabilities today, not to mention security features baked in, so let’s assume we are working with these modern edge appliance features for the sake of our argument.

The positive things about this approach:

  • No software agent is required, so nearly any device that can get on Wifi or Ethernet can connect, independent of operating system.
  • Inspecting, securing and reporting on traffic is offloaded from the user device to the appliance, preserving endpoint performance.
  • More network and application visibility/telemetry at the edge, including the ability to issue packet captures on the appliance.
  • Zero touch provisioning and centralized orchestration make deployments repeatable and rapid.
  • WAN & application optimization features are typically baked in.

The negative things about this approach:

  • Additional devices to install, manage and support
  • Additional hardware costs
  • Depending on the platform, additional licensing costs
  • Without an agent, there is no way to validate the state of the end user device before it connects.
  • More planning and coordination with users to get network connected versus simply getting on Wifi/Ethernet and firing up a VPN client.

In conclusion, which is better?

So which is preferable? The age-old “it depends” applies. In most cases, my design preference would be the SD-WAN network appliance. I may be biased as a network practitioner, but I predict many will move to a network-based approach for work from anywhere. As computing capabilities evolve and can be supported in smaller packages, remote users will have a little “puck” sized appliance that gives them access to network resources.

My key reasons for this are:

  • Lack of requirements for a software agent allows for user device independence.
  • No need to manage software on user machines, i.e. no dealing with OS permission issues, no keeping agent software up to date, no user performance impact from an agent, etc.
  • More network and application visibility/telemetry opportunities with a network appliance that can stream this information, not to mention the ability to easily issue a packet capture at the edge.
  • Though you will have some additional costs to install and manage the hardware, there are great options to automate and orchestrate this control, not to mention things like zero touch provisioning to stand units up. It can be argued that deployment can happen more rapidly.
  • In the future, the potential to install apps at the edge. Examples would be synthetic application monitoring and measurement platforms, application optimization, data synchronization, etc.
  • WAN & application optimization tools to clean up performance are typically baked in to correct problems like packet loss, jitter and packet loss on the fly.
  • Routing, access control, content filtering and the other functions we typically depend on from network devices today are well known and easier to manage on network appliances.

What do you think? Which approach seems better to you for remote connectivity, agent software or SD-WAN? Please comment here or on social media with your thoughts. As always, thanks for reading and I certainly would appreciate any input you may have!

Starlink: First Impressions


So my Starlink kit is finally here! A number of folks have asked for first impressions, so I’m going to break it down. Long story short: it’s a breeze to set up but, operationally, definitely a beta service. Let’s explore.

The Unboxing

I was first struck by how efficiently everything is packaged. The first thing you see is the 3-panel pictogram of how to set it all up, right on top. You know what? It pretty much was that easy. When you lift this page and the top molded plastic panel that holds everything in place, you get to the goods. The form-fitting molded plastic on the top and bottom holds the kit in very snug; it’s really well executed. In the box:

  • Starlink Dish with attached mast
  • Ground base that the dish mast attaches to
  • Starlink Power over Ethernet (PoE) Injector – Model UTP-201S – Output towards dish maxes out at 90W (x2), output towards router maxes out at 17W. Total wattage this guy can produce is 180W.
  • Starlink Router – Model UTR-201 – PoE input 10W – Has built-in 802.11a/b/g/n/ac Wifi over 2.4Ghz & 5Ghz and an “AUX” 10/100/1000 Ethernet port.
  • Pre-connected cables: a 100-foot black cable from dish to PoE injector, and a 6-foot white cable from PoE injector to router. The cables were already plugged in.

The Setup

With everything pre-cabled, assembly really comes down to:

  • Snap the dish and its attached mast into the base
  • Plug PoE brick into the wall
  • Connect the pre-terminated and weatherproofed black cable from the dish into its black color-coded port on the PoE brick
  • Plug the white cable already hanging off the included router into its color-coded port on the PoE brick.

From a physical standpoint, that’s all you really have to do! A lot of time was obviously spent on making this very easy to deploy. Mission accomplished. I literally got everything up and running within 10 minutes. Once plugged in, the dish points straight up at the sky and, with its built-in motors, starts to tune itself to receive the strongest signal possible. If you want to see these motors and the guts of the dish, check out engineer Ken Keiter’s teardown. It’s quite impressive; I highly recommend checking it out. The dish iterated through a few positions and eventually settled on a spot in the sky NNE of my house.

The first thing to do after plugging everything in is to get the Starlink app on iOS or Android. All configuration, control and documentation really lives within this app, so it’s definitely a requirement. The process is relatively straightforward and is a lot like any other consumer IoT device you may have picked up recently.

The Starlink UTR-201 router comes with a default SSID which is printed right on it by the “AUX” port on the back of the unit.

The iOS/Android app connects to the router over Wifi and adopts it, so you can then adjust some basic settings via the app. There’s not really much you can configure other than the wireless SSID; more on that later.

The Operation

Here are some notes on what things look like once it’s all plugged in and up and running. I have to say it’s definitely “better than nothing” as they state. That said, there is room for improvement.

  • Once connected to the Starlink router via Wifi or wired via the AUX port, you will be DHCP’d a 192.168.1.x/24 address. This is not optional, and there is no way to reconfigure different addressing or other DHCP options that I can find (a quick way to verify this is shown after this list).
  • There is no management interface to the router and the options are very limited. There is a rather nice statistics dashboard you can see through the app or by browsing to it when connected.
  • Your DNS server will be the router itself and the search domain is just “lan”.
  • There is no way to configure port translations, but the router is running Universal Plug n Play (UPnP, see below in the next section), so maybe there are plans for that later?
  • The WAN interface on the router is behind Carrier Grade Network Address Translation (CG-NAT). More on this later, but this makes port forwarding impossible and having a public IP address for specific applications (things like old-school IPSEC VPNs or accessing your own server directly) not currently possible.
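
If you want to verify the addressing and DNS behavior yourself from a Mac like mine, the stock command line tools will do it. This is just a quick sketch; “en0” is a placeholder for whichever interface you’re actually connected on.

ipconfig getifaddr en0                   # should return your 192.168.1.x lease
scutil --dns | grep "search domain"      # should show the "lan" search domain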

How’s the latency? Pretty good actually.

jg-mbp:~ jason$ ping
PING ( 56 data bytes
64 bytes from icmp_seq=0 ttl=58 time=37.909 ms
64 bytes from icmp_seq=1 ttl=58 time=43.383 ms
64 bytes from icmp_seq=2 ttl=58 time=40.946 ms
64 bytes from icmp_seq=3 ttl=58 time=39.343 ms
64 bytes from icmp_seq=4 ttl=58 time=37.811 ms
--- ping statistics ---
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 37.811/39.878/43.383/2.091 ms

This is in the same neighborhood as RTTs over my Spectrum connection, which is impressive! Here is my Spectrum RTT to the same address.

jason@rtr01-jghome:~$ ping
PING ( 56(84) bytes of data.
64 bytes from icmp_req=1 ttl=53 time=34.4 ms
64 bytes from icmp_req=2 ttl=53 time=32.2 ms
64 bytes from icmp_req=3 ttl=53 time=31.0 ms
64 bytes from icmp_req=4 ttl=53 time=29.5 ms
64 bytes from icmp_req=5 ttl=53 time=34.3 ms
--- ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
rtt min/avg/max/mdev = 29.513/32.314/34.454/1.910 ms

Next question: how much bandwidth are we getting? This is typically in the neighborhood of around 70Mbps down / 10Mbps up. It’s good, but not great. I was most surprised by the upload bandwidth; I wasn’t expecting to get that much.
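
If you want to sanity check throughput yourself from the command line, the open source speedtest-cli tool is one way to do it. A convenience sketch, assuming you have it installed; it is not how the numbers above were gathered.

# pip install speedtest-cli
speedtest-cli --simple    # prints ping, download and upload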

Now as far as stability goes, that’s all over the board. Here’s a My Traceroute (MTR) run, which traces the path and then pings each hop repeatedly. I let it cycle through 100 times here.
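
If you’d like to reproduce this at home, an equivalent report-style run looks something like the following; the destination is a placeholder for whatever target you want to measure against.

# 100 cycles in wide report mode, same as the run above
mtr --report --report-cycles 100 -w <destination>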

Oof. That’s not pretty. The standard deviation is up there, there’s 3% packet loss all the way through, and we are seeing upwards of 300ms RTTs right out of the gate. More detail is below in the next section, after I plug it into my VMware SD-WAN appliance.

The Geekier Stuff

The previous sections were the basics that most people want to see. This section will be more of the fun details I observed while playing around.

One thing I thought was interesting was the router’s hostname, resolved via DNS against the router itself.

jg-mbp:~ jason$ host
domain name pointer OpenWrt.lan.

So it looks like it’s based on OpenWrt. To be honest, this is not uncommon, and I know of many other commercial products based on it as well.

I tried to see if there is a web management interface on the router but no such luck. Here’s what a port scan looks like.

Starting Nmap 7.91 ( ) at 2021-02-28 11:33 EST
Nmap scan report for
Host is up (0.24s latency).
Not shown: 994 closed ports
22/tcp   open  ssh
53/tcp   open  domain
80/tcp   open  http
5000/tcp open  upnp
9000/tcp open  cslistener
9001/tcp open  tor-orport

When you go to port 80 on it, it just redirects you. Boring!

The router is listening for DNS queries on port 53 and answering them quickly. It appears to be proxying and caching DNS entries, which certainly helps speed things up. It’s all about optimizing performance where you can when delivering internet from space, and I think this was a smart way to go. Here’s a dig query against the router for a cached entry versus against an internet name server. 2ms vs 63ms is a big improvement!

jg-mbp:~ jason$ dig @

; <<>> DiG 9.10.6 <<>> @
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9561
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

; EDNS: version: 0, flags:; udp: 4096
;			IN	A


;; Query time: 2 msec
;; WHEN: Mon Mar 01 20:40:03 EST 2021
;; MSG SIZE  rcvd: 55

jg-mbp:~ jason$ dig @

; <<>> DiG 9.10.6 <<>> @
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41035
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

; EDNS: version: 0, flags:; udp: 512
;			IN	A


;; Query time: 63 msec
;; WHEN: Mon Mar 01 20:40:10 EST 2021
;; MSG SIZE  rcvd: 55

TCP/22, aka SSH, is open! But good luck getting in there. Their SSH server uses key-based instead of password-based authentication, which I have to say is refreshing! Definitely a step in the right direction when it comes to IoT device security.

jg-mbp:~ jason$ ssh admin@
The authenticity of host ' (' can't be established.
RSA key fingerprint is SHA256:owxzwYXb/xsrqqDmR1YkIaAIR6AS1t+iwE0mMvoymYM.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '' (RSA) to the list of known hosts.
admin@ Permission denied (publickey).

Universal Plug n Play (UPnP) is running on TCP/5000. Perhaps this is for a future application? If you think you know what this is for outside of the standard UPnP use case, let me know. It seems weird to have it when there’s another layer of CG-NAT beyond it. I’m also curious about TCP/9000 and TCP/9001. The most common uses for these are PHP-FPM and The Onion Router (Tor), but I doubt that’s what they’re for. If anyone has ideas, I’m all ears.
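
If you want to poke at that UPnP listener yourself, the upnpc client from the miniupnpc package can query it. A quick sketch, assuming upnpc is installed:

upnpc -s    # show IGD status
upnpc -l    # list current port mapping redirections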

I wanted to see what happens when you bypass the Starlink router and plug the dish right into a different device instead. It turns out, this works! You get an RFC6598 IP address, which is what you use for CG-NAT. Makes sense when you are working with very little IPv4 space and need to conserve as much as you can.
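
By the way, a quick way to confirm you’re behind CG-NAT is to compare the WAN address your device receives with the address the rest of the internet sees. A sketch assuming a Linux host, with eth0 as a placeholder for the WAN-facing interface: if the local side is in the RFC6598 range but the external check returns something different, there’s a translation layer between you and the internet.

ip -4 addr show dev eth0      # look for a address
curl https://ifconfig.me      # the public address the internet sees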

While we’re in the SD-WAN platform, let’s check out how well it thinks Starlink would do for real-time applications like voice compared to my Spectrum circuit.

Hmmm… it has a little ways to go there, it seems. There are a lot of instances of packet loss, jitter, high latency and just plain no connection.

How about IPv6? Turns out, not ready yet.

One thing I really love is the support section. It has some really great insights and answers to commonly asked questions in it.

So that’s it for now! As mentioned before, it’s an amazingly simple setup experience, and it really is a remarkable offering considering they are blazing new territory here. I think the service will only improve over time, and you will see greater stability and performance with each software update/improvement SpaceX makes. That said, this is usable and much better than many of the alternatives those in the boonies suffer with today. If you would like to read my thoughts on why I think LEO satellite internet access is more significant in rural areas than 5G, check that out here.

I’m going to write a follow-up but wanted to get something out for folks to check out. Please do contact me with things you would like me to try or tell you about! I would absolutely love to share my experiences and learn more by answering your questions.

Starlink vs. 5G

Starlink > 5G

There have been a lot of interesting developments in the mobile/wireless connectivity world as of late. Despite being told for many years that 5G will change our lives (seriously, for a really long time now), as it finally comes to market it seems there are other technologies that might steal a little of that 5G thunder. The more I read about SpaceX’s Starlink or the other low earth orbit (LEO) satellite services like OneWeb, Telesat & Amazon, the more they seem to have the potential to make a bigger impact than 5G. Low earth orbit satellite connectivity solutions appear to be solving more pressing remote and limited-connectivity problems. Don’t get me wrong, 5G will likely be a great incremental step forward in the places where we already have 4G/LTE connectivity today, but it really won’t do much to help those who are so far off the beaten path that they don’t have good access. Being subscribed to the Reddit group /r/starlink, you see some pretty amazing reviews from people who, up until now, haven’t had many options for connectivity. In particular, if you live in a remote part of the world that Starlink is currently servicing, there is now some pretty amazing connectivity you never had before.

Living and Working in the Boonies

There are a lot of niceties to living in very rural areas for those who enjoy the country life. Large plots of land, lots of privacy, no hustle, no bustle. That said, ease of access to high speed internet is not a benefit you often enjoy in the sticks. If you are fortunate enough to have high speed access in a very rural area, options are limited to one or two overpriced providers that have a monopoly. These providers also have a lot of infrastructure costs to cover for relatively few addressable customers, which goes for remote residential and business customers alike. There usually is little in the way of good wireless 4G/LTE coverage for the same reasons as the wireline guys: it just doesn’t pay to put that kind of money into building infrastructure and backhauling fiber from towers that will reach only a handful of homes and businesses. With that, there are huge swaths of extremely rural areas with little to no access at all that may never make financial sense to reach with terrestrial options. For some, getting away from internet access may be by design, but for others it’s a never-ending disappointment of crappy, overpriced connectivity options. Low earth orbit satellite services can cover these areas very well and provide connectivity to areas that would never be reached by terrestrial wireline or wireless carriers otherwise. There are countless people and organizations that can finally know the convenience of effective, low latency (~50ms) broadband access at 50-200Mbps speeds in these areas. But are the speeds and performance of low earth orbit access enough compared to the speeds of 5G?

How Much Bandwidth is Enough?

Maybe I’m getting too old to carry a geek card, but I often wonder: how fast does internet access really need to be? Sure, faster is always better, but how much bandwidth does one need before there is no real discernible difference between a few hundred megabits per second and multi-gigabit speeds? It’s kind of like going from HD resolution at 1080p up to Ultra HD resolutions at 4K or even 8K. I personally can’t tell the difference on the size of TVs I buy, which are around 50” or so. Another analogy might be in computing, such as the difference between a 3.3Ghz six-core and a 3.8Ghz eight-core processor. I understand there’s a difference, but do the applications I use day in, day out really show a significant performance increase? Will multi-gigabit speeds really make a noticeable difference for me or the average user? For enthusiasts and those living on the cutting edge of technology, sure, they’ll bust out their benchmarking tools to compare and find ways to use all of that throughput. Most users like myself are perfectly content with around 100-250Mbps of bandwidth.

What Connectivity Problems Need Solving?

Once Starlink and other low earth orbit satellite services like it really start chugging, they will solve connectivity issues for many of the underserved. Contrast that with 5G as an incremental performance increase for those who already have 4G/LTE access today, which is great but, in my mind, less significant. Connecting the unconnected or ”underconnected” with more bandwidth is far more interesting than just souping up existing connectivity that is pretty darn good as it is. I am certainly long on the promise low earth orbit access brings for the global connectivity landscape and think it will be hugely disruptive. I only wish I could buy stock in SpaceX to support and share in the success of their mission!

What do you think?

Is your Internet connectivity REALLY redundant?


The CenturyLink/Level3 internet outage on August 30th, 2020 got a lot of network engineers thinking about internet reachability and the ways things can go wrong. The way this particular failure played out was unique and definitely gave us all a lot to consider in the way of oddball failure scenarios. Problems started for CenturyLink/Level3 when a BGP Flowspec announcement came from a datacenter in Mississauga, Ontario, Canada. Flowspec, a security mechanism within BGP, is commonly used to filter large volumetric distributed denial of service (DDoS) attacks within a network. The root cause of this particular issue appears to be operator error: a CenturyLink engineer was allowed to put a wildcard entry into a Flowspec rule intended to block such an attack. This malformed entry caused many more IP addresses than intended to be filtered, wreaking havoc on the CenturyLink/Level3 backbone within Autonomous System (AS) 3356. The rule’s filtering tore down BGP sessions across the backbone, causing instability and reachability issues throughout.
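
To make the mechanism concrete, here is what a Flowspec rule looks like in Junos-style syntax. This is purely illustrative; the prefix, port and rule name are hypothetical and not the actual rule from the outage. A healthy rule matches a narrow tuple and discards only that traffic; wildcard or omit the match terms and it suddenly applies to far more traffic than intended.

routing-options {
    flow {
        route block-ddos {
            match {
                destination;   /* hypothetical victim prefix */
                protocol udp;
                destination-port 389;         /* e.g. an LDAP reflection attack */
            }
            then discard;
        }
    }
}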

One very interesting bit about how things failed was what happened when other networks tried to shut down their BGP sessions to AS 3356. CenturyLink/Level3 didn’t stop propagating prefixes/IP address blocks even after the BGP sessions were shut down. This made the BGP speakers still connected to AS 3356 believe it was a valid path to reach those prefixes when it no longer was. That traffic was then “blackholed” within the CenturyLink/Level3 backbone because there was no longer an exit point to reach the IP addresses. So not only could you not use the backbone during the disruption, the failure could actually prevent those who proactively disconnected from CenturyLink/Level3 from utilizing the alternate paths they might be connected to.

So a question comes to the mind of many network engineers examining the post-mortem of this event: how can I make sure my network is not affected if this happens again? There are a few things that come to mind as items to take into consideration:

SD-WAN – Now mainstream and very mature, SD-WAN is a fantastic way to overcome connectivity issues over the internet. Because probes are sent periodically to measure path performance, the right SD-WAN solution can route around performance problems on a network. An SD-WAN overlay alone can’t resolve every issue, but combined with some of the other recommendations here, it certainly gives you greater resilience.
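
As a toy illustration of that probing idea (not any vendor’s actual implementation), you can approximate per-path quality measurement from a multi-homed Linux box by pinging the same target out each WAN interface and comparing the summaries; eth1 and eth2 are placeholders for your circuits.

# compare loss and latency across two hypothetical WAN interfaces
for dev in eth1 eth2; do
  echo "--- path via $dev ---"
  ping -c 20 -I "$dev" | tail -2   # summary lines: loss + rtt min/avg/max/mdev
done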

Autonomous System Diversity – When designing internet connectivity resilience, the goal is to make the links you have as independent from one another as you can. The autonomous system paths of the providers you select are important to examine to be sure they do not depend on one another for transit. A great tool to assist with this is CAIDA’s ASRank, which is helpful for seeing how ASNs relate to one another. Take a look at the ASNs of the providers you are considering to see their relationship. In particular, you likely want to avoid the two ASes having a “customer” or “provider” relationship; ideally, you’ll want them to be a “peer”. Unfortunately, that doesn’t 100% guarantee you won’t be affected by something like what happened on August 30th, with AS3356 still advertising and blackholing traffic, but it gets you about as close as you can to the ASNs not having inter-dependence on one another, and gives you a better chance of survivability.
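
A quick field check to complement ASRank: trace to the same destination out each of your circuits with AS path lookups enabled and compare the ASNs that appear in each path. If both traces traverse the same transit AS, your “redundant” links share a dependency. Linux traceroute supports this with -A; the interfaces and destination here are placeholders.

traceroute -A -i eth1 example.com    # circuit A, hops annotated with [AS...]
traceroute -A -i eth2 example.com    # circuit B, compare the AS lists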

Three Connections or More – Many with redundant internet connections assume two are enough. I would contend that having a third connection, even if it’s a backup-only connection via 4G/5G over a wireless carrier, can save your bacon if the other two carriers are affected by the same outage.
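
As a minimal sketch of how such a last-resort path can sit behind the primaries on a Linux edge device, you can install the wireless default route with a worse metric so it only carries traffic when the preferred routes are gone. All addresses and interface names here are hypothetical, and in practice you’d pair this with health checks (or let a routing protocol or SD-WAN overlay handle the failover), since a static route only disappears when its interface goes down.

ip route add default via dev eth1 metric 100    # primary circuit
ip route add default via dev eth2 metric 200    # secondary circuit
ip route add default via dev wwan0 metric 300   # 4G/5G backup of last resort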

IXPs, CXPs and Cloud Direct Connections – You may want to consider peering at one of the following:

  • Internet Exchange Point (IXP) – You’ll find IXPs all over the world as a means to inexpensively peer networks directly in a multilateral or bilateral peering arrangement. With multilateral peering, you connect to a route server with one BGP peering session and then send and receive routes with everyone connected to the route server (see the sketch after this list). Bilateral peering is a direct BGP peering relationship with another entity on the exchange. These allow a network to connect directly to regional networks without the need for transit, saving money, reducing latency and improving overall performance. Quick plug: I work with the Ohio IX, so if you’re in Ohio, I highly recommend checking them out.
  • Cloud Exchange Points (CXP) or Direct Cloud Connections – As the public cloud becomes more important to IT infrastructures, finding a way to stay directly connected to these resources becomes critical. Like connecting to an IXP, connecting to a CXP or a direct cloud connection to reach key cloud providers is another opportunity to improve not just redundancy but performance as well.
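
For a feel of how simple the multilateral option is, here’s a minimal FRRouting sketch of a session to an IXP route server. The ASNs and addresses are made up, and a production config would add prefix filters, max-prefix limits and the rest.

router bgp 64512
 ! our hypothetical ASN, peering with the exchange's route server
 ! sketch only: a production config would use real import/export policy
 no bgp ebgp-requires-policy
 neighbor remote-as 64513
 neighbor description ixp-route-server
 address-family ipv4 unicast
  network
 exit-address-family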

In closing, it’s difficult to plan for every type of network failure that can occur. This most recent CenturyLink/Level3 outage was one for the books, that’s for sure. All we can do as network engineers is learn from it and strive to build better networks from the lessons we take away.

Thanks for reading! If you’re an Ohio network engineer, be sure to check out a couple of organizations I’m involved with: (OH)NUG and Ohio IX. I might be a little biased, but I feel they are great resources right in our backyard!