Starlink: First Impressions

Starlink

So my Starlink kit is finally here! A number of folks have asked for first impressions so I’m going to break it down. Long story short, it’s a breeze to set up but operationally, it’s definitely a beta service. Let’s explore.

The Unboxing

What first struck me was how efficiently everything is packaged. The first thing you see is the 3 panel pictogram of how to set this all up right on top and you know what, it pretty much was this easy. When you lift this page and the top molded plastic panel that holds everything in place up, you get to the goods. The form fitting molded plastic on the top and bottom holds the kit in super snug, it’s really well executed. In the box:

  • Starlink Dish with attached mast
  • Ground base to attach to the dish mast
  • Starlink Power over Ethernet (PoE) Injector – Model UTP-201S – Output towards dish maxes out at 90W (x2), output towards router maxes out at 17W. Total wattage this guy can produce is 180W.
  • Starlink Router – Model UTR-201 – PoE input 10W – Has built in 802.11a/b/g/n/ac Wi-Fi over 2.4GHz & 5GHz and an “AUX” 10/100/1000 Ethernet port.
  • Pre-connected cables, 100 foot black cable for dish to PoE injector, 6 foot white for PoE injector to router. I did not plug these in, they were packaged plugged in.

The Setup

Everything is pre-cabled so assembly really comes down to:

  • Snap the dish and its attached mast into the base
  • Plug PoE brick into the wall
  • Plug the pre-terminated and weather proofed black cable from the dish into its black color coded port on the PoE brick
  • Plug the white cable already hanging off the included router into its color coded port on the PoE brick.

From a physical standpoint, that’s all you really have to do! A lot of focus was obviously spent on making this very easy to deploy and I have to say, mission accomplished. I literally got everything up and running within 10 minutes. After you plug the dish in, it points straight up at the sky and starts to tune itself to receive the strongest signal possible with its built in motors. If you want to see these motors and the guts of the dish, check out engineer Ken Keiter’s tear down. It’s quite impressive, I highly recommend checking it out. The dish iterated through a few positions and I wish I had gotten video (might try again soon) but it eventually settled on a position somewhere in the sky NNE of my house.

First thing to do after plugging everything in is to get the Starlink app on iOS or Android. All configuration, control and documentation is really within this app so it’s definitely a requirement. This process is relatively straightforward and is a lot like any other consumer IoT device you may have picked up recently.

The Starlink UTR-201 router comes with a default SSID which is printed right on it by the “AUX” port on the back of the unit.

The iOS/Android app connects to it over Wifi and adopts the router so now you can adjust some basic settings via the app. Not really much you can configure there other than the wireless SSID, more on that later.

The Operation

Here are some notes on what things look like after we get it all plugged in and up and running. I have to say it’s definitely “better than nothing” as they state. That said, there is room for improvement.

  • Once connected to the Starlink router via Wifi or wired via the AUX port, you will be DHCP’d a 192.168.1.x/24 address. This is not optional and there is no way to reconfigure different addressing or other DHCP options that I can find.
  • There is no management interface to the router and the options are very limited. There is a rather nice statistics dashboard you can see through the app or surfing in your browser to 192.168.100.1 when connected to it.
  • Your DNS server will be the router at 192.168.1.1 and the search domain is just “lan”.
  • There is no configuring port translations but the router is running Universal Plug n Play (UPnP, see below in the next section) so maybe there will be plans for that later?
  • The WAN interface on the router is behind Carrier Grade Network Address Translation (CG-NAT). More on this later, but it makes port forwarding impossible, and having a public IP address for specific applications (things like old school IPsec VPNs or accessing your own server directly) is not currently possible.

How’s the latency? Pretty good actually.

jg-mbp:~ jason$ ping 4.2.2.2
PING 4.2.2.2 (4.2.2.2): 56 data bytes
64 bytes from 4.2.2.2: icmp_seq=0 ttl=58 time=37.909 ms
64 bytes from 4.2.2.2: icmp_seq=1 ttl=58 time=43.383 ms
64 bytes from 4.2.2.2: icmp_seq=2 ttl=58 time=40.946 ms
64 bytes from 4.2.2.2: icmp_seq=3 ttl=58 time=39.343 ms
64 bytes from 4.2.2.2: icmp_seq=4 ttl=58 time=37.811 ms
^C
--- 4.2.2.2 ping statistics ---
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 37.811/39.878/43.383/2.091 ms

This is in the same neighborhood as RTTs over my Spectrum connection, which is impressive! Here is my Spectrum RTT to the same address.

jason@rtr01-jghome:~$ ping 4.2.2.2
PING 4.2.2.2 (4.2.2.2) 56(84) bytes of data.
64 bytes from 4.2.2.2: icmp_req=1 ttl=53 time=34.4 ms
64 bytes from 4.2.2.2: icmp_req=2 ttl=53 time=32.2 ms
64 bytes from 4.2.2.2: icmp_req=3 ttl=53 time=31.0 ms
64 bytes from 4.2.2.2: icmp_req=4 ttl=53 time=29.5 ms
64 bytes from 4.2.2.2: icmp_req=5 ttl=53 time=34.3 ms
^C
--- 4.2.2.2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
rtt min/avg/max/mdev = 29.513/32.314/34.454/1.910 ms
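If you want to sanity check the summary lines, the Starlink numbers can be recomputed from the five samples in the first ping run. This is just a quick sketch in Python (my choice of language, nothing Starlink specific). It assumes the macOS `stddev` figure is a population standard deviation, which is what lines up with the output above.

```python
import statistics

# RTT samples (ms) copied from the Starlink ping run above.
starlink_rtts = [37.909, 43.383, 40.946, 39.343, 37.811]

def summarize(rtts):
    """Return (min, avg, max, population stddev) for a list of RTTs in ms."""
    return (min(rtts), statistics.fmean(rtts), max(rtts), statistics.pstdev(rtts))

lo, avg, hi, sd = summarize(starlink_rtts)
print(f"min/avg/max/stddev = {lo:.3f}/{avg:.3f}/{hi:.3f}/{sd:.3f} ms")

# Averages as reported by the two ping summaries (Starlink vs Spectrum).
gap_ms = 39.878 - 32.314
print(f"Starlink averaged about {gap_ms:.1f} ms more than Spectrum")
```

The Spectrum samples are only printed to one decimal place, so I use the reported averages for the comparison rather than recomputing them.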

Next question, how much bandwidth are we getting? This is typically around 70Mbps down / 10Mbps up. It’s good, but not great. I was most surprised at the upload bandwidth; I wasn’t expecting to get this much.

Now as far as stability goes, that’s all over the map. Here’s a My Traceroute (MTR) run, which traces the route and then pings each hop repeatedly. I let it cycle through 100 times here.

Oof. That’s not pretty. Standard deviation is up there and we are getting upwards of 300ms RTTs right out of the gate. More detail will be below in the next section after I plug it into my VMware SD-WAN appliance.

The Geekier Stuff

The previous sections were the basics that most people want to see. This section will be more of the fun details I observed while playing around.

One thing that I thought was interesting was the router’s hostname resolved via DNS off itself.

jg-mbp:~ jason$ host 192.168.1.1
1.1.168.192.in-addr.arpa domain name pointer OpenWrt.lan.

So it looks like it’s based on OpenWrt. To be honest, this is not uncommon and I know of many other commercial products based on this as well.
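For anyone wondering where that odd looking `1.1.168.192.in-addr.arpa` name comes from: the `host` lookup is a reverse (PTR) query, and the name is simply the IP address with its octets flipped plus a fixed suffix. Here is a minimal Python sketch that builds the name; it does not send any DNS query itself.

```python
import ipaddress

def reverse_name(ip: str) -> str:
    """Build the reverse-DNS (PTR) name that tools like `host` query for an IP."""
    return ipaddress.ip_address(ip).reverse_pointer

print(reverse_name("192.168.1.1"))
```

To actually perform the lookup from Python while connected to the router, `socket.gethostbyaddr("192.168.1.1")` should do the job, though I have only checked the result with `host` as shown above.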

I tried to see if there is a web management interface on the router but no such luck. Here’s what a port scan looks like.

Starting Nmap 7.91 ( https://nmap.org ) at 2021-02-28 11:33 EST
Nmap scan report for 192.168.1.1
Host is up (0.24s latency).
Not shown: 994 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh
53/tcp   open  domain
80/tcp   open  http
5000/tcp open  upnp
9000/tcp open  cslistener
9001/tcp open  tor-orport

When you go to port 80 on it, it just redirects you to https://www.starlink.com. Boring!
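If you don’t have Nmap handy, a plain TCP connect check gets you most of the way for a short list of ports. This is a rough Python sketch, not a replacement for Nmap: it only tells you whether a TCP handshake completed, and the function and variable names are my own.

```python
import socket

def check_ports(host, ports, timeout=1.0):
    """Try a plain TCP connect to each port; return {port: True if it accepted}."""
    results = {}
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception.
            results[port] = (s.connect_ex((host, port)) == 0)
    return results

# Ports reported open in the Nmap output above.
STARLINK_ROUTER_PORTS = [22, 53, 80, 5000, 9000, 9001]

# Example call (only meaningful while connected to the router's LAN):
# check_ports("192.168.1.1", STARLINK_ROUTER_PORTS)
```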

The router is listening for DNS queries on port 53 and answering them pretty quickly. It appears to be proxying and caching DNS entries which certainly helps speed things up. It’s all about optimizing performance where you can when delivering internet from space and I think this was a smart way to go. Here’s a dig query against the router for a cached entry vs against an internet name server. 2ms vs 63ms is a big improvement!

jg-mbp:~ jason$ dig google.com @192.168.1.1

; <<>> DiG 9.10.6 <<>> google.com @192.168.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9561
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com.			IN	A

;; ANSWER SECTION:
google.com.		212	IN	A	142.250.64.110

;; Query time: 2 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Mon Mar 01 20:40:03 EST 2021
;; MSG SIZE  rcvd: 55

jg-mbp:~ jason$ dig google.com @8.8.8.8

; <<>> DiG 9.10.6 <<>> google.com @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41035
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;google.com.			IN	A

;; ANSWER SECTION:
google.com.		117	IN	A	142.250.64.110

;; Query time: 63 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Mon Mar 01 20:40:10 EST 2021
;; MSG SIZE  rcvd: 55
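To make the caching idea concrete, here is a toy Python sketch of what a caching DNS proxy does: answer from memory until the record’s TTL runs out, and only go upstream on a miss. I have no visibility into how the Starlink router actually implements this, so treat the class, the fake upstream and the 300 second TTL as made up for illustration.

```python
import time

class CachingResolver:
    """Toy DNS cache: answer from memory until the record's TTL runs out."""

    def __init__(self, upstream, clock=time.monotonic):
        self.upstream = upstream      # callable: name -> (address, ttl_seconds)
        self.clock = clock            # injectable so the cache is easy to test
        self.cache = {}               # name -> (address, expiry_time)

    def resolve(self, name):
        now = self.clock()
        hit = self.cache.get(name)
        if hit and hit[1] > now:
            return hit[0]             # fast path: no trip across the WAN link
        address, ttl = self.upstream(name)
        self.cache[name] = (address, now + ttl)
        return address

# Hypothetical upstream that returns the address seen in the dig output,
# with an invented TTL of 300 seconds.
calls = []
def fake_upstream(name):
    calls.append(name)
    return ("142.250.64.110", 300)

resolver = CachingResolver(fake_upstream)
resolver.resolve("google.com")   # miss: goes upstream
resolver.resolve("google.com")   # hit: served from the cache
print(len(calls))                # number of upstream lookups so far
```

Only the first lookup pays the full round trip, which is the same effect as the 2ms vs 63ms difference above.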

TCP/22 aka SSH is open! But good luck getting in there. Their SSH server uses key based instead of password based authentication which I have to say is refreshing! Definitely a step in the right direction when it comes to IoT device security.

jg-mbp:~ jason$ ssh admin@192.168.1.1
The authenticity of host '192.168.1.1 (192.168.1.1)' can't be established.
RSA key fingerprint is SHA256:owxzwYXb/xsrqqDmR1YkIaAIR6AS1t+iwE0mMvoymYM.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '192.168.1.1' (RSA) to the list of known hosts.
admin@192.168.1.1: Permission denied (publickey).

Universal Plug n Play (UPnP) is running on TCP/5000. Perhaps this is for a future application? If you think you know what this is for outside of the standard UPnP application, let me know. Seems weird to have it when there’s another layer of CG-NAT beyond it. Also curious about TCP/9000 and TCP/9001. The most common uses for these are PHP-FPM and The Onion Router (Tor) but I doubt that’s what they’re for. If anyone has ideas I’m all ears.

I wanted to see what happens when you bypass the Starlink router and plug the dish right into a different device instead. It turns out, this works! You get an RFC 6598 IP address (from the 100.64.0.0/10 shared address space), which is what carriers use for CG-NAT. Makes sense when you are working with very little IPv4 space and you need to conserve as much as you can.
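If you want to check whether the address your own WAN interface picked up is in that CG-NAT range, RFC 6598 reserves 100.64.0.0/10 for it, and a few lines of Python will tell you. The sample addresses below are made up for illustration, not copied from my interface.

```python
import ipaddress

# Shared address space reserved for carrier-grade NAT by RFC 6598.
CGNAT_NET = ipaddress.ip_network("100.64.0.0/10")

def is_cgnat(ip: str) -> bool:
    """True if the address falls inside the RFC 6598 shared range."""
    return ipaddress.ip_address(ip) in CGNAT_NET

# Hypothetical sample addresses, two inside the range and two outside.
for candidate in ["100.64.12.34", "100.127.255.254", "100.128.0.1", "192.168.1.1"]:
    print(candidate, is_cgnat(candidate))
```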

While in the SD-WAN platform, let’s check out how well it thinks Starlink would do for real time applications like voice compared to my Spectrum circuit.

Hmmm… have a little ways to go there it seems. There are a lot of instances of packet loss, jitter, high latency and just plain no connection.

How about IPv6? Turns out, not ready yet.

One thing I really love is the support section. It has some really great insights and commonly asked questions like this one in it.

So that’s it for now! As mentioned before, what a setup experience and it really is a remarkable offering considering they are blazing new territory here. I think the services will only improve over time and you will see greater stability and performance with each software update/improvement SpaceX makes. That said, this is perfectly usable and much better than many of the alternatives those in the boonies suffer today.

I’m going to write a follow up but wanted to get something out for folks to check out. Please do contact me for things you would like for me to try or tell you about! I would absolutely love to share my experiences and learn more by answering your questions.

Starlink Experiment Underway!

Ohio Starlink beta signups, check your inbox! I signed up for it last spring and have been anxiously anticipating the email inviting me in. Well today is the day and shortly, I’ll be enjoying my internet access from space. I’m super excited to test this out and add more resilience and capacity to my access. I’ll definitely post updates detailing the experiment.

The signup was very straightforward, an email arrived at 4:50p today inviting me to check availability. I clicked on the link, put in my email and physical address that confirmed availability. I was then taken to a page to put in my personal info, a credit card number and clicked submit. A confirmation email arrived for my payment and had a link to sign into the account. I’m now able to log in to track the status of my order.

Looks like it’s going to be about 2-4 weeks before the kit arrives. I’ve downloaded the app which you can use to check for obstructions in the place where you would like to install the dish but it’s dark now so I’ll check that out tomorrow. There is also a step by step guided installation portion of the app that I’ll hold off on until the dish comes.

So I guess, now we wait! I’ll have more updates as they become available. Definitely excited to test out what I really feel will be the future of connectivity!

SDN’s Promise Lives On

I’m not sure if anyone else remembers the early days of OpenFlow and Software Defined Networking (SDN), but it was going to change networking for good. The idea was being able to embed programmable forwarding and filtering policy directly into your switches. You would then use a controller which spoke to the switches via the OpenFlow protocol to distribute network policy like you would find in firewalls & load balancers, right inside your switch. The problem was that OpenFlow had scalability issues, was tough to integrate with switch pipelines and just never saw the enterprise adoption that many expected. 10 years after the first drafts of OpenFlow, we are seeing some of the promise of SDN & OpenFlow’s key ideals back again for consumption in different forms.
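For those who never got hands on with it, the core idea is a match-action table: the controller installs rules, and the switch applies the highest priority rule that matches each packet. Here is a toy Python sketch of that concept. It is not the OpenFlow wire protocol or any real controller API; the class names, field names and rules are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class FlowRule:
    priority: int
    match: dict      # header field -> required value; missing fields are wildcards
    action: str      # e.g. "forward:2" or "drop"

class FlowTable:
    """Toy switch table: the highest priority matching rule decides the action."""

    def __init__(self):
        self.rules = []

    def install(self, rule):
        # In a real deployment the controller would push this to the switch.
        self.rules.append(rule)
        self.rules.sort(key=lambda r: r.priority, reverse=True)

    def lookup(self, packet):
        for rule in self.rules:
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return rule.action
        return "send_to_controller"   # table miss (this toy punts to the controller)

table = FlowTable()
table.install(FlowRule(10, {"ip_dst": "10.0.0.5"}, "forward:2"))
table.install(FlowRule(100, {"ip_dst": "10.0.0.5", "tcp_dst": 23}, "drop"))

print(table.lookup({"ip_dst": "10.0.0.5", "tcp_dst": 443}))  # forward:2
print(table.lookup({"ip_dst": "10.0.0.5", "tcp_dst": 23}))   # drop
print(table.lookup({"ip_dst": "10.9.9.9", "tcp_dst": 80}))   # send_to_controller
```

The second rule is the "firewall inside your switch" part: a filtering policy living in the same table as the forwarding policy.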

Out with the DumbNICs, In with the SmartNICs

Some big names in networking such as Broadcom, Mellanox & Xilinx are beginning to tout SmartNIC offerings which they have created at the behest of the cloud scaler crowd, the likes of Amazon, Microsoft and Google. These organizations are looking for ways to offload the network processing away from the general purpose CPUs which power their cloud offerings over to specialized SmartNICs that take it on. A nice side effect of putting the packet mojo into the SmartNIC is forwarding and policy capabilities outside of what a standard NIC can accomplish, using Field Programmable Gate Arrays (FPGAs) or a System on a Chip (SoC) to provide extended functionality. This makes for some interesting distributed policy and monitoring capabilities such as telemetry, encryption, filtering, stateful IPS/IDS, load balancing and compression right at the host side edge of the network. This is tremendously powerful in that it presents a lot of options for large operators by distributing and accelerating workloads while preserving the compute infrastructure for what it should be doing: computing.

Comparison of NIC capabilities care of Mellanox

A “SmartToR” Edge

Late last year, Broadcom announced their Trident based SmartToR offerings to take similar functionality you would find in SmartNICs to the Top of Rack (ToR) switch, the network side edge of the datacenter. This makes possible some of those same features in the SmartNIC and pushes them down to the network edge before the packets get to the hosts. Examples of potential applications could be for traffic you wish to manipulate before it actually gets to the hosts. So you can introduce load balancers to distribute inbound sessions, firewalls to stop DDoS from melting the hosts, Network Address Translation (NAT) to translate public to private addresses, penultimate decryption of traffic and all with switch Application Specific Integrated Circuits (ASICs) for blazing forwarding performance. Leveraging really fast switch ASICs to analyze and control traffic gives you orders of magnitude greater scale. Analysts believe that the SmartToR concept will be more suitable for enterprise versus the SmartNIC approach because of the preference for a “bare metal” appliance in enterprise environments.

In a forwarding battle of CPU vs SmartToR, I’m betting on the SmartToR

What’s the deal with P4?

So after OpenFlow, a consortium of network luminaries including some of the smart people responsible for OpenFlow decided to try something new. Programming Protocol-independent Packet Processors (or P4 for short) was created with a different type of network architecture in mind. Instead of speaking to fixed-function switches that have locked in functionality baked right into their ASICs, which is what OpenFlow leverages, P4 was meant to be a programming language used on a programmable chip (aka PISA or Protocol Independent Switch Architecture). The idea was instead of letting the switch chip makers dictate what features and functions were available, one could potentially write their own rules right into the switch for operation. Want to write a custom IPv4 pipeline to support a specific need in your backbone? Go for it. Want to write your own Internet Protocol, IPv[Your Name Here]? Knock yourself out. It was a fully extensible and programmable chip so the sky was the limit. Though powerful, this is inaccessible for the average organization and wielded more deftly in the hands of cloud scale companies and network vendors. P4 is being leveraged to write some cool network applications, but has limited application in the enterprise.
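To give a feel for what “protocol independent” means, here is a small Python sketch where the header layout is data supplied by the programmer rather than something fixed in the parser. This is plain Python, not P4, and it skips everything a real P4 program and target would handle (parser state machines, match-action stages, deparsing). The `my_proto` header and its field names are made up.

```python
def parse_header(layout, data):
    """Slice `data` into fields according to `layout`, a list of (name, bit_width)."""
    total_bits = sum(width for _, width in layout)
    if total_bits % 8 != 0:
        raise ValueError("layout must add up to a whole number of bytes")
    n_bytes = total_bits // 8
    if len(data) < n_bytes:
        raise ValueError("not enough bytes for this header")
    value = int.from_bytes(data[:n_bytes], "big")
    fields, remaining = {}, total_bits
    for name, width in layout:
        remaining -= width
        fields[name] = (value >> remaining) & ((1 << width) - 1)
    return fields

# The first 4 bytes of a standard IPv4 header...
ipv4_start = [("version", 4), ("ihl", 4), ("tos", 8), ("total_len", 16)]
# ...and an invented "IPv[Your Name Here]" header using the same machinery.
my_proto = [("version", 4), ("flow_class", 12), ("tenant_id", 16)]

print(parse_header(ipv4_start, bytes.fromhex("45000054")))
print(parse_header(my_proto, bytes.fromhex("7abc1234")))
```

The parser code never changes between the two calls; only the layout does, which is the spirit of letting the network operator define the protocol instead of the chip maker.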

So is SDN Coming Back?

SDN never really went away. Many cloud-scalers and academic networks continued on with their SDN efforts with and without OpenFlow long after the rest of the industry. The “one size fits all” approach of OpenFlow just was not a fit for most of the market but many of the basic tenets appear to live on in different forms. From the looks of trends like SmartNICs, P4 and switches the likes of Broadcom’s SmartToR, the idea lives on that you can embed many of today’s disparate edge network functions right into the network itself. Taking what has traditionally been separate network appliances scattered throughout the network and embedding them inline presents many advantages with regard to capacity, visibility and control. As with anything, it takes a few tries to get things right but the promise of a fully programmable network within which you can directly embed key network functions is too great to go away.

Let me know what you think, feel free to comment or engage on LinkedIn or Twitter. To see what other things I’m up to, check out my Now page. Thanks for reading!

Starlink vs. 5G

Starlink > 5G

There have been a lot of interesting developments in the mobile/wireless connectivity world as of late. Despite being told for many years 5G will change our lives (seriously, for a really long time now), as it finally comes to market it seems there are other technologies that might steal a little bit of that 5G thunder. The more I read about SpaceX’s Starlink or the other low earth orbit (LEO) satellite services like OneWeb, Telesat & Amazon, the more they seem to have the potential to make a bigger impact than 5G. Low earth orbit satellite connectivity solutions appear to be solving what seem like more pressing remote and limited connectivity problems. Don’t get me wrong, 5G will likely be a great incremental step forward in the places where we already have 4G/LTE connectivity today but it really won’t do much to help those who are so far off the beaten path that they don’t have good access. Being subscribed to the Reddit group /r/starlink, you see some pretty amazing reviews from people who up until now, haven’t had many options for connectivity. In particular, if you live in remote parts of the world which Starlink is currently servicing, there is now some pretty amazing connectivity you never had before.

Living and Working in the Boonies

There are a lot of niceties to living in very rural areas for those that enjoy the country life. Large plots of land, lots of privacy, no hustle, nor bustle. That said, ease of access to high speed internet is not a benefit you often enjoy in the sticks. If you are fortunate enough to have high speed access in very rural areas, options are limited to one or two overpriced providers that have a monopoly. These providers also have a lot of infrastructure costs to cover for relatively few addressable customers, which goes for remote residential and business customers as well. There usually is little in the way of good wireless 4G/LTE coverage for the same reasons as the wireline guys because it just doesn’t pay to put the kind of dollars into building the infrastructure and backhauling fiber from towers which will reach only a handful of homes and businesses. With that, there are huge swaths of extremely rural areas with little to no access at all that would potentially never make financial sense to reach with terrestrial options. For some, getting away from Internet access may be by design but for others it’s a never ending disappointment of crappy, overpriced connectivity options. Low earth orbit satellite services can cover these areas very well and provide connectivity to areas which would never be on terrestrial wireline or wireless carriers otherwise. There are countless people and organizations that can finally know the convenience of effective, low latency (~50ms) & broadband access at 50-200Mbps speeds in these areas. But are the speeds and performance of low earth orbit access enough compared to the speeds of 5G?

How Much Bandwidth is Enough?

Maybe I’m getting too old to carry a geek card but I often wonder, how fast does Internet access really need to be? Sure, faster is always better but how much bandwidth does one need before there is no real discernible difference between a few hundred megabits per second and getting up into multi-gigabits per second? It’s kind of like going from HD resolutions at 1080P up to Ultra HD resolutions at 4K or even 8K. I personally can’t tell the difference on the size TVs that I buy, which are around 50” or so. Another analogy might be in computing such as the difference between a 3.3GHz six core or 3.8GHz eight core processor. I understand there’s a difference but do the applications I use day in, day out really show a significant performance increase? Will multi-gigabit speeds really make a noticeable difference for me or the average user? For the enthusiast and those living on the cutting edge of technology, sure, they’ll bust out their benchmarking tools to compare and find ways to use all of that throughput. Most users like myself are perfectly content with around 100-250Mbps of bandwidth.

What Connectivity Problems Need Solving?

Once Starlink and other low earth orbit satellite services like it really start chugging, they will solve connectivity issues for many of the underserved. Contrast that with 5G as an incremental performance increase for those who already have 4G/LTE access today, which is great but in my mind, less significant. Connecting the unconnected or “underconnected” with more bandwidth is far more interesting than just souping up existing connectivity that is pretty darn good as it is. I am certainly long on the promise low earth orbit access brings for the global connectivity landscape and think this will be hugely disruptive. I only wish I could buy stock in SpaceX to support and share in the success of their mission!

What do you think?

Is your Internet connectivity REALLY redundant?

Outages

The CenturyLink/Level3 internet outage on August 30th, 2020 got a lot of network engineers thinking about internet reachability and the ways things can go wrong. The way this particular failure played out was unique and definitely gave us all a lot to consider in the way of oddball failure scenarios. Problems started for CenturyLink/Level3 when a BGP Flowspec announcement came from a datacenter in Mississauga, Ontario in Canada. Flowspec, a security mechanism within BGP, is commonly used to filter large volumetric distributed denial of service (DDoS) attacks within a network. The root cause of this particular issue appears to be operator error by way of a CenturyLink engineer being allowed to put a wildcard entry into a Flowspec command to block such an attack. This misformatted entry caused many more IP addresses than intended to be filtered, wreaking havoc on the CenturyLink/Level3 backbone within Autonomous System (AS) 3356. BGP sessions were torn down by the rule because of filtering across the backbone causing instability and reachability issues throughout.
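To illustrate why a wildcard in a filter rule is so dangerous, here is a small Python sketch comparing a narrowly scoped drop rule with an overly broad one. I don’t have the actual Flowspec entry, so every value below is hypothetical (the addresses come from the RFC 5737 documentation ranges), and real Flowspec rules match on more than a destination prefix.

```python
import ipaddress

def blocks(rule_prefix: str, dst_ip: str) -> bool:
    """True if a drop rule for `rule_prefix` would catch traffic to `dst_ip`."""
    return ipaddress.ip_address(dst_ip) in ipaddress.ip_network(rule_prefix)

# All values are hypothetical stand-ins, not taken from the incident.
intended_rule = "203.0.113.7/32"   # just the host under attack
wildcard_rule = "0.0.0.0/0"        # what an overly broad entry can amount to
bgp_neighbor = "198.51.100.1"      # stand-in for a backbone BGP peer

for rule in (intended_rule, wildcard_rule):
    size = ipaddress.ip_network(rule).num_addresses
    print(rule, "covers", size, "addresses; hits BGP neighbor:", blocks(rule, bgp_neighbor))
```

Once the filter also catches the addresses your own BGP sessions run over, the rule starts tearing down the very sessions that distributed it.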

One very interesting bit about how things failed was what happened when other networks tried to shut down their BGP sessions to AS 3356. CenturyLink/Level3 didn’t stop propagating prefixes/IP address blocks even after the BGP sessions were shut down. This made the BGP speakers still connected to AS 3356 think it was a valid path to reach said prefixes/IP addresses, but it no longer was. This traffic was then “blackholed” within the CenturyLink/Level3 backbone because there was no longer an exit point to reach the IP addresses. So not only could you not use the backbone during the disruption, the failure actually could have prevented those who proactively disconnected from CenturyLink/Level3 from utilizing alternate paths they might be connected to.

So a question comes to mind for many network engineers examining the post-mortem of this event: How can I make sure my network is not affected if this happens again? There are a few things that come to mind as items to take into consideration:

SD-WAN – Now mainstream and very mature, SD-WAN is a fantastic way to overcome connectivity issues over the Internet. Because probes are sent periodically to measure path performance, the right SD-WAN solution could route around performance problems on a network. An SD-WAN overlay alone can’t resolve every issue but combined with some of the other recommendations here, certainly gives you greater resilience.

Autonomous System Diversity – When designing internet connectivity resilience, the goal is to make the links you have as independent from one another as you can. The autonomous system paths of the providers you select are important to examine to be sure they do not depend on one another for transit. A great tool to assist with this is CAIDA’s ASRank, which is helpful for seeing ASN relationships with one another. Take a look at the ASNs of the providers you are considering to see their relationship to one another. In particular, you likely want to avoid the two ASes having a “customer” or “provider” relationship. Ideally, you’ll want them to be a “peer”. Unfortunately that doesn’t 100% guarantee you won’t be affected by something like what happened on August 30th, when AS3356 kept advertising and blackholing prefixes, but it gets you about as close as you can to ASNs that don’t depend on one another and gives you a better chance of survivability.

Three Connections or More – Many with redundant Internet connections assume two connections are enough. I would contend that having a third connection, even if it’s a backup only connection via 4G/5G over a wireless carrier, can save your bacon if the other two carriers are affected by the same outage.

IXPs, CXPs and Cloud Direct Connections – You may want to consider peering into one of the following:

  • Internet Exchange Point (IXP) – You’ll find IXPs all over the world as a means to inexpensively peer networks directly in a multilateral or bilateral peering arrangement. With multilateral peering, you connect to a route server with one BGP peering session then send and receive all routes with anyone connected to the route server. Bilateral peering is a direct BGP peering relationship with another entity on the exchange. These allow a network to directly connect to regional networks without the need for transit, saving money, reducing latency and improving overall performance. Quick plug: I work with the Ohio IX so if you’re in Ohio, I highly recommend checking them out.
  • Cloud Exchange Points (CXP) or Direct Cloud Connections – As the public cloud becomes more important to IT infrastructures, finding a way to stay directly connected to these resources becomes critical. Like connecting to an IXP, connecting to a CXP or Direct Cloud Connection to get to key cloud providers is another opportunity to not just improve redundancy but performance as well.
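Going back to the Autonomous System Diversity point, the check itself is simple once you have the relationship data in hand. Here is a minimal Python sketch. The relationship table is hypothetical and filled in by hand (as you would after looking providers up in a tool like ASRank), and the ASNs are from the range reserved for documentation, not real carriers.

```python
# Hypothetical relationship data, keyed by (asn_a, asn_b) with the role of asn_a
# relative to asn_b. The ASNs are documentation-only values (RFC 5398).
RELATIONSHIPS = {
    (64496, 64497): "customer",   # 64496 buys transit from 64497
    (64496, 64498): "peer",
}

def relationship(asn_a, asn_b):
    """Return 'customer', 'provider', 'peer' or 'unknown' for asn_a relative to asn_b."""
    if (asn_a, asn_b) in RELATIONSHIPS:
        return RELATIONSHIPS[(asn_a, asn_b)]
    reverse = RELATIONSHIPS.get((asn_b, asn_a))
    if reverse is None:
        return "unknown"
    return {"customer": "provider", "provider": "customer", "peer": "peer"}[reverse]

def is_diverse(asn_a, asn_b):
    """Treat a pair as diverse only when neither depends on the other for transit."""
    return relationship(asn_a, asn_b) == "peer"

print(is_diverse(64496, 64497))  # False: customer/provider dependency
print(is_diverse(64496, 64498))  # True: peers
```

Pairs with no recorded relationship come back as not diverse here, which is a deliberately cautious choice on my part: it flags them for a manual look rather than assuming independence.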

In closing, it’s difficult to plan for every type of network failure that can occur. This most recent CenturyLink/Level3 outage was one for the books, that’s for sure. All we can do as network engineers is learn from it and strive to build better networks from the lessons we take away.

Thanks for reading! If you’re an Ohio network engineer, be sure to check out a couple of organizations I’m involved with: (OH)NUG and Ohio IX. I might be a little biased but feel they are great resources right in our backyard!

Ohio Internet Exchange (IX)

In March 2020, I volunteered to become a part of the Ohio Internet Exchange aka Ohio IX Technical Steering Committee. If you haven’t heard of the Ohio IX before it is an Internet exchange (IX), also known as an Internet exchange point (IXP) or peering point. IX/IXPs allow Internet transport carriers, Internet service providers (ISPs), mobile and content providers, and other organizations with great connectivity needs to exchange Internet traffic inexpensively through a common switch fabric usually on a settlement-free basis (i.e. no usage charges). Ohio IX reduces the portion of an organization’s Internet transit traffic that must be delivered via their upstream transit providers, thereby reducing the average per-bit delivery cost of their service. The increased number of paths learned through Ohio IX improves routing efficiency and fault-tolerance. By connecting to Ohio IX, members can peer with the route servers provided by Ohio IX, or with any other member, provided the members in question have reached a bilateral peering agreement. There are three types of connectivity options into or through the Ohio IX Fabric:

Multilateral Peering: Traffic exchanged directly between members over shared exchange fabric utilizing Ohio IX route servers

Bilateral Peering: Traffic exchanged directly between two members of the exchange over the shared exchange fabric

Direct Access: Connect directly into content you specify.
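One practical reason multilateral peering is attractive is the number of BGP sessions involved. A quick back of the napkin calculation in Python shows the difference between everyone peering bilaterally with everyone else versus each member holding one session per route server. The member count of 50 is an arbitrary example, not the Ohio IX membership.

```python
def bilateral_sessions(members: int) -> int:
    """BGP sessions needed for every member to peer directly with every other."""
    return members * (members - 1) // 2

def multilateral_sessions(members: int, route_servers: int = 1) -> int:
    """BGP sessions needed when each member peers only with the route server(s)."""
    return members * route_servers

# 50 members is an arbitrary example size.
print(bilateral_sessions(50), multilateral_sessions(50))
```

For 50 members that works out to 1225 sessions for a full bilateral mesh versus 50 through a single route server.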

There are two levels of membership to the Ohio IX, associate ($500/yr) and senior ($1000/yr). Senior membership requires your own Autonomous System Number (ASN) and requires connecting to the exchange with either a 1G or 10G port. 40G and 100G ports are available as well but additional charges apply. Associate members can participate by connecting to the exchange but are not required to and they have limited voting rights.

The work I’ve been doing has been pretty interesting! I have been getting my hands dirty standing up some Linux hosts for management and monitoring, not to mention working with other technical peers to keep the network running. I am new so have a lot to learn and more work to do in order to feel like I’m pulling my weight, but it’s been a great experience thus far!

If you are interested in learning more about the Ohio IX, reach out to me on LinkedIn or Twitter (links for both at the right) or email info@ohioix.org.

I hope to see you as a new member soon!

VXLAN + EVPN in Campus Switching?

I recently received some Arista 720XP PoE Campus Switches, a key component of Arista’s Cognitive Campus efforts. While reviewing the datasheet, I noticed something I found interesting: VXLAN + EVPN support.

Detail of VXLAN + EVPN support from Arista 720XP Datasheet

For those unfamiliar with VXLAN & EVPN here are some resources to check out.

https://en.wikipedia.org/wiki/Virtual_Extensible_LAN

https://blog.ipspace.net/2018/05/what-is-evpn.html

TLDR; VXLAN is layer 2 Ethernet transport over a layer 3 tunneling encapsulation while EVPN is a BGP based control plane for layer 2 and layer 3 hosts to signal this transport.
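To make the encapsulation a bit more tangible, here is a small Python sketch that packs the 8 byte VXLAN header defined in RFC 7348 and adds up the per-frame overhead. It only builds the VXLAN header itself, not the outer Ethernet/IP/UDP headers, and the VNI of 5000 is an arbitrary example value.

```python
import struct

VXLAN_UDP_PORT = 4789          # IANA-assigned destination port (RFC 7348)
VXLAN_FLAG_VNI_VALID = 0x08    # the "I" flag: VNI field is valid

def vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VXLAN header: flags, 24 reserved bits, 24-bit VNI, 8 reserved bits."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)

# Overhead added to each original frame with an untagged IPv4 underlay:
# outer Ethernet (14) + outer IPv4 (20) + UDP (8) + VXLAN (8) bytes.
OVERHEAD_BYTES = 14 + 20 + 8 + 8

print(vxlan_header(5000).hex(), OVERHEAD_BYTES)
```

That 50 bytes of overhead is why the underlay MTU usually needs to be raised when you stretch Layer 2 over a Layer 3 core this way.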

It is interesting to have these features in a campus switching platform and it got me thinking about the evolution of the campus. Many large scale campus environments that I have worked with over the years have a mix of Layer 2 and Layer 3 islands which can be complex and difficult to deploy, manage and integrate. VXLAN + EVPN can potentially make operating a large, complex network much easier by giving one the capability to build a Layer 3 core that can use VXLAN as a transport for Layer 2 across a campus. Couple that with maturing network controllers and intent based systems and the potential is there to build seamless, scalable infrastructures. For some of the huge, technology debt laden campuses out there, relief may be coming to reduce the difficulties of managing these systems.

Use of VXLAN + EVPN in the campus is not new, with vendors like Cisco, Extreme and Juniper offering solutions in the past few years as well, but uptake seems to be slow. I often find that many network engineers are barely familiar with VXLAN, much less EVPN, so I think it’s going to be a while before these technologies really get traction. I also think the software management platforms will be key to hiding the complexity, and it will take a while for those to mature as well.

I like where this is headed and can see a lot of value in it for network practitioners. That said, one should always weigh the trade offs and validate that the complexity is warranted. Introducing technologies like EVPN to an environment is not trivial and it is key to understand how to operationally handle it including ongoing management and troubleshooting of these technologies. As always, be sure to lab and really dig into how things work before deploying.

Maintenance Windows: Keep it short or use it all?

So I heard about an interesting encounter which started a debate with our team. An engineer on our team was working through a maintenance window and was prepared for the event with all of his configs and the order of operation to complete the maintenance as quickly as possible. This was all shared with the customer. Even though there was an hour set aside for the window with change control, our engineer wanted to use as little of that time as possible for downtime and stick to the change as specified, making the impact to users minimal. The engineers on the customer end said “We have an hour, there’s no rush.” and took their time moving cables, without any prior preparation or planning to do so quickly. The customer also seemingly had no urgency to make the configuration changes quickly to keep downtime to a minimum, assuming they had plenty of time. This experience started a conversation, and our team came to the conclusion that this perspective is pretty common.

There is something to be said for being methodical and carefully working through the maintenance. That said, I would contend that a sense of urgency to complete the maintenance quickly reduces impact to users and gives more time to troubleshoot if any issues arise. I personally make it a point to get the maintenance done quickly with as little downtime as possible but it seems that some choose to use all of the allotted time.

If it was your maintenance window to run with, how would you handle it? Get it done as quickly as possible with as little downtime as you can or take your time and use the whole time window available?

Five Use Cases for SD-WAN

A lot of folks I speak with about Software Defined Wide Area Networking (SD-WAN) are trying hard to understand how this rapidly emerging technology works and the places where it can fit with their clients or within their own network. As we acquire more experience with deployments inside many different business and network environments, the results that we discover are quite surprising. There are many applications where SD-WAN is an obvious fit but in some cases, the true value is not exactly what we were expecting. The following are some of the more prominent examples of reasons for SD-WAN we’ve been able to assist with to date:

  1. Voice Services Over the Internet – A lot of small to medium sized businesses have started utilizing voice services over commodity broadband connections with no Quality of Service (QoS) in place. Though most of the time this works adequately, there will be many instances of degradation in quality or dropped calls that can be frustrating. This has just been the reality of utilizing the public Internet for voice services… up until now. With SD-WAN, we’re able to prioritize voice traffic both inbound and outbound while leveraging multi-path technologies to “route around” carrier backbone problems. We’re able to do this with single, stand alone sites in addition to multiple locations.
  2. WAN Visibility and Management – Setting aside the benefits of multi-path link steering, bandwidth aggregation and QoS for a bit, many organizations have no usage breakdowns or application performance visibility in their network today. As a byproduct of the application steering and prioritization baked into most SD-WAN solutions, there is a great deal of reporting functionality available. So now when stakeholders of IT want to know what is happening at their remote locations, they have a graphical interface to see exactly what is happening.
  3. Configuration Uniformity and Standardization – Large organizations which have many sites, or will soon have many sites due to rapid growth, can have a lot of hands in the IT group working on things. With this, lack of standardization becomes an issue as sites are configured and turned up if there is not a uniform configuration policy. With SD-WAN, attaining a high level of uniformity is simple using features like Zero Touch provisioning and Configuration Profiles to make sure that all sites are configured identically. This also helps greatly for change management if you want to make a configuration update to all of your locations. With this approach, you can update a configuration in one place and push it to all sites at once. This frees up engineers to solve larger problems facing the business rather than making a minor configuration change on dozens or hundreds of sites.
  4. Remote Diagnostics Capabilities – When there are issues at a remote location, it can often be difficult to walk users through troubleshooting or to get the right software and hardware onsite. The tools built into many SD-WAN solutions provide the ability to perform packet captures and to see network state and what the users see on the network, so the time spent vetting issues on the network can be greatly reduced.
  5. MPLS / IP VPN Replacement – MPLS and other dedicated private network infrastructures have begun to outlive their usefulness with many organizations as critical workloads are moved to the cloud. Further, there is growing demand by companies to reduce the cost of their expensive WANs that typically have no redundancy or application smarts built in. SD-WAN can easily leverage existing dedicated internet access (DIA) links and even inexpensive broadband connections to build an application aware, private network overlay that provides more application control, redundancy and critical business application prioritization than traditional network designs.
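For a feel of what "routing around" a bad path means in the voice use case above, here is a rough Python sketch of per-link steering. It is not how any particular SD-WAN product works: the `LinkStats` fields, the thresholds and the scoring weights are all assumptions I picked for illustration, and a real appliance measures these continuously and per direction.

```python
from dataclasses import dataclass


@dataclass
class LinkStats:
    name: str
    latency_ms: float
    jitter_ms: float
    loss_pct: float


# Illustrative thresholds for acceptable voice quality (assumed values).
MAX_LATENCY_MS = 150.0
MAX_JITTER_MS = 30.0
MAX_LOSS_PCT = 1.0


def is_voice_capable(link: LinkStats) -> bool:
    """True if the link currently meets all of the assumed voice thresholds."""
    return (
        link.latency_ms <= MAX_LATENCY_MS
        and link.jitter_ms <= MAX_JITTER_MS
        and link.loss_pct <= MAX_LOSS_PCT
    )


def score(link: LinkStats) -> float:
    """Lower is better. The weights are arbitrary and only for illustration."""
    return link.latency_ms + 2 * link.jitter_ms + 50 * link.loss_pct


def pick_voice_path(links: list[LinkStats]) -> LinkStats:
    """Prefer links that meet the thresholds; otherwise take the least bad one."""
    if not links:
        raise ValueError("no links available")
    usable = [link for link in links if is_voice_capable(link)]
    return min(usable or links, key=score)
```

Re-running `pick_voice_path` each time fresh measurements arrive is the basic idea: when the cable link starts dropping packets, voice moves to the DSL or LTE link without anyone touching a router.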

These are just five examples of things we have been able to help with. We’re happily conducting Proof of Concept deployments for businesses to show the value of SD-WAN and finding new use cases all the time. We find ourselves working on long standing problems that have been occurring for years in traditional networks and within just a few hours of having SD-WAN appliances in the network, fixing them. Using this technology is some of the most rewarding work I’ve ever done as a network engineer. SD-WAN really is a game changer!

The Service Provider IGP Question: OSPF or Integrated IS-IS?

(Moved from my old blog, http://packetrancher.com, which I decided I didn’t have the time for and so shuttered in 2011. This was one of the few blog posts worth saving from it.)

I had a choice to make recently about which open standards based IGP routing protocol (i.e. NOT EIGRP) to use: OSPF or Integrated IS-IS.  If you look out there on the Internets, you’ll find many, many different discussions about which one to go with.  There are a lot of engineers who think IS-IS is dead and that no one uses it anymore, often confusing it with IGRP (which SHOULDN’T be used anymore).  That is far from the truth, as many large networks have used IS-IS for years and others switch to it all the time.

There are positives and negatives to both OSPF and IS-IS as you’d expect, but they are very similar protocols.  First, let’s get a rundown of some of the facets and features of each:

OSPF

  • Version 1 became RFC 1131 in October 1989
  • Uses Dijkstra’s Algorithm to determine shortest path
  • Distributes routing updates/information with LSA (Link State Advertisement)
  • Runs over Internet Protocol (IP)
  • Supports Non-Broadcast Multi-Access Networks (NBMA) and Point to Multi-Point (P2MP) in addition to Point to Point (P2P) and Broadcast
  • Partitioned into ‘Areas’ where Area 0 is the backbone that connects all other areas.
  • IPv6 support: Added with re-written version 3 of the protocol

Integrated IS-IS

  • Published as RFC 1195 in December 1990
  • Uses Dijkstra’s Algorithm to determine shortest path
  • Distributes routing updates/information with LSP (Link State Packet)
  • Runs directly over the data link layer rather than IP, using OSI ConnectionLess Network Service (CLNS) addressing
  • Unnumbered Broadcast in addition to Point to Point (P2P) and Broadcast. No NBMA or P2MP
  • Possible to be partitioned into ‘Levels’ where Level 2 is the backbone that interconnects all other Level 1 areas
  • IPv6 support:  Was added with a Type-Length-Value (TLV) addition to the protocol

As you can see, a lot of similarities.  In fact, when most network engineers who have experience in both are asked which they would recommend, they say it really comes down to preference because they are so similar.  Which protocol are your engineers accustomed to using and troubleshooting with?  That’s the one to go with.  I think it’s a little more involved than that, but from a network operations perspective I guess that could be a determining factor.
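The biggest thing the two have in common is the shortest path calculation itself.  Here is a small Python sketch of Dijkstra’s Algorithm run over a toy link-state database.  The router names and link costs are made up, and real implementations also track next hops, handle pseudonodes on broadcast segments and run the calculation per area or level, none of which is shown.

```python
import heapq


def shortest_paths(graph: dict[str, dict[str, int]], source: str) -> dict[str, int]:
    """Return the lowest total cost from source to every reachable router."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        cost_so_far, node = heapq.heappop(heap)
        if cost_so_far > dist.get(node, float("inf")):
            continue  # stale heap entry; a cheaper path was already found
        for neighbor, link_cost in graph.get(node, {}).items():
            new_cost = cost_so_far + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return dist


# Hypothetical four-router topology: router -> {neighbor: link cost}
TOPOLOGY = {
    "R1": {"R2": 10, "R3": 5},
    "R2": {"R1": 10, "R3": 3, "R4": 1},
    "R3": {"R1": 5, "R2": 3, "R4": 9},
    "R4": {"R2": 1, "R3": 9},
}
```

Calling `shortest_paths(TOPOLOGY, "R1")` reaches R2 through R3 (cost 8) rather than over the direct cost 10 link.  Whether the topology was flooded in OSPF LSAs or IS-IS LSPs, the calculation on top of it is the same.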

In evaluating my network to see which is going to be the best long term fit, I’ve come to the conclusion that Integrated IS-IS is the right choice for me.  There are a number of reasons why I came to this conclusion.

  1. Security – IS-IS does not run over IP; its packets are carried directly in layer 2 frames.  This means it is not as vulnerable to IP spoofing or other denial of service attacks that could affect OSPF.  Also if you run MPLS VPNs with OSPF in them, you’re less likely to have a NOC engineer accidentally add a customer to your core OSPF topology.
  2. Modularity – Equipment vendors can easily add newer protocols or features into IS-IS with the addition of new TLVs and sub-TLVs.  OSPF has historically required a re-write from the ground up to add support for protocols such as IPv6.
  3. Reputation – There is a very high opinion of IS-IS within engineering circles as being rock solid, quick converging and a very predictable IGP.  Granted, this is hearsay from my colleagues at other service providers, but I consider their opinion very valid.
  4. Simplification – I like the idea of keeping things simple so running IS-IS as both my IPv4 and IPv6 IGP is attractive.  In an OSPF world, that would require two routing instances, one for OSPFv2 routing IPv4 and the other for OSPFv3 routing IPv6.  I also think OSPF has too many knobs to play with that can let operators get a little carried away to make their networks more complicated than necessary.
  5. Vendor Focus – IS-IS is used predominantly and almost exclusively in the service provider space.  This creates a laser like focus of features and development on what service providers need.
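The modularity point is easier to see in code.  Below is a minimal Python sketch of walking a run of IS-IS style TLVs, where each one is a 1 byte type, a 1 byte length and then the value.  The two named type codes (129 for Protocols Supported, 137 for Dynamic Hostname) are the standard ones, but the function names are my own and type 250 in the usage note is just a stand-in for "something this code has never heard of".

```python
def parse_tlvs(data: bytes) -> list[tuple[int, bytes]]:
    """Split a byte string into (type, value) pairs."""
    tlvs = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated TLV header")
        tlv_type = data[offset]
        length = data[offset + 1]
        value = data[offset + 2 : offset + 2 + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        tlvs.append((tlv_type, value))
        offset += 2 + length
    return tlvs


# The only TLV types this sketch understands.
KNOWN_TLVS = {
    129: "Protocols Supported",
    137: "Dynamic Hostname",
}


def known_tlvs(data: bytes) -> list[tuple[str, bytes]]:
    """Return the TLVs we understand, silently skipping the ones we don't."""
    return [
        (KNOWN_TLVS[tlv_type], value)
        for tlv_type, value in parse_tlvs(data)
        if tlv_type in KNOWN_TLVS
    ]
```

Feed it `bytes([137, 2]) + b"r1" + bytes([250, 3, 1, 2, 3]) + bytes([129, 2, 0xCC, 0x8E])` and it returns the hostname and the supported protocols while stepping right over type 250.  Because the length field tells a parser how far to jump, an older implementation can ignore TLVs it doesn’t recognize, which is why IPv6 could be bolted on with new TLVs rather than a new protocol version.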

So am I saying Integrated IS-IS is the best interior routing protocol ever invented that everyone should use?  By no means.  As with most comparisons of technologies so close to each other in operation, it comes down to the application of the technology.  Make sure you dig into the subject matter to get a good understanding so that you can really make a business case for your solution.  In decisions like the choice of an IGP, it’s something you are likely going to be stuck with for some time.  To swap it out for another protocol can be an absolute bitch to plan, test and change especially as the network grows.  It’s best to build it once so that it is stable and scales in YOUR environment.

Here are a few great resources on the subject of IS-IS vs. OSPF if you’re interested in reading more: