
minikube with WSL kubectl

Update: minikube 0.29.0 has been released and includes my merged PR so you can enable embedded certificates with minikube config set embed-certs true once and then just symlink your .kube/config file from your WSL home directory to the same file in your Windows home directory.


 

I recently blogged about how I work with minikube from the Windows Subsystem for Linux (WSL), describing some of the friction points and workarounds.

At the time I recommended using the Windows version of kubectl to avoid needing to translate the Windows file paths found in the .kube/config file to be usable with the Linux version of kubectl. Also, I hadn’t at that time encountered any use cases where using the Linux kubectl inside WSL would work better than the Windows version.

The first scenario where most people would likely encounter different or breaking behaviour is passing absolute file paths, e.g. kubectl apply -f /home/jason/my.yaml would usually fail to locate the file with the Windows version. The most common workaround is to use paths relative to the working directory.

Another scenario where the Linux version of kubectl is preferred is the TTY support when running kubectl exec --tty mypod. This was the reason I personally decided to get the Linux version of kubectl working with minikube in my WSL environment.

My first approach was to copy the .kube/config file that is created in my Windows user profile directory during minikube start, modify the three certificate paths to be WSL-compatible paths, and save the result in my WSL home directory.
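
A minimal sketch of that copy-and-rewrite step (assuming a Windows username of jason; the sed expressions are illustrative rather than the exact script I settled on):

cp /mnt/c/Users/jason/.kube/config ~/.kube/config
sed -i -e 's|C:\\Users\\jason|/mnt/c/Users/jason|g' -e 's|\\|/|g' ~/.kube/config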

Later I realised (from the files generated by kubeadm init on production Kubernetes clusters) that the certificate entries in the config file don’t need to be paths, but can have the certificate content embedded as base64 blobs. Naturally I wrote a bash script that I could run from WSL to perform these steps for me each time my minikube IP address or certificates changed (which to be fair, isn’t often). The script will use the translated paths approach by default but if executed with the --embed parameter it will use the embedded certificates alternative.

After using this solution for a while I began wondering why minikube didn’t just generate a .kube/config file with embedded certificates so WSL support could be solved with a simple symlink instead of copying and rewriting the file each time it changed.

So I dived into the minikube source and raised a pull request, and as of the time of writing this post, the PR has been merged into master and is just awaiting an official release. Once the new version of minikube is published (or if you’re keen to build it from source yourself), you will be able to execute minikube config set embed-certs true once and then minikube will always generate a .kube/config file with the certificates embedded as base64 blobs.

Then you can symlink your WSL ~/.kube/config file to your Windows %USERPROFILE%/.kube/config file and use either version of kubectl with no ongoing management.
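
For example (assuming a Windows username of jason):

mkdir -p ~/.kube
ln -sf /mnt/c/Users/jason/.kube/config ~/.kube/config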

PS: hat tip to Nuno do Carmo who also found a solution to embedding the certificates in the .kube/config file by using a pair of kubectl config ... --embed-certs commands. See the “Bonus 3: Do the same with Minikube” section of his extensive blog on “WSLinux+K8S: The Interop Way”.
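
Those commands look something like this (a sketch; the certificate paths and the cluster/user names assume a default minikube setup with a Windows username of jason):

kubectl config set-cluster minikube --embed-certs=true \
  --certificate-authority=/mnt/c/Users/jason/.minikube/ca.crt
kubectl config set-credentials minikube --embed-certs=true \
  --client-certificate=/mnt/c/Users/jason/.minikube/client.crt \
  --client-key=/mnt/c/Users/jason/.minikube/client.key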

Sennheiser Presence headset review

Update: I’ve been using the Sennheiser Presence for several hours every weekday for a year now and I’m still very happy with it and recommend it to others.

I work from home nearly full-time, and the rest of my team works remotely too, so I spend a decent amount of time on VoIP calls for scheduled meetings, paired debugging sessions, and general chit-chat.

For at least the last four years I’ve been using a Plantronics .Audio 615M USB headset that I received free at a conference back around 2009 and it has been working very well. Built-in Windows drivers recognise it as a USB communications headset so it becomes the default microphone and speaker for Slack Calls, Zoom, Google Hangouts, etc. I also love that it is a single-ear headset so I can still be aware of my surroundings while I’m wearing it, and the over-ear design (as opposed to in-ear) means it is comfortable to wear for extended periods.

What frustrated me about this headset was that I was tethered to my desk during a call, and I couldn’t use the same headset with my phone whilst on the go. The long cable also made it awkward to use when working from a café. So I started looking for an alternative.

I found TechRadar’s best Bluetooth headsets 2018 article and was drawn to the Sennheiser Presence UC at position 5, primarily because it “can connect to phone and laptop at the same time for easy switching”. The same article also called this headset “not the most comfortable”, but I’ve had good experiences with other Sennheiser products, so I checked out the official product page where I discovered that an over-ear headband and charging stand were also available for this device. So I took a chance and ordered the whole set.

I’ve been using the new Presence headset for over a week now and I am very happy with the product. It paired to my Android phone trivially, and the USB dongle for the laptop required no special drivers. Like the old Plantronics, the new Sennheiser headset also became the default device for my VoIP applications.

The USB dongle also has a light indicating whether the headset is paired (dull blue), active (bright blue), or even disabled (red) because I’ve toggled the “microphone mute” key on my laptop keyboard.

Battery life is supposedly 10 hours of talk time but with the charging stand in easy reach, I leave the headset there when I’m not on a call and I don’t need to think about the battery. The lack of cable also means I’m not getting it tangled in my chair wheels, and I can step away from my desk to refill my coffee while I’m on a call.

So far I have only two complaints about the Presence headset: firstly, the slider switch to power the device on/off is a little too easy to slide accidentally when picking up the headset from the stand to put it on for a call. I need to be mindful of how I pick it up to avoid waiting for a power-off, power-on, pair cycle.

Secondly, it is not simple to remove the over-ear headband and revert back to the in-ear configuration. There are a couple of small pieces that need to be found and reconnected, and later removed again to re-attach the headband, and it would be easy to lose them. As such I think I’ll just be using the headband accessory while I’m travelling.

minikube and WSL

I develop services that run on Kubernetes. During development, minikube provides a convenient way to run a local Kubernetes “cluster” regardless of whether you use Windows, OS X, or a Linux distribution as your host OS.

Day-to-day I use minikube on Windows 10 and I prefer to use the Windows Subsystem for Linux (WSL) bash shell to have a scripting environment consistent with my colleagues, some of whom do not use Windows, and consistent with the CI system.

The Linux binary of minikube isn’t very useful in WSL since it doesn’t support the Hyper-V driver, and the VirtualBox driver cannot deal with the path differences it sees within WSL compared to those reported by VBoxManage.exe.

However, when running the Windows minikube.exe binary, many of the commands (e.g. start, stop, ip, dashboard) just work without any special configuration. Furthermore, creating a symlink so that minikube can be executed from the PATH without the .exe extension improves the default experience (one way to create it is shown below). Beyond these initial commands though, some extra effort is required.
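
One way to create the symlink (the minikube.exe location here is an assumption based on a typical install, so adjust it for your machine):

sudo ln -s "/mnt/c/Program Files/Kubernetes/Minikube/minikube.exe" /usr/local/bin/minikube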

SSH can be a little flakey with minikube ssh, so I find it better to create an alias that uses WSL’s ssh client:

ssh -a -i "$(wslpath -u "$(minikube ssh-key)")" -l docker "$(minikube ip)"
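
Wrapped up as an alias (the alias name is arbitrary; the single quotes defer the minikube calls until the alias is used):

alias mkssh='ssh -a -i "$(wslpath -u "$(minikube ssh-key)")" -l docker "$(minikube ip)"'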

You may get an error from this SSH command complaining that the permissions of the identity file are too open. This is fixed in two steps. Firstly, ensure you have added metadata to the automount options in your /etc/wsl.conf (shown below) and restart your WSL session.
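
The relevant /etc/wsl.conf section looks something like this:

[automount]
options = "metadata"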

Secondly, change WSL’s view of the permissions of the key file:

chmod 0600 "$(wslpath -u "$(minikube ssh-key)")"

The minikube docker-env command doesn’t recognise the WSL environment and outputs the PowerShell environment commands instead. This can be worked around by passing the --shell bash argument, but the DOCKER_CERT_PATH environment variable value won’t work with the docker Linux binary as-is, and the Windows binary needs the WSLENV environment variable set appropriately. These extra steps are enough to justify a helper script, which I’ve published as a GitHub Gist. With this script dot-sourced, both the Windows and Linux binaries for the Docker client will work with minikube’s Docker daemon.
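
The essence of that helper is roughly this (a sketch rather than the published Gist; in particular the WSLENV flag reflects my understanding of its path-translation syntax):

# strip any CR characters in case the Windows minikube binary emits CRLF line endings
eval "$(minikube docker-env --shell bash | tr -d '\r')"
# the Linux docker client needs a WSL-style certificate path
export DOCKER_CERT_PATH="$(wslpath -u "$DOCKER_CERT_PATH")"
# the Windows docker.exe client needs the variables shared via WSLENV, with the path translated back
export WSLENV="DOCKER_HOST:DOCKER_TLS_VERIFY:DOCKER_CERT_PATH/p${WSLENV:+:$WSLENV}"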

Lastly, use the Windows binary for kubectl. The paths for Kubernetes certificates in the .kube/config file make it difficult to use the kubectl Linux binary and so far I haven’t found a problem with the Windows version.

wslpath and mktemp

The Windows Subsystem for Linux (aka WSL or Bash on Ubuntu on Windows) provides a fantastic reproduction of a local Linux environment without needing a virtual machine.

Even better than a virtual machine, WSL includes a lot of conveniences for interoperating with the host Windows file system and processes. That is, I can access my C: drive via /mnt/c/ and I can pop calc via calc.exe.

Naturally, file paths in Linux and Windows are quite different, so WSL performs some translations where it can (e.g. for the current working directory) and provides the wslpath utility for explicit conversions where necessary.

Recently I discovered that even though the root filesystem of my particular WSL installation is accessible from Windows (via %LocalAppData%/lxss/rootfs in my case), WSL will not translate just any Linux path to a path within this rootfs directory. This is because WSL is designed around the idea that Windows processes should not modify WSL files.

However, I work with various version-controlled scripts, shared amongst developers on Mac, Linux, and Windows (mostly via Cygwin), that use /tmp/ as a staging area (via mktemp), and when using WSL, Windows processes don’t see this directory. If the current working directory is in /tmp/, the working directory of the Windows process will become the Windows user profile directory instead, and running wslpath -w /tmp/ just returns Result not representable.

To avoid modifying the shared scripts to be WSL-aware, I instead converted my WSL tmp directory to be mounted from the Windows host file system via the following set of commands.

First, create the directory to use as WSL’s tmp. I chose C:\wsltmp\ out of convenience, but it could be any path you prefer.

$ mkdir -p /mnt/c/wsltmp
$ chmod 1777 /mnt/c/wsltmp # tmp dir should have the Sticky-bit set
$ sudo chown root: /mnt/c/wsltmp

Also, to ensure Linux’ case-sensitivity is honoured for this directory, from an elevated PowerShell, run:

> fsutil.exe file setCaseSensitiveInfo c:\wsltmp enable

While NTFS has supported opt-in case-sensitivity for a very long time, it has only recently supported setting it per directory.

Finally, define the mount in WSL and mount it:

printf '\n/mnt/c/wsltmp\t/tmp\tnone\tbind\t0\t0\n' | sudo tee -a /etc/fstab
sudo mount -a

Now your WSL session, and future sessions (assuming you haven’t disabled mountFsTab), will have a /tmp/ directory which will be correctly translated for Windows processes.
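
You can confirm the bind mount and the working-directory translation with something like:

$ findmnt /tmp
$ cd /tmp && cmd.exe /c cd

The second command should print the Windows path of your new tmp directory (e.g. C:\wsltmp) rather than your user profile directory.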

Warning: if you use ssh-agent in WSL, mounting /tmp/ to a DrvFS volume instead of LxFS will mean the ssh-agent socket (in /tmp/ssh-*/agent.*) will not be available for WSL processes to connect to; it will only be accessible by Win32 processes and is therefore not useful for typical scenarios.

Inspecting Docker container processes from the host

While I favour a containerize-all-the-things approach to new projects, I still need to maintain systems that were designed several years ago around a combination of containers and host-based applications working together.

In these situations it is common enough to execute ps or iotop on the host and see all the host and container processes together with no obvious indication of which processes belong to which containers.

Here I will share some simple commands to help map the host-view of a containerised process to its container.

First, given a host PID, how do I know which container it belongs to?

$ cat "/proc/${host_pid}/cgroup"
...
4:cpu:/docker/769739f359ec192edf6c565f7756bb5ecabcfac3e691c2444794ab6a7d398e39
...

The procfs cgroup file will show the full Docker container ID, which you can then use with docker inspect to get more container details.
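
For example, to map that ID back to a container name:

$ docker inspect --format '{{.Name}}' 769739f359ec192edf6c565f7756bb5ecabcfac3e691c2444794ab6a7d398e39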

Vice versa, if you have the container ID and want to locate the host process(es), you can use:

$ sudo ps -e -o pid,comm,cgroup | grep "/docker/${cid}"

Lastly, if you’re trying to debug a container process from the host and need the host path to the process’ binary, I have found a method that has been working reliably.

Unfortunately, because the procfs exe file is a symbolic link, not a hard link, it won’t resolve to the file within the container’s layered file system, so a few extra steps are required.

First, read the symlink to get the fully-qualified container-path to the binary:

$ exe=$(readlink "/proc/${host_pid}/exe")

Next, parse the process’ memory-mapped files to locate the first memory region referencing this file path:

$ map=$(grep -m1 -F "${exe}" "/proc/${host_pid}/maps" | cut -d' ' -f1)

Lastly read the symlink for this memory map from procfs’ map_files directory:

$ readlink "/proc/${host_pid}/map_files/${map}"

This final output should look something like this:

/var/lib/docker/aufs/diff/3cc533dae9a6cc96d6092844be3ce78c737db793cf1493b9f47e652e96bfd71e/bin/sleep

Note that the long identifier in that path is not the container ID, nor is it available via docker inspect, although I’m sure someone else has posted online how to locate this path via other means.
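
For convenience, the three steps can be wrapped in a small helper function (a sketch; the name is mine, and reading map_files generally requires root):

container_exe_path() {
  local host_pid="$1" exe map
  exe=$(readlink "/proc/${host_pid}/exe") || return 1
  map=$(grep -m1 -F "${exe}" "/proc/${host_pid}/maps" | cut -d' ' -f1)
  readlink "/proc/${host_pid}/map_files/${map}"
}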

Lessons from DigitalOcean Networking

Update: On 2017-DEC-13, DigitalOcean announced that private networking will be isolated to each account beginning February 2018.


If you’ve come from running virtual machines on AWS, Azure, or Google Cloud, you will be familiar with the idea that the VMs can have a public Internet-facing IP address and a private IP address, or some combination or multiple of the two options.

DigitalOcean offers something similar, but just different enough to throw you when you’re accustomed to the networking models of the other cloud providers. When you create a DigitalOcean Droplet via their Control Panel, or via their API, you have the option to enable “Private networking” but when you read the official documentation, this feature is actually called “Shared private networking” and it is a very important distinction.

Where private networking in AWS, Azure, or Google Cloud gives your VM a private interface to a network shared only with your VMs, the shared private networking in DigitalOcean is, according to this DigitalOcean tutorial, “accessible to other VPSs in the same datacenter–which includes the VPSs of other customers in the same datacenter”. And I have verified that statement is true.

To clarify, if you enable private networking on a DigitalOcean VM in their SFO2 region, every other VM in the SFO2 region from every other DigitalOcean customer can route packets to your VM’s private network interface. While I advocate the use of strict firewall configurations in any cloud hosting environment, the importance of doing so correctly is much higher on DigitalOcean, even for non-Production environments where firewalls have a history of being more relaxed.

The bright side of all this is that DigitalOcean’s tag-based Cloud Firewall applies to both the public and private network interfaces and implements a deny-by-default behaviour. By using tags to restrict which other droplets are permitted to communicate on specific ports and protocols you can achieve a very similar level of isolation as offered by other cloud providers.

There is another caveat though: to improve the security of this shared private networking environment, DigitalOcean do not allow VMs to send packets with a source IP address that does not match their assigned private IP address. This prevents you, for example, from operating one DigitalOcean VM as a Virtual Private Network gateway for your other DigitalOcean VMs to connect through to another non-DigitalOcean private network.

In summary, while DigitalOcean is providing a great service, and adding new features seemingly every quarter, it offers a conceptual model slightly out of sync with the big-name cloud companies, and you need to be mindful of this; the same would be true, I guess, for people experienced with DigitalOcean moving to AWS or Azure.

Finding deleted code in git

Recently, Matt Hilton blogged about Source Control Antipatterns which included the practice of commenting code instead of deleting the code.

As wholeheartedly as I agree with deleting code, I know that a popular objection is that deleted code is harder to find. While it might be harder than your favourite editor’s Find In Files feature, it is important to know how to use the tools central to your development workflow.

For my work, and seemingly the majority of projects today, git is the version control tool of choice. So I’m sharing some git commands here that I have found useful for locating deleted code. I’m using the Varnish Cache repository for my examples if you want to try them yourself.

If you know some text from the code that was deleted, you can find the commit where it was deleted. In this example I’m looking for when the C structure named smu was deleted.

$ git log -G "struct +smu" --oneline

766dee0 Drop long broken umem code

If you know the name of a file that was deleted, but aren’t sure which directory the file was in, you can find the commit when the file was deleted with:

$ git log --oneline -- **/storage_umem.c

766dee0 Drop long broken umem code
b07c34f include cleanup - found by FlexeLint
75615a6 When I grow up, I want to learn to program in C

 

If you’re not even sure what the deleted file was named but just want to see recent commits with deleted files, you can use:

$ git log --diff-filter=D --summary

commit f4faa6e3c431d6ccf581f5683af56008e4d4be10
Author: Federico G. Schwindt <fgsch@lodoss.net>
Date:   Fri Mar 10 18:59:14 2017 +0000

    Fold r00936.vtc into vcc_action.c tests

 delete mode 100644 bin/varnishtest/tests/r00936.vtc
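
Once you have the commit that deleted a file, you can also view the deleted content or restore the file from the commit’s parent. For example, using the commit found earlier (the wildcard pathspec avoids typing the full path):

$ git show 766dee0 -- '*storage_umem.c'
$ git checkout 766dee0^ -- '*storage_umem.c'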

There is a lot more you can do with git log than just find deleted code, but hopefully these examples are a useful start.

Beware Docker and sysctl defaults on GCE

On Google Compute Engine (GCE) the latest VM boot images (at the time of writing) for Ubuntu 14.04 and 16.04 (eg ubuntu-1604-xenial-v20170811) ship with a file at /etc/sysctl.d/99-gce.conf which contains:

net.ipv4.ip_forward = 0

This kernel parameter determines whether packets can be forwarded between network interfaces. On its own, the presence of this line isn’t a big deal.

Separately, when you start the Docker daemon (at least in version 17.06.0-ce), it sets this kernel parameter to 1 (assuming you haven’t specified --ip-forward=false in the Docker configuration). Docker needs packet forwarding enabled so that Docker containers using the default bridge network can communicate outside the host.

If you later execute sysctl --system or similar after Docker has started, for example to apply a new value for the nf_conntrack_max kernel parameter that you’ve specified in another file under /etc/sysctl.d/, then the ip_forward parameter will revert to 0 care of GCE’s default conf file.

At this point you’ll find your containers cannot reach the outside world, for example this will fail to resolve:

docker run ubuntu:16.04 getent hosts google.com

This will remain broken for all existing or new containers until you set the ip_forward parameter back to 1 manually or by restarting the Docker daemon.

If you’re using any Docker version since v1.8 (released about 2 years ago) you should see the following message when running a container with bridge networking if IP forwarding is disabled:

WARNING: IPv4 forwarding is disabled. Networking will not work.

Of course, that only helps if you’re using docker run interactively and does not help if the parameter gets changed after the containers are already running.

If you’re in this situation, add your own file to /etc/sysctl.d/ that follows 99-gce.conf alphabetically (eg 99-luftballon.conf) and ensure it contains:

net.ipv4.ip_forward = 1

You may also want to ensure the file has a trailing LF character to avoid any issues with processing it.
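
For example (echo conveniently appends that trailing newline):

echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-luftballon.conf
sudo sysctl --system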

You can check the current value of the ip_forward kernel parameter with one of these two commands:

sysctl net.ipv4.ip_forward
cat /proc/sys/net/ipv4/ip_forward

The case of the addled ARP

In recent weeks we started receiving alerts whenever a new AWS EC2 Instance running Ubuntu 14.04 LTS was launched for a specific Auto Scaling Group. On average, one new instance would be provisioned per day but the fault would only occur for about one or two of the new instances per week.

The alert was an indicator that the new instance was unable to communicate with the message broker located on another instance. However, after approximately 20 minutes the issue would self-resolve. Also, if we manually provisioned a new replacement instance, it would successfully communicate with the broker.

With the short window of failure and no consistent period between occurrences, this problem continued through several operations shifts and staff members before a plan was established to capture more details of the problem.

On the next alert we were able to investigate and establish several facts:

  1. The affected instance was unable to communicate due to a connection timeout. It was sending TCP SYN packets and receiving no reply.
  2. The message broker was receiving the TCP SYN packet from the affected instance and replying with a SYN+ACK packet but the MAC address on the reply packet did not match the MAC address on the incoming SYN packet.
  3. Running ip neigh show on the message broker instance reported that the IP address of the affected instance was associated with an unrelated MAC address and was in the STALE state but occasionally also in the REACHABLE state.
  4. The unrelated MAC address was not associated with any other instances running in the VPC nor any that had been recently terminated.

At this point we set up two monitors on the message broker instance while we waited for the problem to self-resolve. The first was a tcpdump to capture all ARP traffic and the second was a shell script to continuously poll and record the ARP table. The ARP traffic capture contained very little, and nothing at all helpful, but the ARP table records were very interesting.
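
In rough terms, the two monitors were along these lines (a sketch; the interface name and polling interval are illustrative):

sudo tcpdump -i eth0 -w arp-traffic.pcap arp &
while true; do date; ip neigh show; sleep 1; done >> arp-table.log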

While the affected instance was unable to connect to the message broker, the ARP table cycled through the states REACHABLE, then STALE, then DELAY, and back to REACHABLE again, retaining the same incorrect MAC address association the whole time. The DELAY state never lasted as long as five seconds.

At the moment when the problem self-resolved, the DELAY state did last for five seconds and then transitioned to the PROBE state, then to the FAILED state and finally back to REACHABLE but this time with the correct MAC address.

This insight led one of our team members to find this Red Hat bug describing a Linux kernel issue that aligned exactly with the behaviour we were experiencing. Unfortunately the fix for this bug wasn’t merged until Linux kernel 4.11, which was only released in May and reportedly won’t be officially available in Ubuntu until Artful Aardvark 17.10.

Our assessment of all the stale ARP entries on the message broker combined with the known scaling behaviours of the messaging clients suggested that some entries had been there for at least 8 weeks. So this wasn’t a by-product of replacing instances rapidly and recycling IP addresses in the subnet too quickly.

As an interim solution we have implemented a cron job to remove any stale entries from the message broker’s ARP table and this has prevented the alerts from re-appearing for several weeks now.
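
The cron job amounts to something like this (a sketch; I’ve assumed eth0 and a ten-minute interval):

# /etc/cron.d/flush-stale-arp
*/10 * * * * root /sbin/ip neigh flush dev eth0 nud stale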

 

Always upgrade ixgbevf on AWS EC2

I recently had a frustrating experience with network connectivity for a set of AWS EC2 Instances running Ubuntu Trusty 14.04.

Three instances, running Graphite and Carbon Cache 0.9.15, would intermittently become unreachable on the network for seconds or minutes at a time, several times a day. There was no obvious pattern to when these events would occur, and when they did there was no interesting change in their CPU utilisation, memory usage, or disk IO aside from the inevitable reduction in activity associated with a lack of data or queries coming from the network.

AWS reported the Graphite instances were failing their Instance Status Check. The external instances attempting to communicate with these Graphite machines just experienced TCP timeouts. When the Graphite instances themselves became network-reachable again, their system logs showed that processes had continued running as normal during the outage.

The first hint of a reason for this behaviour came from the Graphite instances’ syslog reporting No route to host during the outage while a cron job was attempting to connect to another instance on the same subnet in the same Availability Zone. This suggested something was wrong with either ARP or the network interface, but there were no logs or kernel messages suggesting the network interface had gone down, and EC2 resolves ARP at the hypervisor.

I configured collectd to harvest all the network-related metrics possible on the Graphite instances themselves, and I configured VPC Flow Logs to record details of all the network traffic in the subnet. After the next period of failed connectivity, I discovered that the Flow Logs showed all packets reaching the EC2 network interfaces of the Graphite instances, while the instances’ collectd data showed no packets received, and no network errors either.

These Graphite instances were now running the AWS M4 Instance Type but they were not originally provisioned as such, which led me to investigate the Enhanced Networking features available to these instance types. I eventually found this suspicious paragraph in the AWS documentation:

In the above Ubuntu instance, the module is installed, but the version is 2.11.3-k, which does not have all of the latest bug fixes that the recommended version 2.14.2 does. In this case, the ixgbevf module would work, but a newer version can still be installed and loaded on the instance for the best experience.

Enabling Enhanced Networking with the Intel 82599 VF Interface on Linux Instances in a VPC

Our instances were running the 2.11.3-k version of the ixgbevf driver mentioned in the documentation, which is the older “would work” version but also the most recent version included with Ubuntu Trusty. Some further research into this network driver on AWS revealed some other discussions of similarly flakey network connectivity so I decided to upgrade the driver on one of the Graphite instances.
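
You can check which version an instance is running with either of the following (the first reports the module installed on disk, the second the currently loaded driver):

modinfo ixgbevf | grep ^version
ethtool -i eth0 | grep ^version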

As per the same AWS documentation, the recommended version 2.14.2 does not build properly on some versions of Ubuntu, so I installed version 2.16.4, which required an OS restart. I monitored the upgraded instance for 24 hours and it remained healthy with no connectivity interruption for the whole period whilst the other two instances continued to fail intermittently so I upgraded the network driver on a second instance. After 72 hours of stable behaviour on the two upgraded instances, I upgraded the third and the problem is now completely resolved for those instances.

Expecting that these issues could easily recur on our other systems, I wanted to ensure they were all using the newest driver; however, because an OS restart is needed for the new network driver to load, adding the driver install steps to the provisioning script was undesirable. Experimentation revealed that rmmod and modprobe seem to allow the upgraded network driver to become active without an OS restart, but I decided that baking a new AMI with the driver pre-installed was preferred.
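
For completeness, the in-place reload I experimented with was simply (expect a brief network interruption while the driver reloads):

sudo rmmod ixgbevf && sudo modprobe ixgbevf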

I have also discovered that the version of the ixgbevf driver included with Ubuntu Xenial 16.04 is more recent than Trusty’s but still older than the version recommended by AWS, so a custom AMI is still required.

I’ve shared my experience and findings with AWS Support and asked them to modify their documentation to more strongly recommend installing the newer driver.