Hosting Public Unifi Controller on Private Kubernetes Cluster

I manage a few Unifi deployments on the side, and part of this is hosting a Unifi controller. For the past year, I have been hosting this controller on a DigitalOcean Kubernetes cluster – which has worked really well. DOKS is a good service (great if you’re a cheapskate like me and don’t want to pay for your control plane). I have no complaints about their offerings whatsoever.

Having recently built a nice shiny new homelab, I wanted to put it to use. I could also reduce my cloud bill by hosting things locally instead. However, one hurdle remained: granting public access to the Unifi installation from behind my NAT.

I knew about frp from previous research on this topic, but I recently came across https://github.com/b4fun/frpcontroller. User bcho had done the legwork to build a k8s controller that runs frp and creates tunnels from within Kubernetes. Their code was a bit old, though, and needed some updating. I forked their project and re-implemented it on kubebuilder v3, along with upgrading to Go 1.17. You can check it out at https://github.com/ebauman/frpcontroller.

Here are the steps I followed to put these pieces together and host Unifi behind NAT.

FRP Server

  1. Spin up a t2.micro instance on EC2
  2. Acquire an elastic IP and associate it with the instance
  3. On this micro instance, download the latest frp release and tar xzvf
  4. Move frps into /usr/local/bin
  5. Create /etc/frp/frps.ini with the following content:
    [common]
    bind_port = 7000
    bind_addr = [PRIVATE IPV4 ADDR]
    token = [YOUR TOKEN]
    
  6. Create a new unit file for the frps service. This file should be /etc/systemd/system/frps.service and should have the following content:
    [Unit]
    Description=fast reverse proxy
    
    [Service]
    ExecStart=/usr/local/bin/frps -c /etc/frp/frps.ini
    
    [Install]
    WantedBy=multi-user.target
    
  7. Execute systemctl daemon-reload, followed by systemctl enable frps --now
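
At this point frps should be up and running. Here’s a quick sanity check – a minimal sketch assuming systemd and the port 7000 configuration above:

systemctl status frps
ss -tlnp | grep 7000    # frps should be listening on your bind_addr:7000

If the cluster can’t connect later, also make sure the instance’s security group allows inbound traffic on port 7000 and on every port you plan to expose.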

FRP Client on Private K8s Cluster

Setting up the private k8s cluster is outside of the scope of this guide.

With the cluster set up, install frpcontroller by calling kubectl apply -f https://raw.githubusercontent.com/ebauman/frpcontroller/main/release/v0.0.2/install.yaml (check github.com/ebauman/frpcontroller/tree/main/release for a more recent version and use that instead).

Installing frpcontroller also installs two CRDs into your cluster – Endpoint and Service. An Endpoint is the client-side reference for an FRP server. In this case, we’ll make one to point to our newly-created t2.micro instance.

First, create the namespace into which you will eventually install the Unifi controller: kubectl create namespace unifi

Next, create a new file called endpoint.yaml and place into it the following contents:

apiVersion: frp.1eb100.net/v1
kind: Endpoint
metadata:
    name: unifi
    namespace: unifi
spec:
    addr: '1.2.3.4' # your elastic ip. include the quotes
    port: 7000
    token: yourtoken

Create this endpoint by calling kubectl apply -f endpoint.yaml.
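
If you want to confirm the object landed, you can query it by its fully-qualified resource name so kubectl doesn’t confuse it with the built-in core Endpoints type (this assumes the CRD’s plural is “endpoints”):

kubectl -n unifi get endpoints.frp.1eb100.net
kubectl -n unifi describe endpoints.frp.1eb100.net unifi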

Next, you’ll need to install the Unifi controller. There is plenty of documentation on how to accomplish this – I use https://artifacthub.io/packages/helm/k8s-at-home/unifi to do this.

Once Unifi is installed, there are various ports you will need to expose through FRP. Most commonly, these are:

Port        Usage
tcp/8080    device and app communication
udp/10001   device discovery
tcp/8443    controller web UI
tcp/6789    UniFi mobile speed test
udp/3478    STUN
udp/5514    remote syslog capture
Full list available at https://help.ui.com/hc/en-us/articles/218506997-UniFi-Required-Ports-Reference

Create service.yaml with the following content:

apiVersion: frp.1eb100.net/v1
kind: Service
metadata:
    name: unifi
    namespace: unifi
spec:
    endpoint: unifi
    ports:
    - name: tcp-8080
      localPort: 8080
      remotePort: 8080
      protocol: TCP
    - name: udp-10001
      localPort: 10001
      remotePort: 10001
      protocol: UDP
    - name: tcp-8443
      localPort: 8443
      remotePort: 8443
      protocol: TCP
    - name: tcp-6789
      localPort: 6789
      remotePort: 6789
      protocol: TCP
    - name: udp-3478
      localPort: 3478
      remotePort: 3478
      protocol: UDP
    - name: udp-5514
      localPort: 5514
      remotePort: 5514
      protocol: UDP
    selector:
        [dependent upon your setup, see notes below]

The spec.selector field here works just like it does on a regular k8s service. The values here are dependent upon how you installed Unifi. For instance, my selector(s) are:

selector:
    app.kubernetes.io/name: unifi

Yours may differ. The Unifi chart typically deploys Kubernetes Services for you, so you will see selectors on those Services – you can copy them.
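
One quick way to find the right values – assuming the chart created a Service named unifi in the unifi namespace; adjust to match your install:

kubectl -n unifi get svc
kubectl -n unifi get svc unifi -o jsonpath='{.spec.selector}'

Whatever key/value pairs that returns can go straight into spec.selector above.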

Create the service in k8s by calling kubectl apply -f service.yaml.

If all went well, you should see a pod created in the unifi namespace running the frpc software. Looking at the logs of this pod should show a connection being established to your frps server.
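
For example (the pod name is generated by the controller, so grab it from the get pods output):

kubectl -n unifi get pods
kubectl -n unifi logs <name-of-frpc-pod>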

That’s it! Now you can browse to https://your-elastic-ip:8443/ and get to the Unifi page.

New Homelab in 2022

I decided finally to put together a decent homelab for 2022. My goals were to have plenty of compute power and fast storage to test all sorts of things.

To that end, I began in December of 2021 searching out some hardware to achieve this goal. I have long been a Supermicro fanboy and so limited my search on eBay to that hardware. After some research(1) into various 1U options, I settled on SYS-6018U-TR4T. I was able to get three of these for roughly $350/ea shipped. Shout out to unixsurplus for having dope deals.

The processors in this system aren’t too old, and it has plenty of room for expansion and lots of DDR4 slots. I was in love (still am). I purchased three of these servers plus enough RAM to get up to 128GB per unit. I also bought twelve 1TB SK Hynix Gold SSDs, enough to fill each server with 4TB raw.

I initially had planned to use Harvester for this, but it turned out not to be the right solution for me. Not that there is anything wrong with Harvester, mind you, it just isn’t geared (right now) towards general purpose virtualization. It’s great if you need a target for deploying RKE2/K3s nodes from Rancher, but not so great if you need Windows virt, etc.

I decided to move to vSphere, having previously been a VMware admin and certified in it. It’s still the gold standard for datacenter virt and I’m still very much a fanboy. However, over time I became disenchanted with VSAN which is what I would need to use if I wanted to treat these three nodes as HCI. Therefore, I decided to go into the DIY SAN world.

I acquired another SYS-6018U-TR4T as well as an old NetApp disk array enclosure, the DS2246. While traditionally used in a NetApp system, this DAE with its IOM6 controllers will serve up SAS (or in my case, SATA) drives to a generic HBA all day long. That generic HBA was an old Fujitsu card I purchased, based on the LSI SAS2008 chipset.

The IOM6 controllers use a QSFP for the SAS cabling, and the LSI takes in regular SAS 8087. I grabbed three of these along with some QSFP->SFF-8088 cables. This chain of converters lets my HBA happily talk to the IOM6 controllers in the enclosure.

With all of this in hand, I initially tried to build a ZFS box. The LSI performed well, and so did the server, but I botched the zpool or vdev setup. Not sure which, but the performance was abysmal. I was nerd-sniped into trying this in the first place (damn you Crothers), and after much frustration I reimaged with TrueNAS.

Now, everything works flawlessly! I stuffed the DAE full of the original SK Hynix SSDs, set up a simple RAIDZ2, and configured iSCSI.

Footnotes

(1) searching “supermicro 1u” and sorting by price, lowest

Cheap k3s cluster using Amazon Lightsail

I am a cheapskate, at least when it comes to cloud services.

I will happily shell out for a nice home lab, but there is something about a monthly payment that brings out my frugality. Thus I try to pare down as much usage of cloud resources as I can.

I’ve got a handful of stuff that I host on some EC2 instances. Largest among them is probably my Ubiquiti UniFi controller, which services not only my WiFi installation but also that of some “clients” (read: friends).

My day job is working with Kubernetes. At Rancher Labs, I spend all day talking to clients about Kubernetes – so it only made sense for me to want to host these projects on K8s. However, being the cheapskate that I am, running K8s in the cloud is not what *I* would consider cheap. EKS is like $72/mo just for the control plane – not including any worker nodes. I love Rancher software, but running a full K8s stack would require at least t2.mediums, which would run me about $33/mo each ($0.0464/hr × 24 hours × 30 days).

Sure I could do spot instances, or long-term contracts, or whatever. But I found a solution I liked a little more: Amazon Lightsail.

If you’re not familiar with Amazon Lightsail, here is a snippet from a description on the AWS website:

Lightsail is an easy-to-use cloud platform that offers you everything needed to build an application or website, plus a cost-effective, monthly plan.

https://aws.amazon.com/lightsail/

What this really means? Cheap virtual machines. A 1GB/1CPU instance with 40GB SSD and 2TB transfer will run you five US dollars per month. A comparable t2-series instance (t2.micro) will cost approximately $8 USD/mo.

1GB/1CPU is not a lot of horsepower, so obviously a full k8s cluster does not make much sense. However, did I mention I work for Rancher Labs? We have this awesome little distribution of Kubernetes called k3s.

If you’re not familiar with k3s, here’s a snippet from the site:

K3s is a highly available, certified Kubernetes distribution designed for production workloads in unattended, resource-constrained, remote locations or inside IoT appliances.

https://k3s.io/

See that little “resource-constrained” portion? Great! Let’s set up some cheap lightsail instances, and run k3s on them.

Prerequisites

You’re going to need an AWS account. I think this can be a Lightsail-only account, but if you have a full AWS account, you can use that too.

You’ll also want to get a copy of Alex Ellis’ excellent k3sup tool. This is what we will use to install k3s onto the nodes.

Also have a copy of kubectl handy. The latest k3s release runs Kubernetes 1.17, so a kubectl of that version or greater is perfect.
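
If you don’t have k3sup yet, the install one-liner from its README looks roughly like this (verify against the project’s current docs before piping anything to sh):

curl -sLS https://get.k3sup.dev | sh
sudo install k3sup /usr/local/bin/
k3sup version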

Instructions

Details such as OS and instance size may be modified to your taste. These are what I used, but feel free to experiment!

  1. Log onto the Lightsail console, and create a new instance. Select Linux/Unix platform, and then use Ubuntu 18.04 LTS. For the instance size, select the $5 USD option.
  2. Create four nodes using this pattern.
    • One will be your master node. Call that one “master”
    • Three will be agents. Call them “agent” and scale the count to 3.
    • Be sure you save your SSH keypair to a well-known location! This is important as we will use that SSH key to connect to the nodes and provision k3s.
  3. Once all the nodes have been created, let’s give them static IPs. This is important in case you need to stop/start your nodes in the future – we don’t want their IPs to change!
    1. For each node, click on the name of the node and go to “Networking” tab.
    2. On the networking tab, click “Create static IP”.
    3. Select your instance, and assign the new static IP to that instance.
    4. Repeat this process for each node in your cluster (master, agent-1, agent-2, agent-3).
  4. In order to communicate with our master node, we’ll need to adjust the firewall rules for the node.
    1. Once again, click on the master node and go to the “Networking” tab.
    2. Click on “Add Rule”
    3. Specify “Custom” application, “TCP” protocol, and “6443” as the port.
    4. Important: Consider restricting this to an IP! By default this will be open to the world and anyone will be able to connect to your Kubernetes API server on 6443. I limit the IP address to my home IP. This can be discovered by going to ipchicken.com.
    5. Click “Create” to save this rule.
  5. In order for our agent nodes to communicate with the master (and with each other), we will need to add firewall rules between the nodes. Grab a piece of paper (or text editor) and jot down the IPs of your nodes. For example:
    master: 1.1.1.1
    agent-1: 2.2.2.2
    agent-2: 3.3.3.3
    agent-3: 4.4.4.4

    Now, go node-by-node and setup firewall rules according to the following steps:
    1. Click on the node, and go to the “Networking” tab
    2. Click on “Add Rule”
    3. Specify “All Protocols” application
    4. Check the Restrict to IP address box, and enter the IP addresses of every node except the node you are editing. For example, if I am configuring the rules for agent-2, I would enter the IPs of master, agent-1, and agent-3.
    5. Perform these steps for all nodes (master, and all agents).
  6. Now that the nodes are set up, let’s head to your command line. We need to install the k3s master first. To do so, execute the following command:
    k3sup install --ip <master_node_ip> --user ubuntu --ssh-key <path_to_ssh_key> --local-path ~/.kube/lightsail
    This will install the master k3s node, and output a kubeconfig file at ~/.kube/lightsail. If that is not a valid location on your system, you may need to tweak this command.
  7. Once you have a valid kubeconfig file, let’s test if the master is working. Issue the following commands:
    export KUBECONFIG=~/.kube/lightsail
    kubectl get nodes

    You should see an output similar to:
    NAME              STATUS   ROLES    AGE   VERSION
    ip-172-26-1-104   Ready    master   2m    v1.17.2+k3s1
    Yay our first k3s node is up!
  8. Let’s join the remaining agent nodes. To do so, issue the following command for one of your agent nodes:
    k3sup join --server-ip <master_node_ip> --ip <agent_ip> --user ubuntu --ssh-key <path_to_ssh_key>

    This should complete quickly, and a new node should join your cluster! To verify, execute
    kubectl get nodes once again, and check the output:
    NAME              STATUS   ROLES    AGE   VERSION
    ip-172-26-1-104   Ready    master   5m    v1.17.2+k3s1
    ip-172-26-2-76    Ready    <none>   1m    v1.17.2+k3s1
  9. Issue the command in Step 8 for the remaining nodes (or use the loop sketched below). Hooray! You have built a k3s cluster on Lightsail.
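
If you’d rather not repeat step 8 by hand, a small shell loop does the trick – the IPs below are placeholders, so substitute the addresses you wrote down in step 5:

export MASTER_IP=1.1.1.1
for AGENT_IP in 2.2.2.2 3.3.3.3 4.4.4.4; do
  k3sup join --server-ip $MASTER_IP --ip $AGENT_IP --user ubuntu --ssh-key <path_to_ssh_key>
done
kubectl get nodes   # all four nodes should eventually report Ready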

Rebooting This

As you can see, this blog used to have entries.

I obviously stopped writing for some time.

During that time, I switched jobs and made some changes in my life (I got married!).

I am hoping to start writing here again about interesting things that I am doing.

I hope you will enjoy them.

Veeam: File does not exist or locked (vmx file)

Recently had a weird one – Veeam kept reporting that it could not download the .vmx files for particular virtual machines. These VMs all had one thing in common – they had run (or were running) on a particular host. But that host looked fine to me – like any other host in the cluster.

Turns out, that host was missing a domain in the list of search domains for the TCP/IP stack. I had a.example.com, but I also needed other.a.example.com!

Added that in, and things started working just fine.

500 Error in C# ASP.NET Application

I recently encountered a rather frustrating issue relating to an ASP.NET 4.5 application that we host using IIS.

Requests for static files were returning with a 500 error with no other information. Attempting to load the file by itself yielded “The page cannot be displayed because an internal server error has occurred.”

I attempted to change settings regarding error detail, to no avail. I couldn’t get anything to return to the client except “The page cannot be displayed because an internal server error has occurred.”

I turned on IIS failed request tracing, and configured the providers. I was finally able to determine that an extra <mimeMap> declaration in our Web.config file was gumming things up. Specifically:

Cannot add duplicate collection entry of type ‘mimeMap’ with unique key attribute ‘fileExtension’ set to ‘.svg’

Because of this extraneous entry, I also was unable to open the MIME Map UI option in the IIS features panel.

Once I removed it, things went back to normal!
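
For illustration, the usual shape of this problem is a mimeMap for .svg declared in Web.config when a parent configuration already defines one. Removing the duplicate entry – or preceding it with a remove element, the standard IIS workaround – clears the error. A minimal sketch (not our exact file):

<system.webServer>
  <staticContent>
    <!-- drop any inherited .svg mapping first to avoid the duplicate-key error -->
    <remove fileExtension=".svg" />
    <mimeMap fileExtension=".svg" mimeType="image/svg+xml" />
  </staticContent>
</system.webServer>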

Unable to change personality of HP 556FLR-SFP+ (or, Emulex OneConnect OCe14000)

During a recent server install, I ran into an issue where I could not change the personality of an HP 556FLR-SFP+ FlexLOM (HP p/n 727060-B21). This is a 10Gbe converged adapter, capable of NIC, iSCSI and FCoE personalities.

We were unable to select any personality other than the default iSCSI personality when attempting to change it through the UEFI configuration menu. I tried many things, but what ultimately fixed it was running the latest HP care pack against the machine (burned to a USB drive) and upgrading the firmware. The latest HP care pack had a firmware update for this Emulex adapter, and that resolved the issue.

Maybe this’ll help someone out there.

Can’t bring up virtual vCenter server after un-registering and re-registering VM.

So, due to some unrelated disk locking issues (see https://baumaeam.wordpress.com/2015/09/22/unable-to-start-vms-failed-to-lock-the-file/), my vCenter VM failed to start today.

From the aforementioned blog post, the solution for any VM other than vCenter would have been to power-cycle the host responsible for the file lock (or all of them, for good measure), and then restart the affected VM. However, this is not possible if the vCenter VM is the victim, as you can’t really do vMotion without vCenter!

Regardless, I went down a long rabbit hole full of attempted fixes that ultimately required me to restore the vCenter VM from a Veeam backup directly to a host. It worked great, except I couldn’t vMotion the vCenter VM anymore! vSphere kept throwing the following error whenever I attempted to vMotion the VM:

vim.fault.NotFound

“That’s odd”, I thought to myself. Maybe because the VM was registered directly on the ESXi host and then brought up, vCenter somehow didn’t see itself there? So when I tried to vMotion, it couldn’t figure it out? Not sure. I figured a possible fix would be to shut down the vCenter server, open the C# client against the host it was on, and unregister and re-register the VM. Perhaps doing that would get the process right, and things would work.

… not so much.

I connected the client to the target host, and unregistered the VM, then reregistered it on a different host. After reregistering it, the network adapter for the vCenter server could no longer connect to the distributed switch. So the VM would come up, but vCenter couldn’t start because it didn’t have a network adapter to talk to the Platform Services Controller.

My solution was to create a new portgroup (with the appropriate VLAN) on an existing vSphere Standard Switch, steal a host NIC away from the LAG in the vDS, add it to the VSS, and then power up the vCenter VM. Once it came up, it was able to connect to the PSC, and get the vCenter Server process up. Then I moved it back to the vDS, and things seemed to work okay again!

Hope this helps anyone facing the same issue, where their vCenter server is unable to get going because of vDS inaccessibility.

Unable to start VMs – Failed to lock the file.

Recently, I encountered an issue in my vSphere environment where VMs were randomly dying, and HA was unable to turn them back on. When trying to manually start these failed VMs, I received the following error message:

An error was received from the ESX host while powering on VM vCenter Support Assistant Appliance.
Failed to start the virtual machine.
Module Disk power on failed.
Cannot open the disk ‘/vmfs/volumes/4c0ed2a0-cbb490fe-2645-0018fe2e950a/vCenter Support Assistant Appliance/vCenter Support Assistant Appliance_1-000002.vmdk’ or one of the snapshot disks it depends on.
Failed to lock the file

(The issue was happening with my vCenter Support Assistant Appliance in this example).

Some investigation revealed that the issue was occurring after Veeam had backed up the machine in the routine overnight backup job. I opened a support ticket with Veeam, only to have them refer me to VMware, since the issue was occurring after a normal call to a vSphere API.

Doing more digging that day, I uncovered the following messages in the vmware.log file for the VM in question:

2015-08-26T02:49:36.674Z| vcpu-0| W110: Mirror_DisconnectMirrorNode: Failed to send disconnect ioctl for mirror node ‘28763a-24763d-svmmirror’: (Device or resource busy)
2015-08-26T02:49:36.674Z| vcpu-0| W110: Mirror: scsi0:1: MirrorDisconnectDiskMirrorNode: Failed to disconnect mirror node ‘/vmfs/devices/svm/28763a-24763d-svmmirror’
2015-08-26T02:49:36.674Z| vcpu-0| W110: ConsolidateDiskCloseCB: Failed to destroy mirror node while consolidating disks ‘/vmfs/volumes/4c0ed2a0-cbb490fe-2645-0018fe2e950a/vCenter Support Assistant Appliance/vCenter Support Assistant Appliance_1-000001.vmdk’ -> ‘/vmfs/volumes/4c0ed2a0-cbb490fe-2645-0018fe2e950a/vCenter Support Assistant Appliance/vCenter Support Assistant Appliance_1.vmdk’.
2015-08-26T02:49:36.674Z| vcpu-0| I120: NOT_IMPLEMENTED bora/vmx/checkpoint/consolidateESX.c:382
2015-08-26T02:49:40.270Z| vcpu-0| W110: A core file is available in “/vmfs/volumes/4c0ed2a0-cbb490fe-2645-0018fe2e950a/vCenter Support Assistant Appliance/vmx-zdump.000”
2015-08-26T02:49:40.270Z| vcpu-0| W110: Writing monitor corefile “/vmfs/volumes/4c0ed2a0-cbb490fe-2645-0018fe2e950a/vCenter Support Assistant Appliance/vmmcores.gz”
2015-08-26T02:49:40.345Z| vcpu-0| W110: Dumping core for vcpu-0

Odd, I thought. Something with the mirror driver causing problems?

A bit of quick googling yielded this KB article: Investigating virtual machine file locks on ESXi/ESX (10051)

Using the info in that KB article, I went onto an ESX host and used vmkfstools to try and discover the host that was causing the lock on the VMDK(s) in question. On each file (not just the one in question, but all VMDKs for the machine), no host was being reported as holding a lock. Yet the inability to power on the machine persisted. I rebooted all of the hosts in the cluster, and the VM came back up. At this point, I invoked VMware’s technical support.
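
For reference, the lock check from that KB boils down to something like the following, run from an ESXi shell (using this VM’s delta disk as the example path):

vmkfstools -D "/vmfs/volumes/4c0ed2a0-cbb490fe-2645-0018fe2e950a/vCenter Support Assistant Appliance/vCenter Support Assistant Appliance_1-000002.vmdk"

The owner field in the output embeds the MAC address of the host holding the lock – which is how I could tell that no host was reporting one.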

The support representative went through all the steps that I had done prior to calling, and uncovered the same information. However, they also discovered SCSI device reservation conflicts during the same time as the file locking issues. Their diagnosis?

Incompatible SCSI HBAs.

Sure enough, after checking the HCL on the VMware website, I found that my HBAs (specifically their driver version) were not supported for ESXi 6.0. I installed the updated driver on the affected hosts, and haven’t seen the problem since!

Hopefully this helps someone else facing the same issue – make sure you check the version of the drivers for your HBAs, as it could cause issues.