Setup a Cluster

Setup LXD on your Machines

Manual cluster setup is no longer necessary

The new version of our terraform module (0.3.x onwards) automates clustering out of the box. However, if you wish to learn what it’s doing under the hood, read on.

This section will show you how to set up an LXD cluster. It assumes you have already provisioned a set of machines with one of the cloud providers. If you haven’t done that, please set up your compute resources first.

There may be some minor differences between each provider which will be highlighted.

Once you boot up your instance, you can log in using the following command (assuming you’ve set up your SSH authentication correctly).

Login via SSH

If you used the terraform module to set up your cluster, you can log in to your bastion. Simply copy the IP address of your bastion and log in with:

# DigitalOcean and Hetzner
ssh root@<ip-address>

Now you’re on the bastion. You’ll want to jump from there to the actual nodes that will be running your application.

From the bastion, use the private IP of each node:

# assuming you use 10.x.x.x/16 as your subnet
ssh root@<node-private-ip>
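
If you’d rather reach a node in one hop instead of landing on the bastion first, OpenSSH’s -J (ProxyJump) flag can tunnel through it; the addresses below are placeholders for your own bastion and node IPs.

# jump through the bastion in a single command (placeholder addresses)
ssh -J root@<bastion-ip> root@<node-private-ip>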

Install LXD

You may not need to install LXD

On some providers LXD comes pre-installed in the default image. If running lxd init fails with a "command not found" error, you need to install LXD first.
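
A quick way to check before installing anything is to look for the binary itself; if the command below prints a path and a version, you can skip straight to lxd init.

# check whether the lxd binary is already on the PATH
command -v lxd && lxd --version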

Run the following commands:

apt update && apt install snapd

This installs snapd; we’ll then install LXD through snap:

snap install lxd

Once LXD is installed you should be able to run lxd init, which leads to the next step.

LXD Init

Let’s initialize LXD.

lxd init

LXD will guide you through the initialization steps:

Would you like to use LXD clustering? (yes/no) [default=no]: yes

For the IP address or DNS name you can use the private IP of your instance.

What IP address or DNS name should be used to reach this node? [default=146.190.80.210]: 10.0.1.2

Since this is your first node, the answer to the following is no:

Are you joining an existing cluster? (yes/no) [default=no]: no

For the node name you can keep the default:

What name should be used to identify this node in the cluster? [default=docs-example-01]: <enter to use default>

Next, LXD will ask whether to set up password authentication. Go ahead and type no and hit enter (no is also the default).

Setup password authentication on the cluster? (yes/no) [default=no]:

Next, LXD will prompt you to set up a storage pool. Simply hit enter to move forward.

Do you want to configure a new local storage pool? (yes/no) [default=yes]: <enter>

You can keep the default, zfs. Simply hit enter to move forward.

Name of the storage backend to use (btrfs, dir, lvm, zfs) [default=zfs]:

When asked to create a new ZFS pool, keep the default, yes. Hit enter to go to the next step.

Create a new ZFS pool? (yes/no) [default=yes]:

When asked whether you would like to use an existing empty block device, keep the default answer, no. Simply hit enter to move to the next step.

Would you like to use an existing empty block device (e.g. a disk or partition)? (yes/no) [default=no]: 

You will be asked for the size of the loop device. A simple rule of thumb is to take your disk size and subtract about 9 GiB to leave room for the host system. For example, my disk had 24 GB, which leaves roughly 15 GiB.

Size in GiB of the new loop device (1GiB minimum) [default=5GiB]: 15GiB
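
If you’re not sure how much disk you have to work with, you can check before answering; lsblk and df ship with the stock images on most providers.

# inspect the disks and the free space on the root filesystem
lsblk
df -h /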

Next up you’ll be asked if you want to configure a new remote storage pool. For this one you can choose no. Hit enter to pass right through.

Do you want to configure a new remote storage pool? (yes/no) [default=no]:

When asked if you wish to connect to a MAAS server, you can answer no. Hit enter and pass right through.

Would you like to connect to a MAAS server? (yes/no) [default=no]:

Next you’ll be asked whether to use an existing bridge or host interface for networking. For this one you can go with the default, no.

Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]:

Next up is the Fan overlay network. For this one you’ll want to answer yes, which is the default. Simply hit enter to move to the next setting.

Would you like to create a new Fan overlay network? (yes/no) [default=yes]:

Next is the Fan underlay subnet configuration. Here you’ll want to enter the subnet you used when you created the VPC with your cloud provider.

Subnet Configuration

The subnet depends on what you used when creating your VPC. You can check it in your cloud provider’s UI. Here are some examples:

  • 10.0.0.0/16
  • 10.0.1.0/24

LXD only accepts a /16 or /24 subnet for the Fan underlay.
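
If you can’t find the subnet in your provider’s UI, you can also read it off the node’s private network interface; the exact interface name varies per provider.

# show the IPv4 addresses (and their subnets) configured on this node
ip -o -4 addr show
# or inspect the routing table for the private network
ip route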

What subnet should be used as the Fan underlay? [default=auto]: 10.0.2.0/24

Next, you’ll be asked whether stale cached images should be updated automatically. You can go with the default, which is yes.

Would you like stale cached images to be updated automatically? (yes/no) [default=yes]:

Finally, LXD will ask if you wish to print the YAML preseed config. You can keep the default, no, for now; if we need it we can generate the preseed later.

Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:
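
As a side note, if you ever want to repeat this setup non-interactively, lxd init can dump the answers you just gave as preseed YAML and replay them on another machine; treat the file name below as an example, and expect to tweak cluster-specific fields before reusing it.

# dump the current configuration in preseed format (run on a configured node)
lxd init --dump > lxd-preseed.yaml

# replay it on a fresh machine
cat lxd-preseed.yaml | lxd init --preseed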

This concludes the initialization process. It’s good practice to reboot your machine once after initialization.

sudo reboot

Test Drive LXD

Once you have initialized LXD, try launching a container using the following commands.

$ lxc list

+------+-------+------+------+------+-----------+----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS | LOCATION |
+------+-------+------+------+------+-----------+----------+

Let’s try launching a container:

$ lxc launch images:alpine/3.17 test
Creating test
Starting test

If you run lxc list again you should see:

+------+---------+--------------------+------+-----------+-----------+----------------+
| NAME |  STATE  |        IPV4        | IPV6 |   TYPE    | SNAPSHOTS |    LOCATION    |
+------+---------+--------------------+------+-----------+-----------+----------------+
| test | RUNNING | 240.2.0.216 (eth0) |      | CONTAINER | 0         | nebula-node-02 |
+------+---------+--------------------+------+-----------+-----------+----------------+

The container has started and it has an IP address assigned. At this point your LXD initialization can be considered a success.
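
If you want a closer look before moving on, lxc info shows the container’s state, addresses, and resource usage, and an outbound ping is a quick sanity check (assuming your provider allows egress traffic from the nodes).

$ lxc info test
$ lxc exec test -- ping -c 3 1.1.1.1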

At this point you have a choice. You can continue with a single-instance setup, or, if you want something more scalable, the next section shows you how to add more machines to your cluster.

Adding a Node

Before we can add a new node, we need to run a command on one of the nodes that is already set up. So SSH into one of your existing nodes and run the following:

lxc cluster add <name-of-new-node>
Member <name-of-new-node> join token: <output-of-token>

You will need this join token for the new node.

To add a node to your cluster, you can simply increase the cluster_size parameter in your terraform module. The terraform module will provision a new node in your cluster; SSH into it and run lxd init. This time we will answer certain questions differently. Let’s begin!

Would you like to use LXD clustering? (yes/no) [default=no]: yes
What IP address or DNS name should be used to reach this node? [default=some-ip-address]: <private-ip-here>
Are you joining an existing cluster? (yes/no) [default=no]: yes
Do you have a join token? (yes/no/[token]) [default=no]: <output-of-token>
All existing data is lost when joining a cluster, continue? (yes/no) [default=no] yes
Choose "size" property for storage pool "local": 30GiB
Choose "source" property for storage pool "local": 
Choose "zfs.pool_name" property for storage pool "local": 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:

That’s it! Your node is now added to the cluster. If you run lxc cluster list you will see a list of nodes.

+----------------+-----------------------+------------------+--------------+----------------+-------------+--------+-------------------+
|      NAME      |          URL          |      ROLES       | ARCHITECTURE | FAILURE DOMAIN | DESCRIPTION | STATE  |      MESSAGE      |
+----------------+-----------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| nebula-node-01 | https://10.0.1.1:8443 | database-standby | x86_64       | default        |             | ONLINE | Fully operational |
+----------------+-----------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| nebula-node-02 | https://10.0.1.2:8443 | database-leader  | x86_64       | default        |             | ONLINE | Fully operational |
|                |                       | database         |              |                |             |        |                   |
+----------------+-----------------------+------------------+--------------+----------------+-------------+--------+-------------------+

If you run lxc list you should see a list of running containers (assuming you’ve already launched a test container).

+------+---------+--------------------+------+-----------+-----------+----------------+
| NAME |  STATE  |        IPV4        | IPV6 |   TYPE    | SNAPSHOTS |    LOCATION    |
+------+---------+--------------------+------+-----------+-----------+----------------+
| test | RUNNING | 240.2.0.216 (eth0) |      | CONTAINER | 0         | nebula-node-02 |
+------+---------+--------------------+------+-----------+-----------+----------------+

To add more nodes you can simply repeat the above steps.

Verifying Networking

Once you have successfully joined two nodes in your cluster, it’s time to test cross-node networking. This is simple to do.

Let’s launch another container. This time we will use the --target flag to make sure the container is launched on a different node from test:

$ lxc launch images:alpine/3.17 test2 --target nebula-node-01
Creating test2
Starting test2

Running lxc list should return 2 containers.

+-------+---------+--------------------+------+-----------+-----------+----------------+
| NAME  |  STATE  |        IPV4        | IPV6 |   TYPE    | SNAPSHOTS |    LOCATION    |
+-------+---------+--------------------+------+-----------+-----------+----------------+
| test  | RUNNING | 240.2.0.216 (eth0) |      | CONTAINER | 0         | nebula-node-02 |
+-------+---------+--------------------+------+-----------+-----------+----------------+
| test2 | RUNNING | 240.3.0.209 (eth0) |      | CONTAINER | 0         | nebula-node-01 |
+-------+---------+--------------------+------+-----------+-----------+----------------+

Let’s go inside the container.

$ lxc exec test2 -- ash

You should now be inside the container. The prompt should look different, something like this:

~ #

Let’s try pinging test by running ping test from inside the container. You should see output like the example below: the name test resolves to the container’s IP, and each reply includes a time= value, which is the latency between the nodes.

~ # ping test
PING test (240.2.0.216): 56 data bytes
64 bytes from 240.2.0.216: seq=0 ttl=64 time=1.588 ms
64 bytes from 240.2.0.216: seq=1 ttl=64 time=1.102 ms
64 bytes from 240.2.0.216: seq=2 ttl=64 time=1.130 ms
64 bytes from 240.2.0.216: seq=3 ttl=64 time=1.078 ms
64 bytes from 240.2.0.216: seq=4 ttl=64 time=1.000 ms
64 bytes from 240.2.0.216: seq=5 ttl=64 time=1.028 ms
^C
--- test ping statistics ---
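
You can run the same check without opening an interactive shell by passing the ping straight through lxc exec:

$ lxc exec test2 -- ping -c 4 test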

At this point your setup is a success. You have multiple nodes running in your cluster and they can communicate with one another within the VPC.


If you have further suggestions on how we can make this better, please don't hesitate to open an issue or reach out to us via one of our support channels. If you need quick support and would like to chat, you can join our Slack group.