pyratelog

personal blog
git clone git://git.pyratebeard.net/pyratelog.git

commit fe863c83f71d7b4371a2a54fe23bc0b45f6bb316
parent 6cf454925c7cd7a44b9e497c5bb358922033dcab
Author: pyratebeard <root@pyratebeard.net>
Date:   Fri, 10 May 2024 17:41:45 +0100

Merge branch 'cluster_fu'

Diffstat:
A entry/20240510-cluster_fu.md | 763 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 763 insertions(+), 0 deletions(-)

diff --git a/entry/20240510-cluster_fu.md b/entry/20240510-cluster_fu.md
@@ -0,0 +1,763 @@

I am always looking for ways to improve my home setup and one of the things I had been wanting to do for a while was to put together a dedicated homelab.

My desktop computer is a bit of a workhorse so it has always been used for creating virtual machines (VMs), running [OCI][1] containers such as [Docker][2] or [Podman][3], [LXC][4] containers, and testing various warez.

Recently I acquired four ThinkCentre M900 Tiny devices. After dedicating one to become another 'production' system running [Proxmox][5], I decided to use two as my homelab. Now, I could have put Proxmox on these two as well and then [created a cluster][6], but where is the fun in that? I felt like I didn't know enough about how clustering is done on Linux, so I set out to build my own from scratch. The end goal was to have the ability to create LXC containers and VMs, which I can then use for testing and running OCI containers along with tools such as [Kubernetes][7].

## base camp
I get on well with [Debian][8] as a server operating system so I installed the latest version (12 "_Bookworm_" at time of writing) on my two nodes, which I am naming _pigley_ and _goatley_ (IYKYK).

For the cluster I opted to use [Pacemaker][9] as the resource manager and [Corosync][10] for the cluster communication, with [Glusterfs][11] for shared storage.

Now, before everyone writes in saying how two-node clusters are not advised, due to quorum issues or split-brain scenarios, I have thought about it.

I didn't want to use the fourth ThinkCentre in this homelab; I have other plans for that. Instead I opted to use a spare Raspberry Pi (named _krieger_) as a Corosync quorum device. This (should) counteract the issues seen in a two-node cluster.

Ideally for Glusterfs I would configure _krieger_ as an arbiter device; however, in order to get the same version of `glusterfs-server` (10.3 at time of writing) on Raspbian I had to add the testing repo. Unfortunately I couldn't get the `glusterd` service to start. The stable repo only offered `glusterfs-server` version 9.2-1 at the time, which was incompatible with 10.3-5 on _pigley_ and _goatley_.

I decided to forgo the Glusterfs arbiter; while there is a risk of split-brain, this is only a lab environment.

After provisioning _pigley_ and _goatley_ I installed the required packages
```
apt-get install pcs corosync-qdevice glusterfs-server
```

* `pcs` - pacemaker/cluster configuration system. This package will install
    * `pacemaker`
    * `corosync`
* `corosync-qdevice` - for the quorum device
* `glusterfs-server` - for the shared storage

According to the documentation it is advisable to disable Pacemaker from automatic startup for now
```
systemctl disable pacemaker
```

On _krieger_ I installed the `corosync-qnetd` package
```
apt-get install pcs corosync-qnetd
```

## share and share alike
On _pigley_ and _goatley_ I created a partition on the main storage device and formatted it with XFS, created a mount point, and mounted the partition
```
mkfs.xfs -i size=512 /dev/sda3
mkdir -p /data/glusterfs/lab
mount /dev/sda3 /data/glusterfs/lab
```

Next I had to ensure _pigley_ and _goatley_ could talk to each other.
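In this lab that mostly comes down to name resolution, and I rely on static host entries rather than DNS; a minimal sketch of what I mean, with the addresses that show up again in the corosync config later (the `home.lab` names also match the gluster commands below)
```
# /etc/hosts on both nodes - addresses as used in the corosync config below
192.168.1.8    pigley.home.lab    pigley
192.168.1.9    goatley.home.lab   goatley
```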
To make things easy I put the IP addresses in _/etc/hosts_, then using the `gluster` tool confirmed connectivity
```
systemctl start glusterd
systemctl enable glusterd
gluster peer probe pigley
gluster peer status
```

I opted to configure a replica volume, keeping the data on 2 bricks (as that's all I have)
```
gluster volume create lab0 replica 2 pigley.home.lab:/data/glusterfs/lab/brick0 \
    goatley.home.lab:/data/glusterfs/lab/brick0
gluster volume start lab0
gluster volume info
```

The data isn't accessed directly in the brick directories, so I mounted the Glusterfs volume on a new mountpoint on both systems
```
mkdir /labfs && \
    mount -t glusterfs <hostname>:/lab0 /labfs
```

To test the replication was working I created a few empty files on one of the systems
```
touch /labfs/{a,b,c}
```

Then checked they existed on the other system
```
ls -l /labfs
```

And they did! Win win.

I did experience an issue when adding the _/labfs_ mount in _/etc/fstab_, as it would try to mount before the `glusterd` service was running. To work around this I included the `noauto` and `x-systemd.automount` options in my _/etc/fstab_ entry
```
localhost:/lab0 /labfs glusterfs defaults,_netdev,noauto,x-systemd.automount 0 0
```

## start your engine
Now the `corosync` config. On both nodes I created _/etc/corosync/corosync.conf_, setting the cluster name, switching `crypto_cipher` and `crypto_hash` on from their default of `none`, and defining the node list
```
cluster_name: lab
crypto_cipher: aes256
crypto_hash: sha1

nodelist {
    node {
        name: pigley
        nodeid: 1
        ring0_addr: 192.168.1.8
    }
    node {
        name: goatley
        nodeid: 2
        ring0_addr: 192.168.1.9
    }
}
```

On one node I had to generate an authkey using `corosync-keygen`, then copied it (_/etc/corosync/authkey_) to the other node. I could then add the authkey to my cluster and restart the cluster services on each node
```
pcs cluster authkey corosync /etc/corosync/authkey --force
systemctl restart corosync && systemctl restart pacemaker
```

The cluster takes a short while to become clean so I monitored it using `pcs status`. The output below shows everything (except [STONITH][12]) is looking good
```
Cluster name: lab

WARNINGS:
No stonith devices and stonith-enabled is not false

Status of pacemakerd: 'Pacemaker is running' (last updated 2023-10-24 21:40:57 +01:00)
Cluster Summary:
  * Stack: corosync
  * Current DC: pigley (version 2.1.5-a3f44794f94) - partition with quorum
  * Last updated: Tue Mar 26 11:37:06 2024
  * Last change: Tue Mar 26 11:36:23 2024 by hacluster via crmd on pigley
  * 2 nodes configured
  * 0 resource instances configured

Node List:
  * Online: [ goatley pigley ]

Full List of Resources:
  * No resources

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
```

STONITH, or "Shoot The Other Node In The Head", is used for fencing failed cluster nodes. As this is a test lab I am disabling it but may spend some time configuring it in the future
```
pcs property set stonith-enabled=false
```

## the votes are in
As mentioned, I want to use a third system as a quorum device. This means that it casts deciding votes to protect against split-brain yet isn't part of the cluster, so it doesn't have to be capable of running any resources.

While I used an authkey to authenticate _pigley_ and _goatley_ in the cluster, for _krieger_ I had to use password authentication.
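That password authentication is handled by the `pcsd` daemon, so it needs to be running on all three machines before `pcs host auth` will work; a quick sketch in case it isn't already enabled (the Daemon Status output above suggests it is on the cluster nodes)
```
# pcs host auth talks to pcsd (port 2224) on the target machine
systemctl enable --now pcsd
```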
On _pigley_ I set the `hacluster` user's password
```
passwd hacluster
```

On _krieger_ I set the same password and started the quorum device
```
passwd hacluster
pcs qdevice setup model net --enable --start
```

Back on _pigley_ I then authenticated _krieger_ and specified it as the quorum device
```
pcs host auth krieger
pcs quorum device add model net host=krieger algorithm=ffsplit
```

The output of `pcs quorum device status` shows the QDevice information
```
Qdevice information
-------------------
Model:                  Net
Node ID:                1
Configured node list:
    0   Node ID = 1
    1   Node ID = 2
Membership node list:   1, 2

Qdevice-net information
----------------------
Cluster name:           lab
QNetd host:             krieger:5403
Algorithm:              Fifty-Fifty split
Tie-breaker:            Node with lowest node ID
State:                  Connected
```

On _krieger_ the output of `pcs qdevice status net` shows similar information
```
QNetd address:                  *:5403
TLS:                            Supported (client certificate required)
Connected clients:              2
Connected clusters:             1
Cluster "lab":
    Algorithm:          Fifty-Fifty split (KAP Tie-breaker)
    Tie-breaker:        Node with lowest node ID
    Node ID 2:
        Client address:         ::ffff:192.168.1.9:52060
        Configured node list:   1, 2
        Membership node list:   1, 2
        Vote:                   ACK (ACK)
    Node ID 1:
        Client address:         ::ffff:192.168.1.8:43106
        Configured node list:   1, 2
        Membership node list:   1, 2
        Vote:                   No change (ACK)
```

## build something
Now that my cluster is up and running I can start creating resources. The first thing I wanted to get running was some VMs.

I installed `qemu` on _pigley_ and _goatley_
```
apt-get install qemu-system-x86 libvirt-daemon-system virtinst
```

Before creating a VM I made sure the default network was started, and set it to autostart
```
virsh net-list --all
virsh net-start default
virsh net-autostart default
```

I uploaded a Debian ISO to _pigley_ then used `virt-install` to create a VM
```
virt-install --name testvm \
    --memory 2048 \
    --vcpus=2 \
    --cdrom=/labfs/debian-12.1.0-amd64-netinst.iso \
    --disk path=/labfs/testvm.qcow2,size=20,format=qcow2 \
    --os-variant debian11 \
    --network network=default \
    --graphics=spice \
    --console pty,target_type=serial -v
```

The command waits until the system installation is completed, so from my workstation I used `virt-viewer` to connect to the VM and run through the Debian installer
```
virt-viewer --connect qemu+ssh://pigley/system --wait testvm
```

Once the installation is complete and the VM has been rebooted I can add it as a resource to the cluster. First the VM (or VirtualDomain in `libvirt` speak) has to be shut down and the configuration XML saved to a file
```
virsh shutdown testvm
virsh dumpxml testvm > /labfs/testvm.xml
pcs resource create testvm VirtualDomain \
    config=/labfs/testvm.xml \
    migration_transport=ssh \
    meta \
    allow-migrate=true
```

To allow the resource to run on any of the cluster nodes the `symmetric-cluster` option has to be set to `true` (I am not bothering with specific resource rules at this time).
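For reference, a "specific resource rule" here would be something like a location constraint; this sketch is the kind of thing I mean, not something I am applying in this lab
```
# example only: prefer running testvm on pigley with a score of 100
pcs constraint location testvm prefers pigley=100
```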
Then I can enable the resource
```
pcs property set symmetric-cluster=true
pcs resource enable testvm
```

Watching `pcs resource` I can see that the VM has started on _goatley_
```
  * testvm (ocf:heartbeat:VirtualDomain): Started goatley
```

On _goatley_ I can check that the VM is running with `virsh list`
```
 Id   Name     State
--------------------------
 1    testvm   running
```

To connect from my workstation I can use `virt-viewer` again
```
virt-viewer --connect qemu+ssh://goatley/system --wait testvm
```

Now I can really test the cluster by moving the VM from _goatley_ to _pigley_ with one command from either node
```
pcs resource move testvm pigley
```

The VM is automatically shut down and restarted on _pigley_; now the output of `pcs resource` shows
```
  * testvm (ocf:heartbeat:VirtualDomain): Started pigley
```

Successfully clustered!

Most of the VMs I create will probably be accessed remotely via `ssh`. The VM network on the cluster is not directly accessible from my workstation, so I have to `ProxyJump` through whichever node is running the VM (this is by design)
```
ssh -J pigley testvm
```

Unless I check the resource status I won't always know which node the VM is on, so I came up with a workaround.

The `libvirt` network sets up `dnsmasq` for local name resolution, so by setting the first `nameserver` on _pigley_ and _goatley_ to `192.168.122.1` (my virtual network) each node could resolve the hostnames of the VirtualDomains running on it. I set this in _dhclient.conf_
```
prepend domain-name-servers 192.168.122.1;
```

On my workstation I made use of the tagging capability and `Match` function in SSH to find which node a VM is on
```
Match tagged lab exec "ssh pigley 'ping -c 1 -W 1 %h 2>/dev/null'"
    ProxyJump pigley

Match tagged lab
    ProxyJump goatley
```

When I want to connect to a VM I specify a tag with the `-P` flag
```
ssh -P lab root@testvm
```

My SSH config will then use the first `Match` to ping that hostname on _pigley_, and if the VM is running on _pigley_ it will succeed, making _pigley_ the proxy. If the `ping` fails SSH will go to the second `Match` and use _goatley_ as the proxy.

## this cloud is just my computer
Now that my cluster is up and running I want to be able to interact with it remotely.

For personal ethical reasons I opted to use [OpenTofu][13] instead of [Terraform][14]. OpenTofu's command `tofu` is a drop-in replacement; simply swap `terraform` for `tofu`. OpenTofu doesn't have a provider for managing Pacemaker cluster resources but it does have a `libvirt` provider, so I started there.

I created an OpenTofu configuration using the provider [dmacvicar/libvirt][15]. Unfortunately I had issues trying to get it to connect over SSH.

My OpenTofu configuration for testing looked like this
```
terraform {
  required_providers {
    libvirt = {
      source = "dmacvicar/libvirt"
      version = "0.7.1"
    }
  }
}

provider "libvirt" {
  uri = "qemu+ssh://pigley/system"
}

resource "libvirt_domain" "testvm" {
  name = "testvm"
}
```

After initialising (`tofu init`) I ran `tofu plan` and got an error
```
Error: failed to dial libvirt: ssh: handshake failed: ssh: no authorities for hostname: pigley:22
```

This was remedied by setting the `known_hosts_verify` option to `ignore`
```
...

provider "libvirt" {
  uri = "qemu+ssh://pigley/system?known_hosts_verify=ignore"
}

...
```

Another `tofu plan` produced another error
```
Error: failed to dial libvirt: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
```

At first I thought it was due to using an SSH agent for my key, so I created a dedicated passphrase-less SSH keypair and specified the file in the `uri`
```
...

provider "libvirt" {
  uri = "qemu+ssh://pigley/system?keyfile=/home/pyratebeard/.ssh/homelab_tofu.key&known_hosts_verify=ignore"
}

...
```

This produced the same error again. After some Github issue digging I found mention of setting the [sshauth][16] option to `privkey`. The default is supposedly `agent,privkey`, but as I found it wasn't picking up my agent even with `$SSH_AUTH_SOCK` set. I set the option in the `uri`
```
...

provider "libvirt" {
  uri = "qemu+ssh://pigley/system?keyfile=/home/pyratebeard/.ssh/homelab_tofu.key&known_hosts_verify=ignore&sshauth=privkey"
}

...
```

Finally it worked! Until I tried to apply the plan with `tofu apply`
```
Error: error while starting the creation of CloudInit's ISO image: exec: "mkisofs": executable file not found in $PATH
```

This was easily fixed by installing the `cdrtools` package on my workstation
```
pacman -S cdrtools
```

After creating a new VM I want to automatically add it as a cluster resource. To do this I chose to use [Ansible][17], and so that I don't have to run two lots of commands I wanted to use Ansible to deploy my OpenTofu configuration as well. Ansible does have a [terraform module][18] but there is not yet one for OpenTofu. A workaround is to create a symlink for the `terraform` command
```
sudo ln -s /usr/bin/tofu /usr/bin/terraform
```

Ansible will never know the difference! The task in the playbook looks like this
```
- name: "tofu test"
  community.general.terraform:
    project_path: '~src/infra_code/libvirt/debian12/'
    state: present
    force_init: true
  delegate_to: localhost
```

That ran successfully so I started to expand my OpenTofu config so it would actually build a VM.

In order to not have to go through the ISO install every time, I decided to use the Debian cloud images and then make use of [Cloud-init][19] to apply any changes when the new VM is provisioned. Trying to keep it similar to a "real" cloud seemed like a good idea.
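One assumption the expanded config below makes is that a `libvirt` storage pool called `labfs`, backed by the Glusterfs mount, already exists on the node. If it doesn't, something along these lines defines it (a sketch; the directory pool type and target path are my guesses based on the _/labfs_ mountpoint used earlier)
```
# define a simple directory-backed pool on top of the Glusterfs mount
virsh pool-define-as labfs dir --target /labfs
virsh pool-start labfs
virsh pool-autostart labfs
```
With that in place, the expanded configuration looks like this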
```
terraform {
  required_providers {
    libvirt = {
      source = "dmacvicar/libvirt"
      version = "0.7.1"
    }
  }
}

provider "libvirt" {
  uri = "qemu+ssh://pigley/system?keyfile=/home/pyratebeard/.ssh/homelab_tofu.key&known_hosts_verify=ignore&sshauth=privkey"
}

variable "vm_name" {
  type = string
  description = "hostname"
  default = "testvm"
}

variable "vm_vcpus" {
  type = string
  description = "number of vcpus"
  default = 2
}

variable "vm_mem" {
  type = string
  description = "amount of memory"
  default = "2048"
}

variable "vm_size" {
  type = string
  description = "capacity of disk"
  default = "8589934592" # 8G
}

# base cloud image, pulled once and shared by the VM volumes
resource "libvirt_volume" "base-debian12-qcow2" {
  name = "debian-12-genericcloud-amd64.qcow2"
  pool = "labfs"
  source = "http://cloud.debian.org/images/cloud/bookworm/latest/debian-12-genericcloud-amd64.qcow2"
  format = "qcow2"
}

resource "libvirt_volume" "debian12-qcow2" {
  name = "${var.vm_name}.qcow2"
  pool = "labfs"
  format = "qcow2"
  size = var.vm_size
  base_volume_id = libvirt_volume.base-debian12-qcow2.id
}

data "template_file" "user_data" {
  template = "${file("${path.module}/cloud_init.cfg")}"
  vars = {
    hostname = var.vm_name
  }
}

resource "libvirt_cloudinit_disk" "commoninit" {
  name = "commoninit.iso"
  pool = "labfs"
  user_data = "${data.template_file.user_data.rendered}"
}

resource "libvirt_domain" "debian12" {
  name = var.vm_name
  memory = var.vm_mem
  vcpu = var.vm_vcpus

  network_interface {
    network_name = "default"
    wait_for_lease = true
  }

  disk {
    volume_id = "${libvirt_volume.debian12-qcow2.id}"
  }

  cloudinit = "${libvirt_cloudinit_disk.commoninit.id}"

  console {
    type = "pty"
    target_type = "serial"
    target_port = "0"
  }
}
```

The _cloud\_init.cfg_ configuration is very simple at the moment, only setting the hostname for DNS to work and creating a new user
```
#cloud-config
ssh_pwauth: false

preserve_hostname: false
hostname: ${hostname}

users:
  - name: pyratebeard
    ssh_authorized_keys:
      - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICSluiY54h5FlGxnnXqifWPnfvKNIh1/f0xf0yCThdqV
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    shell: /bin/bash
    groups: wheel
```

Out of (good?) habit I tested this with `tofu` before running it with Ansible, and I hit another issue (getting tiring, isn't it?!)
```
Error: error creating libvirt domain: internal error: process exited while connecting to monitor: ... Could not open '/labfs/debian-12-genericcloud-amd64.qcow2': Permission denied
```

OpenTofu wasn't able to write out the _qcow2_ file due to AppArmor. I attempted to give permission in _/etc/apparmor.d/usr.lib.libvirt.virt-aa-helper_ yet that didn't seem to work. Instead I set the following line in _/etc/libvirt/qemu.conf_ and restarted `libvirtd` on _pigley_ and _goatley_
```
security_driver = "none"
```

That "fixed" my issue and I successfully built a new VM. Now I could try it with Ansible.
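Before handing the same configuration over to Ansible it made sense to tear the hand-built VM down again so the playbook could start from a clean slate; with OpenTofu that is just the drop-in equivalent of `terraform destroy`
```
# remove the manually created resources tracked in the state
tofu destroy
```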
My playbook applies the infrastructure configuration, then creates a cluster resource using the same `virsh` and `pcs` commands as I used earlier

```
- hosts: pigley
  gather_facts: true
  become: true
  pre_tasks:
    - name: "load vars"
      ansible.builtin.include_vars:
        file: vars.yml
      tags: always

  tasks:
    - name: "create vm"
      community.general.terraform:
        project_path: '{{ tofu_project }}'
        state: present
        complex_vars: true
        variables:
          vm_name: "{{ vm_name }}"
          vm_vcpus: "{{ vm_vcpus }}"
          vm_mem: "{{ vm_mem }}"
          vm_size: "{{ vm_size }}"
        force_init: true
      delegate_to: localhost

    - name: "shutdown vm & dumpxml"
      ansible.builtin.shell: |
        virsh shutdown {{ vm_name }} && \
        virsh dumpxml {{ vm_name }} > /labfs/{{ vm_name }}.xml

    - name: "create cluster resource"
      ansible.builtin.shell: |
        pcs resource create {{ vm_name }} VirtualDomain \
          config=/labfs/{{ vm_name }}.xml \
          migration_transport=ssh \
          meta \
          allow-migrate=true
```

This is not the most elegant solution for adding the cluster resource, yet it seems to be the only way of doing it.

The _vars.yml_ file which is loaded at the beginning lets me define some options for the VM
```
vm_os: "debian12" # shortname as used in opentofu dir hierarchy
vm_name: "testvm"
vm_vcpus: "2"
vm_mem: "2048"
vm_size: "8589934592" # 8G

## location of opentofu project on local system
tofu_project: "~src/infra_code/libvirt/{{ vm_os }}/"
```

When the playbook runs, the VM variables defined in _vars.yml_ override anything configured in my OpenTofu project. This means that once my infrastructure configuration is crafted I only have to edit the _vars.yml_ file and run the playbook. I can add more options to _vars.yml_ as I expand the configuration.

When the playbook completes I can SSH to my new VM without knowing where in the cluster it is running, or even knowing the IP, thanks to the SSH tag and the `libvirt` DNS
```
ssh -P lab pyratebeard@testvm
```

## contain your excitement
For creating LXC (not LXD) containers the only (active and working) OpenTofu/Terraform providers I could find were for Proxmox or [Incus][20]. I had not been able to look into the new Incus project at the time of writing, so for now I went with Ansible's [LXC Container][21] module.

On _pigley_ I installed LXC and the required Python package for use with Ansible
```
apt-get install lxc python3-lxc
```

I opted not to configure unprivileged containers at this time; this is a lab, after all.

After a quick test I decided not to use the default LXC bridge; instead I configured it to use the existing "default" network configured with `libvirt`. This enabled me to use the same SSH tag method for logging in, as the nameserver would resolve the LXC containers as well. The alternative was to configure my own DNS, as I can't use two separate nameservers for resolution.
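As a quick sanity check that anything attached to that network really does register with the `libvirt` dnsmasq, the DHCP lease table can be listed on the node; a small sketch using the stock network name
```
# show hostnames and addresses handed out on the "default" libvirt network
virsh net-dhcp-leases default
```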
In _/etc/default/lxc-net_ I switched the bridge option to `false`
```
USE_LXC_BRIDGE="false"
```

In _/etc/lxc/default.conf_ I set the network link to the `libvirt` virtual device
```
lxc.net.0.link = virbr0
```

Then I restarted the LXC network service
```
systemctl restart lxc-net
```

To use my Glusterfs mount for the LXC containers I had to add the `lxc.lxcpath` configuration to _/etc/lxc/lxc.conf_
```
lxc.lxcpath = /labfs/
```

I tested this by manually creating an LXC container
```
lxc-create -n testlxc -t debian -- -r bookworm
```

This resulted in an ACL error
```
Copying rootfs to /labfs/testlxc/rootfs...rsync: [generator] set_acl: sys_acl_set_file(var/log/journal, ACL_TYPE_ACCESS): Operation not supported (95)
```

The fix for this was to mount _/labfs_ with the `acl` option in _/etc/fstab_
```
localhost:/lab0 /labfs glusterfs defaults,_netdev,noauto,x-systemd.automount,acl 0 0
```

With Ansible, creating a new container is straightforward
```
- name: Create a started container
  community.general.lxc_container:
    name: testlxc
    container_log: true
    template: debian
    state: started
    template_options: --release bookworm
```

Once it is created I could connect with the same SSH tag I used with the VMs
```
ssh -P lab root@testlxc
```

This wouldn't let me in with the default build configuration; I am expected to set up a new user as I did with the VM cloud image. There is no way (that I know of) to use Cloud-init to do this with Ansible. Thankfully the Ansible LXC module has a `container_command` option, which allows specified commands to run inside the container on build.

I adjusted my playbook task to create a new user and included a task to load variables from a file
```
- hosts: pigley
  gather_facts: true
  become: true
  pre_tasks:
    - name: "load vars"
      ansible.builtin.include_vars:
        file: vars.yml
      tags: always

  tasks:
    - name: Create a started container
      community.general.lxc_container:
        name: "lxc-{{ lxc_name }}"
        container_log: true
        template: "{{ lxc_template }}"
        state: started
        template_options: "--release {{ lxc_release }}"
        container_command: |
          useradd -m -d /home/{{ username }} -s /bin/bash -G sudo {{ username }}
          [ -d /home/{{ username }}/.ssh ] || mkdir /home/{{ username }}/.ssh
          echo {{ ssh_pub_key }} > /home/{{ username }}/.ssh/authorized_keys
```

With the variables stored in _vars.yml_
```
lxc_template: "debian"
lxc_release: "bookworm"
lxc_name: "testlxc"
username: "pyratebeard"
ssh_pub_key: "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICSluiY54h5FlGxnnXqifWPnfvKNIh1/f0xf0yCThdqV"
```

Now I can log in with SSH
```
ssh -P lab pyratebeard@testlxc
```

With a similar command as the VM resource creation I tested adding the LXC container to the cluster
```
pcs resource create testlxc ocf:heartbeat:lxc \
    container=testlxc \
    config=/labfs/testlxc/config \
    op monitor timeout="20s" interval="60s" OCF_CHECK_LEVEL="0"
```

This created the resource, which I could migrate between hosts as before
```
pcs resource move testlxc goatley
```

I updated my playbook to include the resource creation task
```
- name: "create cluster resource"
  ansible.builtin.shell: |
    pcs resource create {{ lxc_name }} ocf:heartbeat:lxc \
      container={{ lxc_name }} \
      config=/labfs/{{ lxc_name }}/config \
      op monitor timeout="20s" interval="60s" OCF_CHECK_LEVEL="0"
```

And done! My homelab is now ready to use.
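Day to day the whole workflow boils down to one playbook run followed by a tagged SSH login; roughly like this (the playbook filename here is only illustrative)
```
# build the VM or container and register it with the cluster...
ansible-playbook vm_create.yml
# ...then log in without caring which node it landed on
ssh -P lab pyratebeard@testvm
```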
I am able to quickly create Virtual Machines and LXC containers as well as access them via SSH without caring which cluster node they are on.

After testing all of the VM and container creations I made a small change to my SSH config to discard host key fingerprint checking. I set `StrictHostKeyChecking` to `no`, which stops the host key fingerprint accept prompt, then set the `UserKnownHostsFile` to _/dev/null_ so that fingerprints don't get added to my usual known hosts file.
```
Match tagged lab exec "ssh pigley 'ping -c 1 -W 1 %h 2>/dev/null'"
    ProxyJump pigley
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null

Match tagged lab
    ProxyJump goatley
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
```

It is fun having my own little cloud; building it has certainly taught me a lot and hopefully will continue to do so as I improve and expand my OpenTofu and Ansible code.

If you are interested keep an eye on my [playbooks][22] and [infra_code][23] repositories.

[1]: https://opencontainers.org/
[2]: https://www.docker.com/resources/what-container/
[3]: https://docs.podman.io/en/latest/
[4]: https://linuxcontainers.org/lxc/introduction/
[5]: https://www.proxmox.com/en/
[6]: https://pve.proxmox.com/wiki/Cluster_Manager
[7]: https://kubernetes.io/
[8]: https://debian.org
[9]: http://clusterlabs.org/pacemaker
[10]: http://corosync.github.io/
[11]: https://www.gluster.org/
[12]: https://en.wikipedia.org/wiki/STONITH
[13]: https://opentofu.org
[14]: https://terraform.io/
[15]: https://github.com/dmacvicar/terraform-provider-libvirt
[16]: https://github.com/dmacvicar/terraform-provider-libvirt/issues/886#issuecomment-986423116
[17]: https://www.ansible.com/
[18]: https://docs.ansible.com/ansible/latest/collections/community/general/terraform_module.html
[19]: https://cloud-init.io/
[20]: https://linuxcontainers.org/incus/
[21]: https://docs.ansible.com/ansible/latest/collections/community/general/lxc_container_module.html
[22]: https://git.pyratebeard.net/playbooks/
[23]: https://git.pyratebeard.net/infra_code/