Hypervisor 1RU 2024
These are the build notes for the 2024 hypervisor.
This is a 1 RU server with fast local SSD and approximately 18 TB of storage.
BIOS settings
A patched BIOS is needed for this board to boot via UEFI and to expose the PCI hotplug menu in BIOS setup. I was able to edit this and have posted the latest version here.
Disk Layout
| Front (looking at the front) | | | | |
|---|---|---|---|---|
| Slot001 | Slot003 | Slot005 | Slot007 | Slot009 |
| Slot000 | Slot002 | Slot004 | Slot006 | Slot008 |

| Rear (looking at the rear) | | |
|---|---|---|
| 40G NIC | Upper-2 | Upper-3 |
| Lower-0 | Lower-1 | |
boot disk
Boot disk = 128G ZFS mirror:

```
Number  Start (sector)  End (sector)  Size        Code
   1    34              2047          1007.0 KiB  EF02
   2    2048            2099199      1024.0 MiB  EF00
   3    2099200         838860800    128.0 GiB   BF01
```
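For reference, a minimal sketch of laying that out on a fresh disk with sgdisk; the target device path is an assumption, and the same commands would be repeated on the mirror's second disk:

```
# Sketch: recreate the boot-disk partition layout (device path is an example)
sgdisk -n 1:34:2047           -t 1:EF02 /dev/sda   # BIOS boot
sgdisk -n 2:2048:2099199      -t 2:EF00 /dev/sda   # EFI system partition
sgdisk -n 3:2099200:838860800 -t 3:BF01 /dev/sda   # ZFS
```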
ZFS storage
/data
Five mirrored vdevs of the SAS disks, plus a special mirror of two 384G NVMe partitions; each partition has its whole disk to itself, so it can grow later. Optionally, a log and L2ARC go on the boot NVMe drives.
partition the NVMe

```
root@pve:~# gdisk /dev/nvme0n1
GPT fdisk (gdisk) version 1.0.9

Partition table scan:
  MBR: not present
  BSD: not present
  APM: not present
  GPT: not present

Creating new GPT entries in memory.

Command (? for help): n
Partition number (1-128, default 1): 1
First sector (6-390624994, default = 256) or {+-}size{KMGTP}: 2048
Last sector (2048-390624994, default = 390624767) or {+-}size{KMGTP}: +384G
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): BF01
Changed type of partition to 'Solaris /usr & Mac ZFS'

Command (? for help): p
Disk /dev/nvme0n1: 390625000 sectors, 1.5 TiB
Model: MZ1LB1T9HBLS-000FB
Sector size (logical/physical): 4096/4096 bytes
Disk identifier (GUID): 5A039EE7-96B6-4D53-BEA4-56BCF48F4ABA
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 5
First usable sector is 6, last usable sector is 390624994
Partitions will be aligned on 256-sector boundaries
Total free space is 289961693 sectors (1.1 TiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048       100665343   384.0 GiB   BF01  Solaris /usr & Mac ZFS

Command (? for help): w

Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!

Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/nvme0n1.
The operation has completed successfully.
```
create the pool
Note that slots 8 and 9 sit on their own two of the controller's eight ports, so they run faster than the rest and would fill unequally if mirrored with each other; each is therefore paired with a slower slot instead.
```
zpool create -f -o ashift=12 -O compression=lz4 -O atime=off -O xattr=sa localDataStore \
  mirror /dev/disk/by-enclosure-slot/front-slot000 /dev/disk/by-enclosure-slot/front-slot001 \
  mirror /dev/disk/by-enclosure-slot/front-slot002 /dev/disk/by-enclosure-slot/front-slot003 \
  mirror /dev/disk/by-enclosure-slot/front-slot004 /dev/disk/by-enclosure-slot/front-slot005 \
  mirror /dev/disk/by-enclosure-slot/front-slot006 /dev/disk/by-enclosure-slot/front-slot008 \
  mirror /dev/disk/by-enclosure-slot/front-slot007 /dev/disk/by-enclosure-slot/front-slot009 \
  special mirror /dev/disk/by-enclosure-slot/nvme-upper-2-part1 /dev/disk/by-enclosure-slot/nvme-lower-0-part1
```
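If the optional log and L2ARC from the boot NVMe drives get added later, it would look roughly like this (a sketch; the partition paths are assumptions, and the cache device intentionally has no redundancy since L2ARC contents are disposable):

```
# Sketch: add SLOG and L2ARC from spare partitions on the boot NVMe drives
zpool add localDataStore log mirror /dev/nvme4n1p4 /dev/nvme5n1p4
zpool add localDataStore cache /dev/nvme4n1p5
```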
Backplane
The onboard backplane is a BPN-SAS3-116A-N2, which takes 8 SAS disks plus 2 bays that are nominally NVMe or SAS. However, the SAS option on those last two bays does not hold up if you want to run 10 SAS disks: the rightmost NVMe/SAS ports are labeled "SAS2" on the backplane but are really SATA ports wired to the onboard SATA controller. Because this is a plain backplane, not an expander, each physical SAS port from the controller connects to exactly one drive; the included controller only had 8 ports, so a 16-port controller is used.
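To sanity-check which controller port each drive actually landed on, the SAS topology can be listed from the OS (lsscsi is in the package list further below):

```
# Show SCSI devices with their SAS transport addresses and sg nodes
lsscsi -t -g
```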
NVMe namespaces
https://narasimhan-v.github.io/2020/06/12/Managing-NVMe-Namespaces.html
The NVMe drives come set up as 1.88 TB disks:

```
tnvmcap : 1,880,375,648,256
unvmcap :       375,648,256
```
I suspect this is a 2.0 TiB (2,199,023,255,552 byte) device provisioned down in the controller to about 85%. Shrinking it further to a 1.6 TB namespace will under-provision it more and make it perform better if we use it as a log device or for other write-intensive work.
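A quick check of that math (the 2.0 TiB raw figure is my assumption):

```
# usable tnvmcap / assumed 2.0 TiB raw capacity
awk 'BEGIN { printf "%.1f%%\n", 100 * 1880375648256 / 2199023255552 }'   # -> 85.5%
```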
Ensure we're on 4096 bytes
```
nvme id-ns -H /dev/nvme0n1 | grep "LBA Format"
nvme id-ns -H /dev/nvme1n1 | grep "LBA Format"
nvme id-ns -H /dev/nvme2n1 | grep "LBA Format"
nvme id-ns -H /dev/nvme3n1 | grep "LBA Format"
```
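If a drive were to report the 512 B format as in use, it could be reformatted to 4 KiB before the namespace work below (a destructive sketch; the 4 KiB format index varies per drive, so read it from the id-ns output first):

```
# DESTRUCTIVE: reformat the namespace to the 4 KiB LBA format
# --lbaf=1 is an assumption; confirm the index via 'nvme id-ns -H'
nvme format /dev/nvme0n1 --lbaf=1 --force
```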
Detach the namespace
```
nvme detach-ns /dev/nvme1 --namespace-id=1 --controllers=4
nvme detach-ns /dev/nvme3 --namespace-id=1 --controllers=4
nvme detach-ns /dev/nvme2 --namespace-id=1 --controllers=4
nvme detach-ns /dev/nvme0 --namespace-id=1 --controllers=4
```
Delete the namespace
```
nvme delete-ns /dev/nvme1 --namespace-id=1
nvme delete-ns /dev/nvme3 --namespace-id=1
nvme delete-ns /dev/nvme2 --namespace-id=1
nvme delete-ns /dev/nvme0 --namespace-id=1
```
Make the new namespace
```
nvme create-ns /dev/nvme1 --nsze-si=1.6T --ncap-si=1.6T --flbas=0 --dps=0 --nmic=0
nvme create-ns /dev/nvme3 --nsze-si=1.6T --ncap-si=1.6T --flbas=0 --dps=0 --nmic=0
nvme create-ns /dev/nvme2 --nsze-si=1.6T --ncap-si=1.6T --flbas=0 --dps=0 --nmic=0
nvme create-ns /dev/nvme0 --nsze-si=1.6T --ncap-si=1.6T --flbas=0 --dps=0 --nmic=0
```
Attach the namespace to the controller
```
nvme attach-ns /dev/nvme1 --namespace-id=1 --controllers=4
nvme attach-ns /dev/nvme3 --namespace-id=1 --controllers=4
nvme attach-ns /dev/nvme2 --namespace-id=1 --controllers=4
nvme attach-ns /dev/nvme0 --namespace-id=1 --controllers=4
```
reset the controllers to make the new namespaces visible to the OS
```
nvme reset /dev/nvme1
nvme reset /dev/nvme3
nvme reset /dev/nvme2
nvme reset /dev/nvme0
```
Confirm it
```
nvme list
Node          Generic     SN               Model                Namespace  Usage                 Format        FW Rev
------------- ----------- ---------------- -------------------- ---------  --------------------- ------------- --------
/dev/nvme4n1  /dev/ng4n1  S64ANS0T515282K  Samsung SSD 980 1TB  1          114.21 GB / 1.00 TB   512 B + 0 B   2B4QFXO7
/dev/nvme3n1  /dev/ng3n1  S5XANA0R537286   MZ1LB1T9HBLS-000FB   1            0.00 B / 1.60 TB    4 KiB + 0 B   EDW73F2Q
/dev/nvme2n1  /dev/ng2n1  S5XANA0R694994   MZ1LB1T9HBLS-000FB   1          157.62 GB / 1.88 TB   4 KiB + 0 B   EDW73F2Q
/dev/nvme1n1  /dev/ng1n1  S5XANA0R682634   MZ1LB1T9HBLS-000FB   1          157.47 GB / 1.88 TB   4 KiB + 0 B   EDW73F2Q
/dev/nvme0n1  /dev/ng0n1  S5XANA0R682645   MZ1LB1T9HBLS-000FB   1            0.00 B / 1.60 TB    4 KiB + 0 B   EDW73F2Q
```
udev rules
Look here for the info on setting this up
Software
```
apt-get install sg3-utils-udev sdparm ledmon lsscsi net-tools nvme-cli lldpd rsyslog \
  ipmitool vim unzip git fio sudo locate screen snmpd libsnmp-dev mstflint
```
configs
Screen
echo -e "#Bryan Config for scroll back buffer\ntermcapinfo xterm|xterms|xs|rxvt ti@:te@" >>/etc/screenrc
Bash Completion
Configure bash completion for interactive shells:

```
vim /etc/bash.bashrc
```

Uncomment the block under "# enable bash completion in interactive shells", then add ZFS completion:

```
# add in zfs completion
. /usr/share/bash-completion/completions/zfs
```
root profile
```
# You may uncomment the following lines if you want `ls' to be colorized:
export LS_OPTIONS='--color=auto'
eval "$(dircolors)"
alias ls='ls $LS_OPTIONS'
```
hostname
```
echo Fink >/etc/hostname
```
- Edit the /etc/hosts
```
127.0.0.1 localhost.localdomain localhost
192.168.8.186 Fink.keekles.org Fink
```
- Reboot
/etc/hosts
This needs to be the same across the cluster:

```
scp /etc/hosts root@fink:/etc/hosts
```
SSHD
Configure SSH so that root and regular users can only log in with public keys (no password authentication).
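A minimal sketch of the relevant sshd_config directives (the drop-in filename is an assumption):

```
# /etc/ssh/sshd_config.d/10-keys-only.conf
PermitRootLogin prohibit-password
PasswordAuthentication no
PubkeyAuthentication yes
```

Restart sshd afterwards with systemctl restart sshd.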
snmp
- Install SNMP
```
sudo apt-get -y install snmp snmpd libsnmp-dev
```
- Stop the snmpd service so we can add a user
```
sudo service snmpd stop
```
- Add a SNMPv3 user:
```
sudo net-snmp-config --create-snmpv3-user -ro -A keeklesSNMPpasswd -a SHA -x AES -X keeklesSNMPpasswd KeeklesSNMP
```
- copy the /etc/snmp/snmpd.conf from carbonrod
```
scp /etc/snmp/snmpd.conf root@fink:/etc/snmp/snmpd.conf
```
- edit the /etc/snmp/snmpd.conf
```
vim /etc/snmp/snmpd.conf
```

Update syslocation and the other site-specific settings.
- restart snmpd
```
service snmpd start
```
- Test SNMP
```
# TEST
snmpwalk -v3 -u KeeklesSNMP -l authPriv -a SHA -A keeklespasswd -x aes -X keeklespasswd 192.168.8.186
```
observium client
- Do this from eyes
```
export SERVER=<hostname>
```
- on target server:
```
sudo apt-get install snmpd php perl curl xinetd snmp libsnmp-dev libwww-perl

# only needed for postfix
apt-get -y install rrdtool mailgraph
dpkg-reconfigure mailgraph
```
- install the current distro program on the target:
```
sudo curl -o /usr/local/bin/distro https://gitlab.com/observium/distroscript/raw/master/distro
sudo chmod +x /usr/local/bin/distro
```
- From eyes
```
scp /opt/observium/scripts/observium_agent_xinetd $SERVER:/etc/xinetd.d/observium_agent_xinetd
scp /opt/observium/scripts/observium_agent $SERVER:/usr/bin/observium_agent
ssh $SERVER mkdir -p /usr/lib/observium_agent
ssh $SERVER mkdir -p /usr/lib/observium_agent/scripts-available
ssh $SERVER mkdir -p /usr/lib/observium_agent/scripts-enabled
scp /opt/observium/scripts/agent-local/* $SERVER:/usr/lib/observium_agent/scripts-available
```
- on target enable the various allowed items
```
ln -s /usr/lib/observium_agent/scripts-available/os /usr/lib/observium_agent/scripts-enabled
#ln -s /usr/lib/observium_agent/scripts-available/zimbra /usr/lib/observium_agent/scripts-enabled
ln -s /usr/lib/observium_agent/scripts-available/dpkg /usr/lib/observium_agent/scripts-enabled
ln -s /usr/lib/observium_agent/scripts-available/ntpd /usr/lib/observium_agent/scripts-enabled
ln -s /usr/lib/observium_agent/scripts-available/virt-what /usr/lib/observium_agent/scripts-enabled
ln -s /usr/lib/observium_agent/scripts-available/proxmox-qemu /usr/lib/observium_agent/scripts-enabled
ln -s /usr/lib/observium_agent/scripts-available/postfix_mailgraph /usr/lib/observium_agent/scripts-enabled
```
- Edit /etc/xinetd.d/observium_agent_xinetd so the Observium server is allowed to connect: either replace 127.0.0.1 with your server's IP, or add the IP after it, separated by a space. Make sure to restart xinetd afterwards so the configuration file is re-read.
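The key line ends up looking something like this (the Observium server address is a placeholder):

```
only_from = 127.0.0.1 <observium-server-ip>
```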
```
sudo service xinetd restart
```
- Test from eyes
```
telnet $SERVER 36602
snmpwalk -v3 -u KeeklesSNMP -l authNoPriv -a MD5 -A keeklespasswd $SERVER
```
default editor
```
update-alternatives --config editor
```

Then select #3, vim.basic.
timezone
```
sudo timedatectl set-timezone UTC
```
sudo config
add local user accounts
```
useradd -s /bin/bash -m -U -G sudo bryan
```
- copy over ssh
```
rsync -avz /home/bryan/.ssh root@192.168.8.186:/home/bryan/
```
Configure sudo
echo "bryan ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers.d/00-admins
Configure Postfix
Postfix is installed to forward mail for root to an SMTP host.
```
apt-get install postfix mailutils
```
This runs an installer with a curses interface; select "Satellite system". Check that the system mail name is the server's hostname and that the SMTP relay host is morty.keekles.org. Root and postmaster mail should go to rootmail@allstarlink.org.
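Alternatively, the installer answers can be preseeded so the curses UI never appears (a sketch; I believe these are the standard postfix debconf keys, but verify against debconf-get-selections on an existing box):

```
echo "postfix postfix/main_mailer_type select Satellite system" | debconf-set-selections
echo "postfix postfix/mailname string $(hostname -f)" | debconf-set-selections
echo "postfix postfix/relayhost string morty.keekles.org" | debconf-set-selections
apt-get -y install postfix mailutils
```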
Should you need to reconfigure this use:
```
dpkg-reconfigure postfix
```
Other aliases are set up in /etc/aliases. You must run newaliases after updating it for the changes to take effect.
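For example, the root alias from the installer answer above looks like:

```
# /etc/aliases (excerpt)
root: rootmail@allstarlink.org
```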
Network
NIC
The MLX4 2x40G NIC needs a module option to put its ports in Ethernet rather than InfiniBand mode.
echo "options mlx4_core port_type_array=2,2" >/etc/modprobe.d/mlx4.conf
You'll need the card's PCI ID for the next commands, which set each port's link type to 2 (Ethernet).
```
mstconfig -d 82:00.0 s LINK_TYPE_P1=2
mstconfig -d 82:00.0 s LINK_TYPE_P2=2
mstfwreset -d 82:00.0 -l3 -y reset
```
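Confirm the link type stuck after the firmware reset:

```
mstconfig -d 82:00.0 query | grep LINK_TYPE
```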
Now set up the port-to-MAC mapping for the qe0 and qe1 interfaces.
```
echo -e \
'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{dev_id}=="0x0", ATTR{type}=="1", ATTR{address}=="00:02:c9:37:bc:80", NAME="qe0"'"\n"\
'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{dev_id}=="0x0", ATTR{type}=="1", ATTR{address}=="00:02:c9:37:bc:81", NAME="qe1"' \
>/etc/udev/rules.d/70-persistent-net.rules
```
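A quick check that the rule file took effect (renames only apply when the device is added, so a reboot or driver reload is the clean test):

```
udevadm control --reload-rules
ip -br link | grep -E '^qe[01]'
```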
/etc/network/interfaces
Set this up based on the other servers.
/etc/network/if-up.d/00-keekles.sh
Copy this file over from one of the other servers.
/etc/pve/localfirewall.sh
Copy this from the other servers. It is probably not needed, since it should appear once the node joins the cluster.
domain names
Add these records in the DNS:
```
Fink.keekles.org 3600 IN A    23.149.104.16
Fink.keekles.org 3600 IN AAAA 2602:2af:0:1::a16
```
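Once the zone is live, confirm the records resolve:

```
dig +short A Fink.keekles.org
dig +short AAAA Fink.keekles.org
```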
proxmox config
remove the nag
From https://johnscs.com/remove-proxmox51-subscription-notice/
```
sed -Ezi.bak "s/(function\(orig_cmd\) \{)/\1\n\torig_cmd\(\);\n\treturn;/g" /usr/share/javascript/proxmox-widget-toolkit/proxmoxlib.js && systemctl restart pveproxy.service
```
Join Server to existing cluster
```
pvecm add 192.168.8.180
```
```
root@Fink:~# pvecm add 192.168.8.180
Please enter superuser (root) password for '192.168.8.180': **********
Establishing API connection with host '192.168.8.180'
The authenticity of host '192.168.8.180' can't be established.
X509 SHA256 key fingerprint is 53:FD:4B:EE:AC:7A:2C:10:60:05:71:58:99:45:26:EA:26:07:62:C0:6C:1B:46:F6:8A:DC:3D:32:99:E0:55:51.
Are you sure you want to continue connecting (yes/no)? yes
Login succeeded.
check cluster join API version
No cluster network links passed explicitly, fallback to local node IP '192.168.8.186'
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1724515764.sql.gz'
waiting for quorum...OK
(re)generate node files
generate new node certificate
merge authorized SSH keys
generated new node certificate, restart pveproxy and pvedaemon services
successfully added node 'Fink' to cluster.
```
local storage
localDataStore is the pool for this
make a dataset for this

```
zfs create localDataStore/proxmox
```
copy it over from the rpool
```
zfs snapshot rpool/var-lib-vz@xfer
zfs send -vR rpool/var-lib-vz@xfer | zfs receive -Fdu localDataStore
zfs umount rpool/var-lib-vz
zfs destroy -r rpool/var-lib-vz
zfs mount localDataStore/var-lib-vz
zfs destroy -v localDataStore/var-lib-vz@xfer
zfs set mountpoint=none rpool/data
zfs set mountpoint=/localDataStore/proxmox localDataStore/proxmox
```
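Sanity-check that everything landed and mounted where expected:

```
zfs list -o name,used,mountpoint localDataStore/var-lib-vz localDataStore/proxmox
zfs get mountpoint rpool/data
```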
/etc/pve/storage.cfg
Make a new storage entry, LDS-zfs, and assign it only to the new node.
```
zfspool: local-zfs
        pool rpool/data
        blocksize 64K
        content rootdir,images
        sparse 1
        nodes SpiderPig,Moleman,CarbonRod

zfspool: LDS-zfs
        pool localDataStore/proxmox
        blocksize 32K
        content rootdir,images
        sparse 1
        nodes Fink
```
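After saving, confirm the new storage shows up active on Fink only:

```
pvesm status
```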