Hypervisor 1RU 2024
These are the build notes for the 2024 hypervisor.
It is a 1RU server with fast local SSD and approximately 18 TB of storage.
BIOS settings
A patched BIOS is needed for this to boot via UEFI and to enable the PCI hotplug menu in the BIOS. I was able to edit this and have posted the latest version here.
Disk Layout
| Front (looking at the front) | | | | |
|---|---|---|---|---|
| Slot001 | Slot003 | Slot005 | Slot007 | Slot009 |
| Slot000 | Slot002 | Slot004 | Slot006 | Slot008 |
| Rear (looking at the rear) | | |
|---|---|---|
| 40g NIC | Upper-2 | Upper-3 |
| NA | Lower-0 | Lower-1 |
boot disk
Boot disk = 128G ZFS mirror:

```
Number  Start (sector)    End (sector)  Size        Code
   1                34            2047  1007.0 KiB  EF02
   2              2048         2099199  1024.0 MiB  EF00
   3           2099200       838860800  128.0 GiB   BF01
```
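If the boot disks ever need to be laid out by hand, a minimal sgdisk sketch that reproduces the table above; the device name /dev/sda is only an example, and it would be run against each disk in the mirror:

```
# Sketch: recreate the boot-disk layout shown above (device name is an example)
sgdisk -n1:34:2047           -t1:EF02 /dev/sda   # BIOS boot partition
sgdisk -n2:2048:2099199      -t2:EF00 /dev/sda   # EFI system partition
sgdisk -n3:2099200:838860800 -t3:BF01 /dev/sda   # ZFS partition for the boot pool
```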
ZFS storage
/data
5 mirrored vdevs of the SAS disks, plus a mirrored special vdev of two 384G NVMe partitions. Each 384G partition is the only partition on its NVMe, so the whole disk is dedicated to it and the partition can grow later. Optionally, a log and L2ARC can go on the boot NVMe's (see the sketch after the pool-creation command below).
partition the NVME
```
root@pve:~# gdisk /dev/nvme0n1
GPT fdisk (gdisk) version 1.0.9

Partition table scan:
  MBR: not present
  BSD: not present
  APM: not present
  GPT: not present

Creating new GPT entries in memory.

Command (? for help): n
Partition number (1-128, default 1): 1
First sector (6-390624994, default = 256) or {+-}size{KMGTP}: 2048
Last sector (2048-390624994, default = 390624767) or {+-}size{KMGTP}: +384G
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): BF01
Changed type of partition to 'Solaris /usr & Mac ZFS'

Command (? for help): p
Disk /dev/nvme0n1: 390625000 sectors, 1.5 TiB
Model: MZ1LB1T9HBLS-000FB
Sector size (logical/physical): 4096/4096 bytes
Disk identifier (GUID): 5A039EE7-96B6-4D53-BEA4-56BCF48F4ABA
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 5
First usable sector is 6, last usable sector is 390624994
Partitions will be aligned on 256-sector boundaries
Total free space is 289961693 sectors (1.1 TiB)

Number  Start (sector)    End (sector)  Size        Code  Name
   1            2048       100665343    384.0 GiB   BF01  Solaris /usr & Mac ZFS

Command (? for help): w

Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING PARTITIONS!!

Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/nvme0n1.
The operation has completed successfully.
```
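The same thing can be done non-interactively; a sketch for the second special-vdev NVMe, assuming it shows up as /dev/nvme2n1 (check which device maps to which slot first):

```
# Non-interactive equivalent of the gdisk session above (device name is an assumption)
sgdisk -n1:2048:+384G -t1:BF01 /dev/nvme2n1
```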
create the pool
Note that slots 8 and 9 sit on their own two of the eight ports, so they run faster than the rest and would fill unequally if mirrored with each other; that is why each of them is paired with a slower slot below.
```
zpool create -f -o ashift=12 -O compression=lz4 -O atime=off -O xattr=sa localDataStore \
  mirror /dev/disk/by-enclosure-slot/front-slot000 /dev/disk/by-enclosure-slot/front-slot001 \
  mirror /dev/disk/by-enclosure-slot/front-slot002 /dev/disk/by-enclosure-slot/front-slot003 \
  mirror /dev/disk/by-enclosure-slot/front-slot004 /dev/disk/by-enclosure-slot/front-slot005 \
  mirror /dev/disk/by-enclosure-slot/front-slot006 /dev/disk/by-enclosure-slot/front-slot008 \
  mirror /dev/disk/by-enclosure-slot/front-slot007 /dev/disk/by-enclosure-slot/front-slot009 \
  special mirror /dev/disk/by-enclosure-slot/nvme-upper-2-part1 /dev/disk/by-enclosure-slot/nvme-lower-0-part1
```
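If the optional log and L2ARC mentioned above get added later, it would look roughly like this; the partition names are made up and assume spare partitions have been carved out of the boot NVMe's first:

```
# Sketch only: mirrored SLOG plus a single L2ARC device (hypothetical partition names)
zpool add localDataStore log mirror /dev/nvme4n1p4 /dev/nvme5n1p4
zpool add localDataStore cache /dev/nvme4n1p5
```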
Backplane
The onboard backplane is a BPN-SAS3-116A-N2, which has 8 SAS bays plus 2 bays that take NVMe or SAS. However, that last part does not hold if you want to run 10 SAS disks: the right-most NVMe/SAS ports are labelled "SAS2" on the backplane, but they are really SATA ports wired to the onboard SATA controller. As this is a plain backplane, not an expander, each physical SAS port from the controller connects to one drive. Since the included controller only had 8 ports, a 16-port controller is used.
NVME name spaces
https://narasimhan-v.github.io/2020/06/12/Managing-NVMe-Namespaces.html
The NVMe's come set up as 1.88 TB disks:

```
tnvmcap : 1,880,375,648,256
unvmcap : 375,648,256
```
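These capacity fields come from `nvme id-ctrl`, for example:

```
# Total and unallocated NVM capacity for the first controller
nvme id-ctrl /dev/nvme0 | grep -iE "tnvmcap|unvmcap"
```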
I suspect this is a 2.0 TiB (2,199,023,255,552 B) part provisioned down in the controller to about 85%. Moving it to a 1.6 TB namespace will under-provision it further and make it perform better if we use it as a log device or for write-intensive work.
Ensure we're on 4096 bytes
```
nvme id-ns -H /dev/nvme0n1 | grep "LBA Format"
nvme id-ns -H /dev/nvme1n1 | grep "LBA Format"
nvme id-ns -H /dev/nvme2n1 | grep "LBA Format"
nvme id-ns -H /dev/nvme3n1 | grep "LBA Format"
```
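If any drive reports the 512-byte format as the one in use, it can be reformatted to 4096; the index below is an assumption, so use whichever LBA format index the id-ns output lists with a 4096-byte data size (this wipes the namespace):

```
# Assumes LBA format index 1 is the 4 KiB one on this drive; verify with id-ns first
nvme format /dev/nvme0n1 --lbaf=1 --ses=0
```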
1 & 2 Detatch the name space
nvme detach-ns /dev/nvme1 -namespace-id=1 -controllers=4 nvme detach-ns /dev/nvme3 -namespace-id=1 -controllers=4 nvme detach-ns /dev/nvme2 -namespace-id=1 -controllers=4 nvme detach-ns /dev/nvme0 -namespace-id=1 -controllers=4
Delete the namespace
```
nvme delete-ns /dev/nvme1 --namespace-id=1
nvme delete-ns /dev/nvme3 --namespace-id=1
nvme delete-ns /dev/nvme2 --namespace-id=1
nvme delete-ns /dev/nvme0 --namespace-id=1
```
Make the new namespace
```
nvme create-ns /dev/nvme1 --nsze-si=1.6T --ncap-si=1.6T --flbas=0 --dps=0 --nmic=0
nvme create-ns /dev/nvme3 --nsze-si=1.6T --ncap-si=1.6T --flbas=0 --dps=0 --nmic=0
nvme create-ns /dev/nvme2 --nsze-si=1.6T --ncap-si=1.6T --flbas=0 --dps=0 --nmic=0
nvme create-ns /dev/nvme0 --nsze-si=1.6T --ncap-si=1.6T --flbas=0 --dps=0 --nmic=0
```
Attach the namespace to the controller
```
nvme attach-ns /dev/nvme1 --namespace-id=1 --controllers=4
nvme attach-ns /dev/nvme3 --namespace-id=1 --controllers=4
nvme attach-ns /dev/nvme2 --namespace-id=1 --controllers=4
nvme attach-ns /dev/nvme0 --namespace-id=1 --controllers=4
```
Reset the controller to make it visible to the OS

```
nvme reset /dev/nvme1
nvme reset /dev/nvme3
nvme reset /dev/nvme2
nvme reset /dev/nvme0
```
Confirm it
```
nvme list
Node                  Generic               SN                Model                 Namespace Usage                    Format         FW Rev
--------------------- --------------------- ----------------- --------------------- --------- ------------------------ -------------- --------
/dev/nvme4n1          /dev/ng4n1            S64ANS0T515282K   Samsung SSD 980 1TB   1         114.21 GB /   1.00 TB    512 B  + 0 B   2B4QFXO7
/dev/nvme3n1          /dev/ng3n1            S5XANA0R537286    MZ1LB1T9HBLS-000FB    1           0.00 B  /   1.60 TB    4 KiB  + 0 B   EDW73F2Q
/dev/nvme2n1          /dev/ng2n1            S5XANA0R694994    MZ1LB1T9HBLS-000FB    1         157.62 GB /   1.88 TB    4 KiB  + 0 B   EDW73F2Q
/dev/nvme1n1          /dev/ng1n1            S5XANA0R682634    MZ1LB1T9HBLS-000FB    1         157.47 GB /   1.88 TB    4 KiB  + 0 B   EDW73F2Q
/dev/nvme0n1          /dev/ng0n1            S5XANA0R682645    MZ1LB1T9HBLS-000FB    1           0.00 B  /   1.60 TB    4 KiB  + 0 B   EDW73F2Q
```
udev rules
Look here for the info on setting this up
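As a rough, hypothetical sketch of the idea (not the exact rules used here): each controller path gets a stable /dev/disk/by-enclosure-slot/ symlink, with the ID_PATH values below as placeholders to be read per drive with `udevadm info`:

```
# /etc/udev/rules.d/63-enclosure-slot.rules -- sketch only, ID_PATH values are placeholders
# Find the real value for each bay: udevadm info --query=property /dev/sdX | grep ID_PATH
KERNEL=="sd?", SUBSYSTEM=="block", ENV{ID_PATH}=="pci-0000:01:00.0-sas-phy0-lun-0", SYMLINK+="disk/by-enclosure-slot/front-slot000"
KERNEL=="sd?", SUBSYSTEM=="block", ENV{ID_PATH}=="pci-0000:01:00.0-sas-phy1-lun-0", SYMLINK+="disk/by-enclosure-slot/front-slot001"
```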
Software
apt-get install sg3-utils-udev sdparm ledmon lsscsi net-tools nvme-cli lldpd rsyslog ipmitool vim unzip git fio sudo locate screen
configs
Screen
echo -e "#Bryan Config for scroll back buffer\ntermcapinfo xterm|xterms|xs|rxvt ti@:te@" >>/etc/screenrc
Bash Completion
configure bash completion for interactive shells
```
vim /etc/bash.bashrc
```

Uncomment the block below "# enable bash completion in interactive shells", then add in ZFS completion:

```
# add in zfs completion
. /usr/share/bash-completion/completions/zfs
```
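For reference, the stock Debian block in /etc/bash.bashrc looks like this once uncommented (it can differ slightly between releases):

```
# enable bash completion in interactive shells
if ! shopt -oq posix; then
  if [ -f /usr/share/bash-completion/bash_completion ]; then
    . /usr/share/bash-completion/bash_completion
  elif [ -f /etc/bash_completion ]; then
    . /etc/bash_completion
  fi
fi
```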
root profile
```
# You may uncomment the following lines if you want `ls' to be colorized:
export LS_OPTIONS='--color=auto'
eval "$(dircolors)"
alias ls='ls $LS_OPTIONS'
```
Network
NIC
The MLX4 2x40G NIC needs a setting in its module options to put the ports in Ethernet rather than InfiniBand mode.
echo "options mlx4_core port_type_array=2,2" >/etc/modprobe.d/mlx4.conf
Now set up the mapping of port to MAC address for the qe0 and qe1 interfaces:

```
echo -e \
'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{dev_id}=="0x0", ATTR{type}=="1", ATTR{address}=="00:02:c9:37:bc:80", NAME="qe0"'"\n"\
'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{dev_id}=="0x0", ATTR{type}=="1", ATTR{address}=="00:02:c9:37:bc:81", NAME="qe1"' \
>/etc/udev/rules.d/70-persistent-net.rules
```
proxmox config
remove the nag
From https://johnscs.com/remove-proxmox51-subscription-notice/
sed -Ezi.bak "s/(function\(orig_cmd\) \{)/\1\n\torig_cmd\(\);\n\treturn;/g" /usr/share/javascript/proxmox-widget-toolkit/proxmoxlib.js && systemctl restart pveproxy.service
local storage
localDataStore is the pool for this
make a dataset for this
zfs create localDataStore/proxmox
copy it over from the rpool
```
zfs snapshot rpool/var-lib-vz@xfer
zfs send -vR rpool/var-lib-vz@xfer | zfs receive -Fdu localDataStore
zfs umount rpool/var-lib-vz
zfs destroy -r rpool/var-lib-vz
zfs mount localDataStore/var-lib-vz
zfs destroy -v localDataStore/var-lib-vz@xfer
```
edit /etc/pve/storage.cfg
```
dir: local
        path /var/lib/vz
        content iso,vztmpl,backup

zfspool: local-zfs
        pool localDataStore/proxmox
        sparse
        content images,rootdir
```
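A quick sanity check that Proxmox and ZFS both see the new layout:

```
# Storage status as Proxmox sees it, plus the datasets under the new pool
pvesm status
zfs list -r localDataStore
```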