Hypervisor 1RU 2024



This page documents the 2024 hypervisor build.

This is a 1 RU server with fast local SSDs and approximately 18 TB of storage.

BIOS settings

A patched BIOS is needed for this board to boot via UEFI and to enable the PCI hotplug menu in BIOS setup. I was able to edit this and have posted the latest version here.



Disk Layout

Front (looking at the front)
Slot001 Slot003 Slot005 Slot007 Slot009
Slot000 Slot002 Slot004 Slot006 Slot008
Rear (looking at the rear)
40G NIC  Upper-2 Upper-3
         Lower-0 Lower-1

boot disk

Boot disk = 128 GB ZFS mirror, with the GPT layout below (gdisk type codes: EF02 = BIOS boot, EF00 = EFI system, BF01 = Solaris/ZFS)
  1              34            2047   1007.0 KiB  EF02
  2            2048         2099199   1024.0 MiB  EF00
  3         2099200       838860800    128.0 GiB  BF01
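
For reference, an equivalent layout could be laid down by hand with sgdisk; this is a sketch only (the Proxmox installer normally creates it, and /dev/sda is a placeholder device name):

sgdisk -n1:34:2047 -t1:EF02 /dev/sda    # BIOS boot partition
sgdisk -n2:2048:+1G -t2:EF00 /dev/sda   # EFI system partition
sgdisk -n3:0:+128G -t3:BF01 /dev/sda    # ZFS partition for the boot mirror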

ZFS storage

/data

5 mirrored vdevs of the SAS disks
a special mirror of 2 NVMe drives, using a 384 GB partition on each; the rest of each disk is dedicated to it, so the partition can grow later
optional log and L2ARC on the boot NVMe drives (see the sketch below)
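
A minimal sketch of adding the optional log and L2ARC later, assuming spare partitions exist on the boot NVMe drives for this (the by-id paths are hypothetical):

zpool add localDataStore log mirror /dev/disk/by-id/nvme-boot0-part4 /dev/disk/by-id/nvme-boot1-part4
zpool add localDataStore cache /dev/disk/by-id/nvme-boot0-part5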

partition the NVMe

root@pve:~# gdisk /dev/nvme0n1
GPT fdisk (gdisk) version 1.0.9

Partition table scan:
  MBR: not present
  BSD: not present
  APM: not present
  GPT: not present

Creating new GPT entries in memory.

Command (? for help): n
Partition number (1-128, default 1): 1
First sector (6-390624994, default = 256) or {+-}size{KMGTP}: 2048
Last sector (2048-390624994, default = 390624767) or {+-}size{KMGTP}: +384G
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): BF01
Changed type of partition to 'Solaris /usr & Mac ZFS'

Command (? for help): p
Disk /dev/nvme0n1: 390625000 sectors, 1.5 TiB
Model: MZ1LB1T9HBLS-000FB
Sector size (logical/physical): 4096/4096 bytes
Disk identifier (GUID): 5A039EE7-96B6-4D53-BEA4-56BCF48F4ABA
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 5
First usable sector is 6, last usable sector is 390624994
Partitions will be aligned on 256-sector boundaries
Total free space is 289961693 sectors (1.1 TiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048       100665343   384.0 GiB   BF01  Solaris /usr & Mac ZFS

Command (? for help): w

Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!

Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/nvme0n1.
The operation has completed successfully.
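
The second special-vdev NVMe needs the same partition layout. A quick way (a sketch; which device is the second one is an assumption here) is to replicate the table with sgdisk and randomize the GUIDs:

sgdisk -R /dev/nvme2n1 /dev/nvme0n1   # copy nvme0n1's partition table onto nvme2n1 (target device assumed)
sgdisk -G /dev/nvme2n1                # give the copy new random disk/partition GUIDs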

create the pool

Note that slots 8 and 9 are on their own two controller ports (separate from the other eight), so they run faster than the rest and would fill unequally if mirrored with each other; they are therefore split across two vdevs below.

zpool create -f -o ashift=12 -O compression=lz4 -O atime=off -O xattr=sa localDataStore \
mirror /dev/disk/by-enclosure-slot/front-slot000 /dev/disk/by-enclosure-slot/front-slot001 \
mirror /dev/disk/by-enclosure-slot/front-slot002 /dev/disk/by-enclosure-slot/front-slot003 \
mirror /dev/disk/by-enclosure-slot/front-slot004 /dev/disk/by-enclosure-slot/front-slot005 \
mirror /dev/disk/by-enclosure-slot/front-slot006 /dev/disk/by-enclosure-slot/front-slot008 \
mirror /dev/disk/by-enclosure-slot/front-slot007 /dev/disk/by-enclosure-slot/front-slot009 \
special mirror /dev/disk/by-enclosure-slot/nvme-upper-2-part1 /dev/disk/by-enclosure-slot/nvme-lower-0-part1
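
Afterwards, verify the vdev layout and capacities:

zpool status localDataStore
zpool list -v localDataStore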

Backplane

The onboard backplane is a BPN-SAS3-116A-N2, which supports 8 SAS disks plus 2 bays that can be NVMe or SAS. However, the "or SAS" part does not hold if you want to run 10 SAS disks: the right-most NVMe/SAS ports are labeled "SAS2" on the backplane, but they are really SATA ports wired to the onboard SATA controller. As this is a plain backplane, not an expander, each physical SAS port from the controller connects to one drive. Since the included controller only had 8 ports, a 16-port controller is used.
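
To check which drive landed on which controller port, lsscsi (installed below) can show the SAS addresses:

lsscsi -t -g    # list SCSI devices with their SAS addresses and /dev/sg nodes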


NVMe namespaces

https://narasimhan-v.github.io/2020/06/12/Managing-NVMe-Namespaces.html

The NVMe drives come set up as 1.88 TB namespaces:

tnvmcap   : 1,880,375,648,256
unvmcap   : 375,648,256
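
These figures come from the controller identify data and can be read with:

nvme id-ctrl /dev/nvme0 | grep -i nvmcap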

I suspect this is a 2.0 TiB (2,199,023,255,552 byte) device provisioned down in the controller to about 85% (1,880,375,648,256 / 2,199,023,255,552 ≈ 0.855). Resizing the namespace to 1.6 TB increases the over-provisioning further (1.6 TB is roughly 73% of the raw capacity), which should make it perform better if we use it as a log device or for write-intensive work.

Ensure we're on 4096-byte sectors:

nvme id-ns -H /dev/nvme0n1 | grep "LBA Format"
nvme id-ns -H /dev/nvme1n1 | grep "LBA Format"
nvme id-ns -H /dev/nvme2n1 | grep "LBA Format"
nvme id-ns -H /dev/nvme3n1 | grep "LBA Format"
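
The format marked in use should report a 4096-byte data size; illustrative output (exact wording varies by nvme-cli version):

LBA Format  0 : Metadata Size: 0   bytes - Data Size: 4096 bytes - Relative Performance: 0 Best (in use)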

Detach the namespace

nvme detach-ns /dev/nvme1 --namespace-id=1 --controllers=4
nvme detach-ns /dev/nvme3 --namespace-id=1 --controllers=4
nvme detach-ns /dev/nvme2 --namespace-id=1 --controllers=4
nvme detach-ns /dev/nvme0 --namespace-id=1 --controllers=4

Delete the namespace

nvme delete-ns /dev/nvme1 --namespace-id=1
nvme delete-ns /dev/nvme3 --namespace-id=1
nvme delete-ns /dev/nvme2 --namespace-id=1
nvme delete-ns /dev/nvme0 --namespace-id=1

Make the new namespace

nvme create-ns /dev/nvme1 --nsze-si=1.6T --ncap-si=1.6T --flbas=0 --dps=0 --nmic=0
nvme create-ns /dev/nvme3 --nsze-si=1.6T --ncap-si=1.6T --flbas=0 --dps=0 --nmic=0
nvme create-ns /dev/nvme2 --nsze-si=1.6T --ncap-si=1.6T --flbas=0 --dps=0 --nmic=0
nvme create-ns /dev/nvme0 --nsze-si=1.6T --ncap-si=1.6T --flbas=0 --dps=0 --nmic=0

Attach the namespace to the controller

nvme attach-ns /dev/nvme1 --namespace-id=1 --controllers=4
nvme attach-ns /dev/nvme3 --namespace-id=1 --controllers=4
nvme attach-ns /dev/nvme2 --namespace-id=1 --controllers=4
nvme attach-ns /dev/nvme0 --namespace-id=1 --controllers=4

reset the controller to make it visible to the OS

nvme reset /dev/nvme1
nvme reset /dev/nvme3
nvme reset /dev/nvme2
nvme reset /dev/nvme0

Confirm it

nvme list
Node                  Generic               SN                   Model                                    Namespace Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme4n1          /dev/ng4n1            S64ANS0T515282K      Samsung SSD 980 1TB                      1         114.21  GB /   1.00  TB    512   B +  0 B   2B4QFXO7
/dev/nvme3n1          /dev/ng3n1            S5XANA0R537286       MZ1LB1T9HBLS-000FB                       1           0.00   B /   1.60  TB      4 KiB +  0 B   EDW73F2Q
/dev/nvme2n1          /dev/ng2n1            S5XANA0R694994       MZ1LB1T9HBLS-000FB                       1         157.62  GB /   1.88  TB      4 KiB +  0 B   EDW73F2Q
/dev/nvme1n1          /dev/ng1n1            S5XANA0R682634       MZ1LB1T9HBLS-000FB                       1         157.47  GB /   1.88  TB      4 KiB +  0 B   EDW73F2Q
/dev/nvme0n1          /dev/ng0n1            S5XANA0R682645       MZ1LB1T9HBLS-000FB                       1           0.00   B /   1.60  TB      4 KiB +  0 B   EDW73F2Q

udev rules

Look here for the info on setting this up

Software

apt-get install sg3-utils-udev sdparm ledmon lsscsi net-tools nvme-cli lldpd rsyslog ipmitool vim unzip git fio sudo locate screen snmpd libsnmp-dev mstflint

configs

Screen

echo -e "#Bryan Config for scroll back buffer\ntermcapinfo xterm|xterms|xs|rxvt ti@:te@" >>/etc/screenrc

Bash Completion

configure bash completion for interactive shells

vim /etc/bash.bashrc
Uncomment the block under "# enable bash completion in interactive shells", then add ZFS completion at the end:
# add in zfs completion
. /usr/share/bash-completion/completions/zfs

root profile

# You may uncomment the following lines if you want `ls' to be colorized:
export LS_OPTIONS='--color=auto'
eval "$(dircolors)"
alias ls='ls $LS_OPTIONS'

hostname

echo Fink >/etc/hostname
  • Edit /etc/hosts:
127.0.0.1 localhost.localdomain localhost
192.168.8.186 Fink.keekles.org Fink
  • Reboot

/etc/hosts

This needs to be the same across the cluster:

scp /etc/hosts root@fink:/etc/hosts

SSHD

Configure SSH so that root and user logins are allowed via public key only (no password authentication).
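
A minimal sketch of the relevant /etc/ssh/sshd_config directives for key-only logins, followed by a service restart:

PermitRootLogin prohibit-password
PasswordAuthentication no
PubkeyAuthentication yes

systemctl restart ssh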

snmp

  • Install SNMP
sudo apt-get -y install snmp snmpd libsnmp-dev
  • Stop the snmpd service so we can add a user
sudo service snmpd stop
  • Add a SNMPv3 user:
sudo net-snmp-config --create-snmpv3-user -ro -A keeklesSNMPpasswd -a SHA -x AES -X keeklesSNMPpasswd KeeklesSNMP
  • Copy /etc/snmp/snmpd.conf from carbonrod:
scp /etc/snmp/snmpd.conf root@fink:/etc/snmp/snmpd.conf


  • Edit /etc/snmp/snmpd.conf:
vim /etc/snmp/snmpd.conf
Update syslocation and the other site details.
  • Start snmpd:
service snmpd start
  • Test SNMP
# TEST
snmpwalk -v3 -u KeeklesSNMP -l authPriv -a SHA -A keeklespasswd -x aes -X keeklespasswd 192.168.8.186

observium client

  • Do this from eyes (the Observium server):
  export SERVER=<hostname>
  • On the target server:
sudo apt-get install snmpd php perl curl xinetd snmp libsnmp-dev libwww-perl 
#only needed for postfix
apt-get -y install rrdtool mailgraph
dpkg-reconfigure mailgraph
  • Install the current distro script on the target:
 sudo curl -o /usr/local/bin/distro https://gitlab.com/observium/distroscript/raw/master/distro
 sudo chmod +x /usr/local/bin/distro


  • From eyes
scp /opt/observium/scripts/observium_agent_xinetd $SERVER:/etc/xinetd.d/observium_agent_xinetd
scp /opt/observium/scripts/observium_agent $SERVER:/usr/bin/observium_agent
ssh $SERVER mkdir -p /usr/lib/observium_agent
ssh $SERVER mkdir -p /usr/lib/observium_agent/scripts-available
ssh $SERVER mkdir -p /usr/lib/observium_agent/scripts-enabled
scp /opt/observium/scripts/agent-local/* $SERVER:/usr/lib/observium_agent/scripts-available
  • On the target, enable the desired scripts:
ln -s /usr/lib/observium_agent/scripts-available/os /usr/lib/observium_agent/scripts-enabled
   #ln -s /usr/lib/observium_agent/scripts-available/zimbra /usr/lib/observium_agent/scripts-enabled
ln -s /usr/lib/observium_agent/scripts-available/dpkg /usr/lib/observium_agent/scripts-enabled
ln -s /usr/lib/observium_agent/scripts-available/ntpd /usr/lib/observium_agent/scripts-enabled
ln -s /usr/lib/observium_agent/scripts-available/virt-what /usr/lib/observium_agent/scripts-enabled 
ln -s /usr/lib/observium_agent/scripts-available/proxmox-qemu /usr/lib/observium_agent/scripts-enabled
ln -s /usr/lib/observium_agent/scripts-available/postfix_mailgraph /usr/lib/observium_agent/scripts-enabled


  • Edit /etc/xinetd.d/observium_agent_xinetd so the Observium server is allowed to connect (example below). You can do this by replacing 127.0.0.1, or by adding the server's IP after it, separated by a space. Restart xinetd afterwards so the configuration is re-read.
sudo service xinetd restart
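The edit looks like this in /etc/xinetd.d/observium_agent_xinetd (the Observium server IP here is a placeholder):

only_from      = 127.0.0.1 192.168.8.10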
  • Test from eyes
telnet $SERVER 36602
snmpwalk -v3 -u KeeklesSNMP -l authNoPriv -a MD5 -A keeklespasswd $SERVER

default editor

update-alternatives --config editor 
Then select #3 vim.basic

timezone

sudo timedatectl set-timezone UTC

sudo config

add local user accounts

useradd -s /bin/bash -m -U -G sudo bryan
  • Copy over the SSH keys:
rsync -avz /home/bryan/.ssh root@192.168.8.186:/home/bryan/

Configure sudo

echo "bryan ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers.d/00-admins

Configure Postfix

Postfix is installed to forward mail for root to an SMTP relay host.

apt-get install postfix mailutils

This runs an installer with a curses interface; select "Satellite system". Check that the system mail name is the hostname of the server and that the SMTP relay host is morty.keekles.org. Root and postmaster mail should go to rootmail@allstarlink.org.

Should you need to reconfigure this use:

dpkg-reconfigure postfix

Other aliases are set up in /etc/aliases. You must run newaliases after updating it for the changes to take effect (see the example below).
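
For example, root's alias entry in /etc/aliases, then rebuild the alias database:

root: rootmail@allstarlink.org

newaliases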

Network

NIC

The MLX4 2x40G NIC needs a module option to put its ports in Ethernet mode rather than InfiniBand mode.

echo "options mlx4_core port_type_array=2,2" >/etc/modprobe.d/mlx4.conf

You'll need the card's PCI ID for the following commands, which set both port types to 2 (Ethernet).
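
The PCI ID can be found with lspci, e.g.:

lspci -nn | grep -i mellanox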

mstconfig -d 82:00.0 s LINK_TYPE_P1=2
mstconfig -d 82:00.0 s LINK_TYPE_P2=2
mstfwreset -d 82:00.0 -l3 -y reset

Now set up the mapping of port to MAC address for the qe0 and qe1 interfaces:

echo -e \
'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{dev_id}=="0x0", ATTR{type}=="1", ATTR{address}=="00:02:c9:37:bc:80", NAME="qe0"'"\n"\
'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{dev_id}=="0x0", ATTR{type}=="1", ATTR{address}=="00:02:c9:37:bc:81", NAME="qe1"' \
 >/etc/udev/rules.d/70-persistent-net.rules
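
Reload the udev rules afterwards; a reboot is the simplest way to make the interface rename take effect:

udevadm control --reload-rules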

/etc/network/interfaces

Set this up based on the other servers.

/etc/network/if-up.d/00-keekles.sh

Copy this file to the server from one of the others.

/etc/pve/localfirewall.sh

Copy this from the other servers. This probably is not needed, as once the node joins the cluster it should appear there.

domain names

Add these records in DNS:

Fink.keekles.org        3600    IN      A       23.149.104.16
Fink.keekles.org        3600    IN      AAAA    2602:2af:0:1::a16
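
Verify the records resolve (dig is in the dnsutils package):

dig +short Fink.keekles.org A
dig +short Fink.keekles.org AAAA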

proxmox config

remove the nag

From https://johnscs.com/remove-proxmox51-subscription-notice/

sed -Ezi.bak "s/(function\(orig_cmd\) \{)/\1\n\torig_cmd\(\);\n\treturn;/g" /usr/share/javascript/proxmox-widget-toolkit/proxmoxlib.js && systemctl restart pveproxy.service

Join Server to existing cluster

pvecm add 192.168.8.180
root@Fink:~# pvecm add 192.168.8.180
Please enter superuser (root) password for '192.168.8.180': **********
Establishing API connection with host '192.168.8.180'
The authenticity of host '192.168.8.180' can't be established.
X509 SHA256 key fingerprint is 53:FD:4B:EE:AC:7A:2C:10:60:05:71:58:99:45:26:EA:26:07:62:C0:6C:1B:46:F6:8A:DC:3D:32:99:E0:55:51.
Are you sure you want to continue connecting (yes/no)? yes
Login succeeded.
check cluster join API version
No cluster network links passed explicitly, fallback to local node IP '192.168.8.186'
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1724515764.sql.gz'
waiting for quorum...OK
(re)generate node files
generate new node certificate
merge authorized SSH keys
generated new node certificate, restart pveproxy and pvedaemon services
successfully added node 'Fink' to cluster.

local storage

localDataStore is the pool for this

make a dataset for this

zfs create localDataStore/proxmox

copy it over from the rpool

zfs snapshot rpool/var-lib-vz@xfer
zfs send -vR rpool/var-lib-vz@xfer | zfs receive -Fdu localDataStore
zfs umount rpool/var-lib-vz
zfs destroy -r rpool/var-lib-vz
zfs mount localDataStore/var-lib-vz
zfs destroy -v localDataStore/var-lib-vz@xfer
zfs set mountpoint=none rpool/data
zfs set mountpoint=/localDataStore/proxmox localDataStore/proxmox
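
Check the resulting datasets and mountpoints:

zfs list -r -o name,used,mountpoint localDataStore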


/etc/pve/storage.cfg

Make a new class LDS-zfs and assign it only to the new node.

zfspool: local-zfs
        pool rpool/data
        blocksize 64K
        content rootdir,images
        sparse 1
        nodes SpiderPig Moleman CarbonRod

zfspool: LDS-zfs
        pool localDataStore/proxmox
        blocksize 32K
        content rootdir,images
        sparse 1
        nodes Fink
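
Confirm the new storage shows up and is active on the node:

pvesm status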

Reference material

Samsung SSD 845DC 04 Over-provisioning

QNAP - SSD Over-provisioning White Paper

Innodisk SATADOM-SL Datasheet

https://medium.com/@reefland/over-provisioning-ssd-for-increased-performance-and-write-endurance-142feb015b4e