Hypervisor 1RU 2024


This page documents the hypervisor build for 2024.

It is a 1 RU server with fast local SSDs and approximately 18 TB of storage.

BIOS settings

A patched BIOS is needed for this board to boot via UEFI and to expose the PCI hotplug menu in setup. I was able to edit this and have posted the latest version here.



Disk Layout

boot disk

Boot disk: 128 GiB ZFS mirror, GPT layout:

Number  Start (sector)    End (sector)  Size       Code
  1              34            2047   1007.0 KiB  EF02
  2            2048         2099199   1024.0 MiB  EF00
  3         2099200       838860800    128.0 GiB  BF01
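
A minimal sketch of reproducing that layout by hand with sgdisk, assuming the two boot SSDs are /dev/sda and /dev/sdb (hypothetical names, adjust to the real devices; the Proxmox installer normally lays this out for you):

# wipe and recreate the three partitions on each boot SSD
for d in /dev/sda /dev/sdb; do
    sgdisk --zap-all $d
    sgdisk -a1 -n1:34:2047 -t1:EF02 $d    # BIOS boot
    sgdisk -n2:2048:+1G    -t2:EF00 $d    # EFI system partition
    sgdisk -n3:0:+128G     -t3:BF01 $d    # ZFS (boot pool mirror)
done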

ZFS storage

/data

5 mirrored vdevs of the SAS disks
a mirror of two 384 GiB NVMe partitions as a special vdev; the whole disk is dedicated to this, so the partition can grow later
optional log and L2ARC on the boot NVME's (a sketch for adding these follows the pool creation below)

partition the NVME

root@pve:~# gdisk /dev/nvme0n1
GPT fdisk (gdisk) version 1.0.9

Partition table scan:
  MBR: not present
  BSD: not present
  APM: not present
  GPT: not present

Creating new GPT entries in memory.

Command (? for help): n
Partition number (1-128, default 1): 1
First sector (6-390624994, default = 256) or {+-}size{KMGTP}: 2048
Last sector (2048-390624994, default = 390624767) or {+-}size{KMGTP}: +384G
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): BF01
Changed type of partition to 'Solaris /usr & Mac ZFS'

Command (? for help): p
Disk /dev/nvme0n1: 390625000 sectors, 1.5 TiB
Model: MZ1LB1T9HBLS-000FB
Sector size (logical/physical): 4096/4096 bytes
Disk identifier (GUID): 5A039EE7-96B6-4D53-BEA4-56BCF48F4ABA
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 5
First usable sector is 6, last usable sector is 390624994
Partitions will be aligned on 256-sector boundaries
Total free space is 289961693 sectors (1.1 TiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048       100665343   384.0 GiB   BF01  Solaris /usr & Mac ZFS

Command (? for help): w

Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!

Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/nvme0n1.
The operation has completed successfully.

create the pool

Note that slots 8 and 9 are on their own two ports of the second 8-port group, so they run faster than the rest and would fill unequally if mirrored with each other; the layout below pairs each of them with a slower slot instead.

zpool create -f -o ashift=12 -O compression=lz4 -O atime=off -O xattr=sa localDataStore \
mirror /dev/disk/by-enclosure-slot/front-slot000 /dev/disk/by-enclosure-slot/front-slot001 \
mirror /dev/disk/by-enclosure-slot/front-slot002 /dev/disk/by-enclosure-slot/front-slot003 \
mirror /dev/disk/by-enclosure-slot/front-slot004 /dev/disk/by-enclosure-slot/front-slot005 \
mirror /dev/disk/by-enclosure-slot/front-slot006 /dev/disk/by-enclosure-slot/front-slot008 \
mirror /dev/disk/by-enclosure-slot/front-slot007 /dev/disk/by-enclosure-slot/front-slot009 \
special mirror /dev/disk/by-enclosure-slot/nvme-upper-2-part1 /dev/disk/by-enclosure-slot/nvme-lower-0-part1
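
Once the pool exists, the optional log and L2ARC from the layout above could be added roughly like this (a sketch only; the partition paths on the boot NVME's are hypothetical and the sizes are up to you):

# mirrored SLOG plus L2ARC on spare partitions of the boot NVME's
zpool add localDataStore log mirror /dev/nvme4n1p4 /dev/nvme5n1p4
zpool add localDataStore cache /dev/nvme4n1p5 /dev/nvme5n1p5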

Backplane

The onboard backplane is a BPN-SAS3-116A-N2, which takes 8 SAS disks plus 2 bays billed as NVMe or SAS. However, that last part does not hold up if you want to run 10 SAS disks: the right-most NVMe/SAS ports are labelled "SAS2" on the backplane, but they are really SATA ports wired to the onboard SATA controller. Since this is a plain backplane, not an expander, each physical SAS port from the controller connects to exactly one drive, and because the included controller only had 8 ports, a 16-port controller is used.
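
One way to sanity check which controller port each bay is wired to is to look at the topology from the OS (lsscsi comes from the packages installed in the Software section below):

# list disks with their SAS transport addresses and sg devices
lsscsi -tg
# by-path names encode the controller PCI address and phy each disk hangs off
ls -l /dev/disk/by-path/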


NVMe namespaces

https://narasimhan-v.github.io/2020/06/12/Managing-NVMe-Namespaces.html

The NVMe drives come set up as 1.88 TB namespaces:

tnvmcap   : 1,880,375,648,256
unvmcap   : 375,648,256

I suspect this is a 2.0 TiB (2,199,023,255,552 byte) device provisioned down in the controller to about 85%. Recreating it as a 1.6 TB namespace under-provisions it further, which should make it perform better if it ends up used as a log device or for other write-intensive work.
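
The tnvmcap/unvmcap values above come from the controller identify data; a quick way to re-check them and the ratio (bc is just used as a calculator here):

# total and unallocated NVM capacity straight from the controller
nvme id-ctrl /dev/nvme0 | grep -i nvmcap
# 1,880,375,648,256 / 2,199,023,255,552 bytes ~= 0.855, i.e. ~85% of 2 TiB
echo 'scale=3; 1880375648256 / 2199023255552' | bc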

Ensure we're on 4096-byte sectors

nvme id-ns -H /dev/nvme0n1 | grep "LBA Format"
nvme id-ns -H /dev/nvme1n1 | grep "LBA Format"
nvme id-ns -H /dev/nvme2n1 | grep "LBA Format"
nvme id-ns -H /dev/nvme3n1 | grep "LBA Format"
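
If any of them reports a 512-byte format in use, it can be reformatted to the 4 KiB LBA format before the pool is built (destructive; the --lbaf index below is an assumption, use whichever index the id-ns output lists as 4096 bytes):

# switch the namespace to the 4 KiB LBA format (wipes the namespace)
nvme format /dev/nvme0n1 --lbaf=1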

1 & 2 Detatch the name space

nvme detach-ns /dev/nvme1 --namespace-id=1 --controllers=4
nvme detach-ns /dev/nvme3 --namespace-id=1 --controllers=4
nvme detach-ns /dev/nvme2 --namespace-id=1 --controllers=4
nvme detach-ns /dev/nvme0 --namespace-id=1 --controllers=4

Delete the namespace

nvme delete-ns /dev/nvme1 -namespace-id=1
nvme delete-ns /dev/nvme3 -namespace-id=1
nvme delete-ns /dev/nvme2 -namespace-id=1
nvme delete-ns /dev/nvme0 -namespace-id=1

Make the new namespace

nvme create-ns /dev/nvme1 --nsze-si=1.6T --ncap-si=1.6T --flbas=0 --dps=0 --nmic=0
nvme create-ns /dev/nvme3 --nsze-si=1.6T --ncap-si=1.6T --flbas=0 --dps=0 --nmic=0
nvme create-ns /dev/nvme2 --nsze-si=1.6T --ncap-si=1.6T --flbas=0 --dps=0 --nmic=0
nvme create-ns /dev/nvme0 --nsze-si=1.6T --ncap-si=1.6T --flbas=0 --dps=0 --nmic=0

Attach the namespace to the controller

nvme attach-ns /dev/nvme1 --namespace-id=1 --controllers=4
nvme attach-ns /dev/nvme3 --namespace-id=1 --controllers=4
nvme attach-ns /dev/nvme2 --namespace-id=1 --controllers=4
nvme attach-ns /dev/nvme0 --namespace-id=1 --controllers=4

Reset the controllers to make the new namespaces visible to the OS

nvme reset /dev/nvme1
nvme reset /dev/nvme3
nvme reset /dev/nvme2
nvme reset /dev/nvme0

Confirm it

nvme list
Node                  Generic               SN                   Model                                    Namespace Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme4n1          /dev/ng4n1            S64ANS0T515282K      Samsung SSD 980 1TB                      1         114.21  GB /   1.00  TB    512   B +  0 B   2B4QFXO7
/dev/nvme3n1          /dev/ng3n1            S5XANA0R537286       MZ1LB1T9HBLS-000FB                       1           0.00   B /   1.60  TB      4 KiB +  0 B   EDW73F2Q
/dev/nvme2n1          /dev/ng2n1            S5XANA0R694994       MZ1LB1T9HBLS-000FB                       1         157.62  GB /   1.88  TB      4 KiB +  0 B   EDW73F2Q
/dev/nvme1n1          /dev/ng1n1            S5XANA0R682634       MZ1LB1T9HBLS-000FB                       1         157.47  GB /   1.88  TB      4 KiB +  0 B   EDW73F2Q
/dev/nvme0n1          /dev/ng0n1            S5XANA0R682645       MZ1LB1T9HBLS-000FB                       1           0.00   B /   1.60  TB      4 KiB +  0 B   EDW73F2Q

udev rules

Look here for the info on setting up the /dev/disk/by-enclosure-slot rules used for the pool above.
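
For reference, the rules only need to map each disk to a stable /dev/disk/by-enclosure-slot/ symlink; a minimal sketch of that idea, with made-up serial numbers and the real mapping left to the linked page:

# /etc/udev/rules.d/99-enclosure-slot.rules (illustrative only)
KERNEL=="sd*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL_SHORT}=="SERIAL000", SYMLINK+="disk/by-enclosure-slot/front-slot000"
KERNEL=="sd*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL_SHORT}=="SERIAL001", SYMLINK+="disk/by-enclosure-slot/front-slot001"

# reload and re-trigger so the symlinks appear
udevadm control --reload-rules && udevadm trigger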

Software

apt-get install sg3-utils-udev sdparm ledmon lsscsi net-tools nvme-cli lldpd rsyslog ipmitool vim unzip git fio sudo locate screen

configs

Screen

echo -e "#Bryan Config for scroll back buffer\ntermcapinfo xterm|xterms|xs|rxvt ti@:te@" >>/etc/screenrc

Bash Completion

Configure bash completion for interactive shells:

vim /etc/bash.bashrc

Uncomment the block under the line "# enable bash completion in interactive shells".
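
On a stock Debian/Proxmox /etc/bash.bashrc the block looks like this once uncommented (double-check against the file on the host):

# enable bash completion in interactive shells
if ! shopt -oq posix; then
  if [ -f /usr/share/bash-completion/bash_completion ]; then
    . /usr/share/bash-completion/bash_completion
  elif [ -f /etc/bash_completion ]; then
    . /etc/bash_completion
  fi
fi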

root profile

# You may uncomment the following lines if you want `ls' to be colorized:
export LS_OPTIONS='--color=auto'
eval "$(dircolors)"
alias ls='ls $LS_OPTIONS'

proxmox config

local storage

localDataStore is the pool for this

make a dataset for this

zfs create localDataStore/proxmox

copy it over from the rpool

zfs snapshot rpool/var-lib-vz@xfer
zfs send -vR rpool/var-lib-vz@xfer | zfs receive -Fdu localDataStore
zfs umount rpool/var-lib-vz
zfs destroy -r rpool/var-lib-vz
zfs mount localDataStore/var-lib-vz
zfs destroy -v localDataStore/var-lib-vz@xfer

edit /etc/pve/storage.cfg

dir: local
        path /var/lib/vz
        content iso,vztmpl,backup

zfspool: local-zfs
        pool localDataStore/proxmox
        sparse
        content images,rootdir
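
A quick check that both storages are healthy after the edit (pvesm is Proxmox's storage manager CLI):

# both 'local' and 'local-zfs' should show as active
pvesm status
# and the datasets should be where storage.cfg expects them
zfs list -r localDataStore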

Reference material

Samsung SSD 845DC 04 Over-provisioning

QNAP - SSD Over-provisioning White Paper

Innodisk SATADOM-SL Datasheet

https://medium.com/@reefland/over-provisioning-ssd-for-increased-performance-and-write-endurance-142feb015b4e