Revision as of 14:26, 1 June 2023
Notes on ZFS
Home setup
On macOS I'm running a bunch of 12 TB disks in a raidz2 config. My intent is to migrate to a zpool with special devices in it.
The plan is 20 12 TB disks in 2 raidz2 vdevs, with 3.2 TB SSDs in a mirror. I'll use the M.2 SSD on the server for the ZIL and L2ARC.
This should give about 174.56 TiB of space.
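The 174.56 TiB figure can be reproduced with a rough calculation. This is only a sketch: it assumes the 20 disks are split evenly into two 10-disk raidz2 vdevs, that each disk provides 10.91 TiB, and it ignores padding, metadata and reserved space. The function name `raidz_usable_tib` is made up for illustration.

```shell
# Hypothetical helper: rough usable TiB for a pool of identical raidz vdevs.
# Args: vdev count, disks per vdev, parity disks per vdev, TiB per disk.
# Ignores padding, metadata and slop space (assumption).
raidz_usable_tib() {
    awk -v v="$1" -v n="$2" -v p="$3" -v t="$4" \
        'BEGIN { printf "%.2f\n", v * (n - p) * t }'
}

# 2 vdevs x (10 - 2) data disks x 10.91 TiB
raidz_usable_tib 2 10 2 10.91   # prints 174.56
```

The real number reported by zpool will be somewhat lower once allocation overhead is included.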
Block Size Histogram

   block  psize                lsize                asize
    size  Count   Size   Cum.  Count   Size   Cum.  Count   Size   Cum.
    512:   350K   175M   175M   350K   175M   175M      0      0      0
     1K:   348K   413M   589M   348K   413M   589M      0      0      0
     2K:   273K   722M  1.28G   273K   722M  1.28G      0      0      0
     4K:   669K  2.65G  3.93G   221K  1.17G  2.45G      0      0      0
     8K:   925K  8.50G  12.4G   176K  1.91G  4.36G  1.23M  14.7G  14.7G
    16K:   620M  9.69T  9.70T   621M  9.70T  9.70T   621M  14.6T  14.6T
    32K:  1.39M  62.8G  9.76T  82.2K  3.57G  9.70T   410K  19.0G  14.6T
    64K:   548K  47.3G  9.81T  47.2K  4.06G  9.71T  1.58M   153G  14.7T
   128K:   825K   150G  9.95T  1014K   128G  9.83T   699K   133G  14.9T
   256K:  66.3M  16.6T  26.5T  68.4M  17.1T  26.9T  66.6M  20.3T  35.1T
   512K:      0      0  26.5T      0      0  26.9T      0      0  35.1T
     1M:      0      0  26.5T      0      0  26.9T      0      0  35.1T
     2M:      0      0  26.5T      0      0  26.9T      0      0  35.1T
     4M:      0      0  26.5T      0      0  26.9T      0      0  35.1T
     8M:      0      0  26.5T      0      0  26.9T      0      0  35.1T
    16M:      0      0  26.5T      0      0  26.9T      0      0  35.1T
Optimization
All disks should be updated to the latest firmware:
./SeaChest_Firmware_x86_64-redhat-linux --downloadFW /root/MobulaExosX12SAS-STD-5xxE-E004.LOD -d /dev/sg7
All disks should use 4K sectors. The spinning disks should get a long format to detect bad blocks.
./SeaChest_Lite_x86_64-redhat-linux --setSectorSize 4096 --confirm this-will-erase-data -d /dev/sg8
Write cache should be enabled:
# sdparm --get=WCE /dev/sg5
/dev/sg5: SEAGATE ST12000NM0027 E004
WCE 0 [cha: y, def: 1, sav: 0]

# sdparm --set=WCE --save /dev/sg5
/dev/sg5: SEAGATE ST12000NM0027 E004

# sdparm --get=WCE --save /dev/sg5
/dev/sg5: SEAGATE ST12000NM0027 E004
WCE 1 [cha: y, def: 1, sav: 1]
ashift=13 = 8192 bytes per I/O
recordsize 256K
compression lz4
casesensitivity insensitive
special_small_blocks 128K

zdb -Lbbb PoolName

zpool create -f -o ashift=12 -O casesensitivity=insensitive -O normalization=formD -O compression=lz4 -O atime=off -O recordsize=256k ZfsMediaPool \
raidz2 /var/run/disk/by-path/PCI0@0-SAT0@17-PRT5@5-PMP@0-@0:0 /var/run/disk/by-path/PCI0@0-SAT0@17-PRT4@4-PMP@0-@0:0 \
/var/run/disk/by-path/PCI0@0-RP21@1B,4-PXSX@0-PRT31@1f-PMP@0-@0:0 /var/run/disk/by-path/PCI0@0-SAT0@17-PRT3@3-PMP@0-@0:0 \
/var/run/disk/by-path/PCI0@0-SAT0@17-PRT2@2-PMP@0-@0:0 /var/run/disk/by-path/PCI0@0-SAT0@17-PRT1@1-PMP@0-@0:0 \
/var/run/disk/by-path/PCI0@0-SAT0@17-PRT0@0-PMP@0-@0:0 /var/run/disk/by-path/PCI0@0-RP21@1B,4-PXSX@0-PRT2@2-PMP@0-@0:0 \
/var/run/disk/by-path/PCI0@0-RP21@1B,4-PXSX@0-PRT3@3-PMP@0-@0:0 /var/run/disk/by-path/PCI0@0-RP21@1B,4-PXSX@0-PRT28@1c-PMP@0-@0:0 \
/var/run/disk/by-path/PCI0@0-RP21@1B,4-PXSX@0-PRT4@4-PMP@0-@0:0 /var/run/disk/by-path/PCI0@0-RP21@1B,4-PXSX@0-PRT29@1d-PMP@0-@0:0

zpool add ZfsMediaPool log /dev/disk5s3
zpool add ZfsMediaPool cache /dev/disk5s4
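The ashift value is a power-of-two exponent, so the smallest allocation size is 2^ashift bytes. A minimal sketch of that conversion follows; the helper name `ashift_to_bytes` is made up for illustration. Note that the notes mention ashift 13 while the zpool create command passes ashift=12, which corresponds to the 4096-byte sector size set on the disks.

```shell
# Hypothetical helper: allocation size in bytes for a given ashift (2^ashift).
ashift_to_bytes() {
    echo $((1 << $1))
}

ashift_to_bytes 12   # prints 4096
ashift_to_bytes 13   # prints 8192
```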
https://github.com/openzfs/zfs/discussions/12769
Disk Notes
I've run into an issue where two identical disks report a different size ("MAX LBA"), even after formatting:

0:17:0 SEAGATE ST12000NM0027 E004 Disk 10.91 TB 50:00:C5:00:A6:F0:A0:79 sdn sg15
SN:ZJV2GV4B0000C908373F
0:18:0 SEAGATE ST12000NM0027 E004 Disk 10.84 TB 50:00:C5:00:A6:F0:CC:89 sdj sg11
SN:ZJV2GTFT0000C9069US9

It is strange that the sizes differ. I ran the info command on both:

SeaChest_Basics_x86_64-redhat-linux -d /dev/sg11 -i
SeaChest_Basics_x86_64-redhat-linux -d /dev/sg15 -i
The output differs in the following fields (sg15 first, then sg11):

Drive Capacity (TB/TiB): 12.00/10.91
Drive Capacity (TB/TiB): 11.92/10.84

Protection Type 2
Protection Type 2 [Enabled]

Informational Exceptions [Mode 4]
Informational Exceptions [Mode 0]
# sg_readcap -l /dev/sg15
Read Capacity results:
Protection: prot_en=0, p_type=0, p_i_exponent=0
Logical block provisioning: lbpme=0, lbprz=0
Last LBA=2929721343 (0xae9fffff), Number of logical blocks=2929721344
Logical block length=4096 bytes
Logical blocks per physical block exponent=0
Lowest aligned LBA=0
Hence:
Device size: 12000138625024 bytes, 11444224.0 MiB, 12000.14 GB, 12.00 TB

# sg_readcap -l /dev/sg11
Read Capacity results:
Protection: prot_en=1, p_type=1, p_i_exponent=0 [type 2 protection]
Logical block provisioning: lbpme=0, lbprz=0
Last LBA=2909274111 (0xad67ffff), Number of logical blocks=2909274112
Logical block length=4096 bytes
Logical blocks per physical block exponent=0
Lowest aligned LBA=0
Hence:
Device size: 11916386762752 bytes, 11364352.0 MiB, 11916.39 GB, 11.92 TB
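The sg11 output shows p_type=1 yet labels it "type 2 protection". That is because the p_type field is zero-based and only meaningful when prot_en=1. A small sketch of the mapping, with a made-up helper name `pi_type`:

```shell
# Hypothetical helper: map the prot_en and p_type fields from sg_readcap
# to the protection type number (0 means protection is disabled).
# Args: prot_en, p_type
pi_type() {
    if [ "$1" -eq 0 ]; then
        echo 0
    else
        echo $(($2 + 1))
    fi
}

pi_type 0 0   # sg15: prints 0 (no protection)
pi_type 1 1   # sg11: prints 2 (type 2 protection)
```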
So the smaller drive has something called Protection Type 2 enabled. I had no idea what this was. Some searching turned up this website: http://talesinit.blogspot.com/2015/11/formatted-with-type-2-protection-huh.html
"Not knowing what this was, I then went down a seemingly never ending spiral of T10 Protection Information [PDF] (http://www.seagate.com/files/staticfiles/docs/pdf/whitepaper/safeguarding-data-from-corruption-technology-paper-tp621us.pdf) standards. Its pretty neat, how I understand it is the disk controller formats the platters to 520 byte sectors, instead of the more traditional 512 byte sectors, these 8 extra bytes per sector are there for the controller to make sure that the data written to that sector is the same data that is read from it, sort of like data verification. The disk controller can then presents the system (HBA controller or raid card) with the normal 512 bytes of data per section, and any SCSI compatible controller should be able to read and write to it just fine."
Essentially this is a holdover from 520-byte sectors, where the 8 extra bytes per sector hold a checksum. ZFS doesn't need this because it checksums every block itself. It also wastes 79,872 MiB (78 GiB) of space on the disk!
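The lost space can be checked directly from the two sg_readcap results. This sketch uses the block counts and the 4096-byte logical block length reported above; the variable names are only for illustration.

```shell
# Block counts taken from the sg_readcap output for sg15 and sg11.
blocks_sg15=2929721344
blocks_sg11=2909274112
block_len=4096

lost_blocks=$((blocks_sg15 - blocks_sg11))
lost_bytes=$((lost_blocks * block_len))
lost_mib=$((lost_bytes / 1024 / 1024))
lost_gib=$((lost_mib / 1024))

echo "$lost_blocks blocks, $lost_mib MiB, $lost_gib GiB"
# prints: 20447232 blocks, 79872 MiB, 78 GiB
```

So the difference between the two drives is 79,872 MiB, which is exactly 78 GiB.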