Difference between revisions of "ZFS"
(10 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
+ | [[Category:Keekles Infrastructure]] | ||
+ | |||
Notes on ZFS | Notes on ZFS | ||
Line 9: | Line 11: | ||
This should give about 174.56 TiB of space. | This should give about 174.56 TiB of space. | ||
+ | |||
+ | <pre> | ||
+ | Block Size Histogram | ||
+ | |||
+ | block psize lsize asize | ||
+ | size Count Size Cum. Count Size Cum. Count Size Cum. | ||
+ | 512: 350K 175M 175M 350K 175M 175M 0 0 0 | ||
+ | 1K: 348K 413M 589M 348K 413M 589M 0 0 0 | ||
+ | 2K: 273K 722M 1.28G 273K 722M 1.28G 0 0 0 | ||
+ | 4K: 669K 2.65G 3.93G 221K 1.17G 2.45G 0 0 0 | ||
+ | 8K: 925K 8.50G 12.4G 176K 1.91G 4.36G 1.23M 14.7G 14.7G | ||
+ | 16K: 620M 9.69T 9.70T 621M 9.70T 9.70T 621M 14.6T 14.6T | ||
+ | 32K: 1.39M 62.8G 9.76T 82.2K 3.57G 9.70T 410K 19.0G 14.6T | ||
+ | 64K: 548K 47.3G 9.81T 47.2K 4.06G 9.71T 1.58M 153G 14.7T | ||
+ | 128K: 825K 150G 9.95T 1014K 128G 9.83T 699K 133G 14.9T | ||
+ | 256K: 66.3M 16.6T 26.5T 68.4M 17.1T 26.9T 66.6M 20.3T 35.1T | ||
+ | 512K: 0 0 26.5T 0 0 26.9T 0 0 35.1T | ||
+ | 1M: 0 0 26.5T 0 0 26.9T 0 0 35.1T | ||
+ | 2M: 0 0 26.5T 0 0 26.9T 0 0 35.1T | ||
+ | 4M: 0 0 26.5T 0 0 26.9T 0 0 35.1T | ||
+ | 8M: 0 0 26.5T 0 0 26.9T 0 0 35.1T | ||
+ | 16M: 0 0 26.5T 0 0 26.9T 0 0 35.1T | ||
+ | </pre> | ||
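A single figure can be pulled out of a histogram like this with a small filter. This is only a sketch: it assumes the column layout shown above (Count, Size and Cum. for each of psize, lsize and asize), and the function name <code>hist_cum</code> is made up for illustration.

```shell
# hist_cum SIZE [COLUMN]: print the cumulative size at block-size row SIZE.
# COLUMN is psize (default), lsize or asize. Reads the histogram on stdin.
# Assumes each data row is: "SIZE:" followed by three Count/Size/Cum. triples.
hist_cum() {
    awk -v row="$1:" -v col="${2:-psize}" '
        BEGIN {
            idx["psize"] = 4; idx["lsize"] = 7; idx["asize"] = 10
            if (!(col in idx)) exit 1
        }
        $1 == row { print $(idx[col]) }
    '
}
# Usage (hist.txt is a hypothetical file holding the saved histogram):
#   hist_cum 128K < hist.txt
```

This can help when weighing a small-block threshold against how much data sits at or below a given block size.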
+ | |||
+ | = Things to do = | ||
+ | |||
+ | # Enable the write cache (WCE) | ||
+ | # Set MRIE = 4 | ||
+ | # Low-level format | ||
+ | # Record the address and power-on hours | ||
+ | | ||
+ | One-liner for the first two steps (write cache and MRIE): | ||
+ | for i in `seq 2 25` ; do SG="/dev/sg$i" ; echo $SG ;sdparm --set=WCE --save $SG ; sdparm --get=WCE $SG; sdparm --set=MRIE=4 --save $SG; sdparm --get=MRIE $SG; done | ||
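The same loop can be written as a function with a dry-run switch, so the commands can be reviewed before they touch any drive. This is a sketch only: the sdparm options are the ones from the one-liner above, while the function name <code>set_wce_mrie</code> and the <code>DRY_RUN</code> variable are made up for illustration.

```shell
# set_wce_mrie FIRST LAST: enable WCE and set MRIE=4 on /dev/sgFIRST../dev/sgLAST.
# With DRY_RUN=1 the sdparm commands are printed instead of executed.
set_wce_mrie() {
    i=$1
    last=$2
    while [ "$i" -le "$last" ]; do
        sg="/dev/sg$i"
        for opts in "--set=WCE --save" "--get=WCE" "--set=MRIE=4 --save" "--get=MRIE"; do
            if [ "${DRY_RUN:-0}" = "1" ]; then
                echo "sdparm $opts $sg"
            else
                # $opts is left unquoted on purpose so it splits into separate options
                sdparm $opts "$sg"
            fi
        done
        i=$((i + 1))
    done
}
# Usage: DRY_RUN=1 first to review, then run for real:
#   DRY_RUN=1; set_wce_mrie 2 25
#   DRY_RUN=0; set_wce_mrie 2 25
```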
==Optimization== | ==Optimization== | ||
+ | |||
All disks should be updated | All disks should be updated | ||
Line 37: | Line 73: | ||
zdb -Lbbb PoolName | zdb -Lbbb PoolName | ||
+ | | ||
+ | zpool create -f -o ashift=13 -O normalization=formD \ | ||
+ | -O compression=lz4 -O atime=off -O recordsize=256k -O special_small_blocks=128k BigPool \ | ||
+ | raidz2 /dev/sdk /dev/sdu /dev/sdo /dev/sdq /dev/sds /dev/sdb /dev/sdc /dev/sdd \ | ||
+ | raidz2 /dev/sdl /dev/sdm /dev/sdn /dev/sde /dev/sdp /dev/sdf /dev/sdg /dev/sdi \ | ||
+ | special mirror /dev/sdt /dev/sdw /dev/sdv \ | ||
+ | spare /dev/sdh /dev/sdx /dev/sdr /dev/sdj | ||
+ | | ||
+ | zpool add ZfsMediaPool log /dev/disk5s3 | ||
+ | zpool add ZfsMediaPool cache /dev/disk5s4 | ||
=== | === | ||
https://github.com/openzfs/zfs/discussions/12769 | https://github.com/openzfs/zfs/discussions/12769 | ||
+ | |||
+ | = Disk Notes = | ||
+ | |||
+ | I ran into drives of the same model reporting different sizes (a different "MAX LBA"), even after formatting: | ||
+ | |||
+ | 0:17:0 SEAGATE ST12000NM0027 E004 Disk 10.91 TB 50:00:C5:00:A6:F0:A0:79 sdn sg15 | ||
+ | SN:ZJV2GV4B0000C908373F | ||
+ | 0:18:0 SEAGATE ST12000NM0027 E004 Disk 10.84 TB 50:00:C5:00:A6:F0:CC:89 sdj sg11 | ||
+ | SN:ZJV2GTFT0000C9069US9 | ||
+ | |||
+ | It is strange that the sizes differ, so I ran the info command on both drives: | ||
+ | |||
+ | SeaChest_Basics_x86_64-redhat-linux -d /dev/sg11 -i | ||
+ | SeaChest_Basics_x86_64-redhat-linux -d /dev/sg15 -i | ||
+ | |||
+ | The two outputs differ in the following fields: | ||
+ | Drive Capacity (TB/TiB): 12.00/10.91 | ||
+ | Drive Capacity (TB/TiB): 11.92/10.84 | ||
+ | |||
+ | Protection Type 2 | ||
+ | Protection Type 2 [Enabled] | ||
+ | |||
+ | Informational Exceptions [Mode 4] | ||
+ | Informational Exceptions [Mode 0] | ||
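The two numbers in "Drive Capacity (TB/TiB)" are the same byte count expressed in decimal and binary units. A minimal sketch of the conversion, using the byte counts that sg_readcap reports for these two drives (the function name <code>bytes_to_tb_tib</code> is made up for illustration):

```shell
# bytes_to_tb_tib BYTES: print the capacity as "TB/TiB" with two decimals.
# TB = 10^12 bytes, TiB = 1024^4 bytes.
bytes_to_tb_tib() {
    awk -v b="$1" 'BEGIN { printf "%.2f/%.2f\n", b / 1e12, b / (1024 ^ 4) }'
}
# Usage:
#   bytes_to_tb_tib 12000138625024   # the full-size drive
#   bytes_to_tb_tib 11916386762752   # the smaller drive
```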
+ | |||
+ | sg_readcap -l /dev/sg15 | ||
+ | Read Capacity results: | ||
+ | Protection: prot_en=0, p_type=0, p_i_exponent=0 | ||
+ | Logical block provisioning: lbpme=0, lbprz=0 | ||
+ | Last LBA=2929721343 (0xae9fffff), Number of logical blocks=2929721344 | ||
+ | Logical block length=4096 bytes | ||
+ | Logical blocks per physical block exponent=0 | ||
+ | Lowest aligned LBA=0 | ||
+ | Hence: | ||
+ | Device size: 12000138625024 bytes, 11444224.0 MiB, 12000.14 GB, 12.00 TB | ||
+ | |||
+ | # sg_readcap -l /dev/sg11 | ||
+ | Read Capacity results: | ||
+ | Protection: prot_en=1, p_type=1, p_i_exponent=0 [type 2 protection] | ||
+ | Logical block provisioning: lbpme=0, lbprz=0 | ||
+ | Last LBA=2909274111 (0xad67ffff), Number of logical blocks=2909274112 | ||
+ | Logical block length=4096 bytes | ||
+ | Logical blocks per physical block exponent=0 | ||
+ | Lowest aligned LBA=0 | ||
+ | Hence: | ||
+ | Device size: 11916386762752 bytes, 11364352.0 MiB, 11916.39 GB, 11.92 TB | ||
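To check many drives quickly, the <code>prot_en</code> flag can be read out of the sg_readcap output. This is a sketch that only looks for the <code>prot_en=1</code> string shown above; the function name <code>pi_status</code> is made up for illustration, and any input without that string is reported as "disabled".

```shell
# pi_status: read `sg_readcap -l` output on stdin and report whether
# protection information is enabled (prot_en=1) on the drive.
pi_status() {
    if grep -q 'prot_en=1'; then
        echo enabled
    else
        echo disabled
    fi
}
# Usage:
#   sg_readcap -l /dev/sg11 | pi_status
```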
+ | |||
+ | So the smaller drive has something called Protection Type 2 enabled. I had no idea what this was. Some searching turned up [http://talesinit.blogspot.com/2015/11/formatted-with-type-2-protection-huh.html this website]: | ||
+ | <blockquote> | ||
+ | Not knowing what this was, I then went down a seemingly never ending spiral of [http://www.seagate.com/files/staticfiles/docs/pdf/whitepaper/safeguarding-data-from-corruption-technology-paper-tp621us.pdf T10 Protection Information [PDF<nowiki>]</nowiki>] standards. Its pretty neat, how I understand it is the disk controller formats the platters to 520 byte sectors, instead of the more traditional 512 byte sectors, these 8 extra bytes per sector are there for the controller to make sure that the data written to that sector is the same data that is read from it, sort of like data verification. The disk controller can then presents the system (HBA controller or raid card) with the normal 512 bytes of data per section, and any SCSI compatible controller should be able to read and write to it just fine. | ||
+ | </blockquote> | ||
+ | |||
+ | Essentially this is a holdover from 520-byte sectors, where the 8 extra bytes per sector are used for a checksum/parity. This isn't needed in ZFS. It also wastes 79,872 MiB (78 GiB, about 83.75 GB) of space on the disk! | ||
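The size of the gap between the two drives can be worked out from the block counts and block length that sg_readcap reports. A small sketch (the function name <code>lba_gap_mib</code> is made up for illustration):

```shell
# lba_gap_mib BLOCKS_A BLOCKS_B BLOCK_SIZE: capacity difference in MiB
# between two drives, given their logical block counts and block length.
lba_gap_mib() {
    echo $(( ($1 - $2) * $3 / 1048576 ))
}
# Usage, with the numbers from the two sg_readcap outputs:
#   lba_gap_mib 2929721344 2909274112 4096
```

For these two drives the difference is 20447232 blocks of 4096 bytes, which is 83751862272 bytes, or 79872 MiB (78 GiB).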
+ | |||
+ | Note: I'm not sure the protection information alone accounts for the lost space on this disk. It could be something the OEM did differently, as these were originally 520-byte-sector disks. | ||
+ | |||
+ | I tried | ||
+ | # SeaChest_Basics_x86_64-redhat-linux -d /dev/sg11 --restoreMaxLBA | ||
+ | And it didn't increase the LBA. | ||
+ | |||
+ | So let's look at the supported formats for this drive: | ||
+ | |||
+ | # ./SeaChest_Format_x86_64-redhat-linux -d /dev/sg11 --showSupportedFormats | ||
+ | ========================================================================================== | ||
+ | SeaChest_Format - Seagate drive utilities - NVMe Enabled | ||
+ | Copyright (c) 2014-2021 Seagate Technology LLC and/or its Affiliates, All Rights Reserved | ||
+ | SeaChest_Format Version: 2.3.1-2_2_3 X86_64 | ||
+ | Build Date: Jun 17 2021 | ||
+ | Today: Thu Jun 1 23:15:34 2023 User: root | ||
+ | ========================================================================================== | ||
+ | |||
+ | /dev/sg12 - ST12000NM0027 - ZJV1HW2T0000C8496S1T - SCSI | ||
+ | |||
+ | Supported Logical Block Sizes and Protection Types: | ||
+ | --------------------------------------------------- | ||
+ | * - current device format | ||
+ | PI Key: | ||
+ | Y - protection type supported at specified block size | ||
+ | N - protection type not supported at specified block size | ||
+ | ? - unable to determine support for protection type at specified block size | ||
+ | Relative performance key: | ||
+ | N/A - relative performance not available. | ||
+ | Best | ||
+ | Better | ||
+ | Good | ||
+ | Degraded | ||
+ | -------------------------------------------------------------------------------- | ||
+ | Logical Block Size PI-0 PI-1 PI-2 PI-3 Relative Performance Metadata Size | ||
+ | -------------------------------------------------------------------------------- | ||
+ | 512 Y ? ? N N/A N/A | ||
+ | 520 Y ? ? N N/A N/A | ||
+ | 528 Y ? ? N N/A N/A | ||
+ | * 4096 Y ? ? N N/A N/A | ||
+ | 4112 Y ? ? N N/A N/A | ||
+ | 4160 Y ? ? N N/A N/A | ||
+ | -------------------------------------------------------------------------------- | ||
+ | NOTE: Device is not capable of showing all sizes it supports. Only common | ||
+ | sizes are listed. Please consult the product manual for all supported | ||
+ | combinations. | ||
+ | NOTE: This device supports protection information (PI) (a.k.a. End to End protection). | ||
+ | Type 0 - No protection beyond transport protocol | ||
+ | Type 1 - Logical Block Guard and Logical Block Reference Tag | ||
+ | Type 2 - Logical Block Guard and Logical Block Reference Tag (except first block) | ||
+ | 32byte read/write CDBs allowed | ||
+ | Not all forms of PI are supported on all sector sizes unless otherwise indicated | ||
+ | in the device product manual. | ||
+ | NOTE: This device supports Fast Format. Fast format is not instantaneous and is used for | ||
+ | switching between 5xx and 4xxx sector sizes. A fast format may take a few minutes or longer | ||
+ | but may take longer depending on the size of the drive. Fast format support does not necessarily | ||
+ | mean switching sector sizes AND changing PI at the same time is supported. In most cases, a | ||
+ | switch of PI type will require a full device format. | ||
+ | Fast format mode 1 is typically used to switch from 512 to 4096 block sizes with the current | ||
+ | PI scheme. | ||
+ | |||
+ | Well, that was a bust. But going by the last note in that output, it looks like a change of PI type needs a full format of the disk (again): | ||
+ | |||
+ | # ./SeaChest_Format_x86_64-redhat-linux --protectionType 0 --formatUnit 4096 --confirm this-will-erase-data --poll -d /dev/sg11 | ||
+ | ========================================================================================== | ||
+ | SeaChest_Format - Seagate drive utilities - NVMe Enabled | ||
+ | Copyright (c) 2014-2021 Seagate Technology LLC and/or its Affiliates, All Rights Reserved | ||
+ | SeaChest_Format Version: 2.3.1-2_2_3 X86_64 | ||
+ | Build Date: Jun 17 2021 | ||
+ | Today: Thu Jun 1 23:34:40 2023 User: root | ||
+ | ========================================================================================== | ||
+ | |||
+ | /dev/sg11 - ST12000NM0027 - ZJV2GTFT0000C9069US9 - SCSI | ||
+ | Format Unit | ||
+ | Performing SCSI drive format. | ||
+ | Depending on the format request, this could take minutes to hours or days. | ||
+ | Do not remove power or attempt other access as interrupting it may make | ||
+ | the drive unusable or require performing this command again!! | ||
+ | Progress will be updated every 5 minutes | ||
+ | Percent Complete: 0.00% | ||
+ | |||
+ | After about 40 hours the low-level format finished. I was able to access the disk directly without needing to power cycle it, as I'd had to do in the past. I'm unsure whether that is always the case. | ||
+ | | ||
+ | Here's the output from the info command. <s>Note this reset the MRIE to 0!</s> I suspect the MRIE was never set on this disk: it was being formatted while I was working on the other disks, so I couldn't set it at the time. | ||
+ | |||
+ | # ./SeaChest_Basics_x86_64-redhat-linux -d /dev/sg11 -i | ||
+ | ========================================================================================== | ||
+ | SeaChest_Basics - Seagate drive utilities - NVMe Enabled | ||
+ | Copyright (c) 2014-2021 Seagate Technology LLC and/or its Affiliates, All Rights Reserved | ||
+ | SeaChest_Basics Version: 3.1.0-2_2_3 X86_64 | ||
+ | Build Date: Jun 17 2021 | ||
+ | Today: Sun Jun 4 16:00:45 2023 User: root | ||
+ | ========================================================================================== | ||
+ | |||
+ | /dev/sg11 - ST12000NM0027 - ZJV0FC790000R8168TJ9 - SCSI | ||
+ | Vendor ID: SEAGATE | ||
+ | Model Number: ST12000NM0027 | ||
+ | Serial Number: ZJV0FC79 | ||
+ | PCBA Serial Number: 0000R8168TJ9 | ||
+ | Firmware Revision: E004 | ||
+ | World Wide Name: 5000C500953FDF87 | ||
+ | Copyright: Copyright (c) 2020 Seagate All rights reserved | ||
+ | Drive Capacity (TB/TiB): 12.00/10.91 | ||
+ | Temperature Data: | ||
+ | Current Temperature (C): 40 | ||
+ | Highest Temperature (C): Not Reported | ||
+ | Lowest Temperature (C): Not Reported | ||
+ | Power On Time: 3 years 217 days 23 hours 26 minutes | ||
+ | Power On Hours: 31511.43 | ||
+ | MaxLBA: 2929721343 | ||
+ | Native MaxLBA: Not Reported | ||
+ | Logical Sector Size (B): 4096 | ||
+ | Physical Sector Size (B): 4096 | ||
+ | Sector Alignment: 0 | ||
+ | Rotation Rate (RPM): 7200 | ||
+ | Form Factor: 3.5" | ||
+ | Last DST information: | ||
+ | DST has never been run | ||
+ | Long Drive Self Test Time: 19 hours 8 minutes | ||
+ | Interface speed: | ||
+ | Port 0 (Current Port) | ||
+ | Max Speed (GB/s): 12.0 | ||
+ | Negotiated Speed (Gb/s): 12.0 | ||
+ | Port 1 | ||
+ | Max Speed (GB/s): 12.0 | ||
+ | Negotiated Speed (Gb/s): Not Reported | ||
+ | Annualized Workload Rate (TB/yr): 10.31 | ||
+ | Total Bytes Read (TB): 27.92 | ||
+ | Total Bytes Written (TB): 9.18 | ||
+ | Encryption Support: Not Supported | ||
+ | Cache Size (MiB): Not Reported | ||
+ | Read Look-Ahead: Enabled | ||
+ | Non-Volatile Cache: Enabled | ||
+ | Write Cache: Enabled | ||
+ | SMART Status: Good | ||
+ | ATA Security Information: Not Supported | ||
+ | Firmware Download Support: Full, Segmented, Deferred | ||
+ | Number of Logical Units: 1 | ||
+ | Specifications Supported: | ||
+ | SPC-4 | ||
+ | SAM-5 | ||
+ | SAS-3 | ||
+ | SPL-3 | ||
+ | SPC-4 | ||
+ | SBC-3 | ||
+ | Features Supported: | ||
+ | Protection Type 1 | ||
+ | Protection Type 2 | ||
+ | Application Client Logging | ||
+ | Self Test | ||
+ | Automatic Write Reassignment [Enabled] | ||
+ | Automatic Read Reassignment [Enabled] | ||
+ | EPC [Enabled] | ||
+ | '''Informational Exceptions [Mode 0]''' | ||
+ | Translate Address | ||
+ | Rebuild Assist | ||
+ | Seagate Remanufacture | ||
+ | Seagate In Drive Diagnostics (IDD) | ||
+ | Format Unit | ||
+ | Fast Format | ||
+ | Sanitize | ||
+ | Adapter Information: | ||
+ | Vendor ID: 117Ch | ||
+ | Product ID: 8072h | ||
+ | Revision: 0006h | ||
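The "Annualized Workload Rate" in this output appears consistent with total bytes read plus written, divided by the power-on time in years. This is my own reading of the numbers rather than a documented formula, and the function name <code>annual_workload</code> is made up for illustration:

```shell
# annual_workload READ_TB WRITTEN_TB POWER_ON_HOURS: TB per year,
# assuming a year of 8760 hours (365 days).
annual_workload() {
    awk -v r="$1" -v w="$2" -v h="$3" \
        'BEGIN { printf "%.2f\n", (r + w) * 8760 / h }'
}
# Usage, with the values from the output above:
#   annual_workload 27.92 9.18 31511.43
```

With 27.92 TB read, 9.18 TB written and 31511.43 power-on hours this gives 10.31, matching the reported value.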
+ | |||
+ | Looking at the size and blocks now, it matches the others, so this worked! | ||
+ | |||
+ | # sg_readcap -l /dev/sg11 | ||
+ | Read Capacity results: | ||
+ | Protection: prot_en=0, p_type=0, p_i_exponent=0 | ||
+ | Logical block provisioning: lbpme=0, lbprz=0 | ||
+ | Last LBA=2929721343 (0xae9fffff), Number of logical blocks=2929721344 | ||
+ | Logical block length=4096 bytes | ||
+ | Logical blocks per physical block exponent=0 | ||
+ | Lowest aligned LBA=0 | ||
+ | Hence: | ||
+ | Device size: 12000138625024 bytes, 11444224.0 MiB, 12000.14 GB, 12.00 TB | ||
+ | root@Lab-ASL:~# atdevinfo -i all | ||
+ | |||
+ | However this confirmed the MRIE is reset to 0 and the write cache is still enabled. That's good. | ||
+ | # sdparm /dev/sg11 --get=MRIE | ||
+ | /dev/sg11: SEAGATE ST12000NM0027 E004 | ||
+ | MRIE 0 [cha: y, def: 0, sav: 0] | ||
+ | |||
+ | # sdparm --get=WCE /dev/sg11 | ||
+ | /dev/sg11: SEAGATE ST12000NM0027 E004 | ||
+ | WCE 1 [cha: y, def: 1, sav: 1] | ||
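When checking a whole shelf of drives, only the current value of a field is usually of interest. A sketch that extracts it from sdparm output in the format shown above (field name, current value, then the bracketed cha/def/sav details); the function name <code>sdparm_current</code> is made up for illustration:

```shell
# sdparm_current FIELD: read `sdparm --get=FIELD` output on stdin and
# print the current value, i.e. the second column of the FIELD line.
sdparm_current() {
    awk -v f="$1" '$1 == f { print $2 }'
}
# Usage:
#   sdparm --get=MRIE /dev/sg11 | sdparm_current MRIE
#   sdparm --get=WCE /dev/sg11 | sdparm_current WCE
```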
+ | |||
+ | Below is the info from the ATTO utilities (HBA) for this disk. | ||
+ | |||
+ | ******************************************************************** | ||
+ | Target 3 Unit 0 (Channel 1, Port 0) | ||
+ | ******************************************************************** | ||
+ | Bus:Target:Unit: 0:3:0 | ||
+ | OS Target ID: 25 | ||
+ | Vendor: SEAGATE | ||
+ | Product: ST12000NM0027 | ||
+ | Firmware Revision: E004 | ||
+ | Port Address: 50:00:C5:00:95:3F:DF:85 | ||
+ | Node Address: N/A | ||
+ | OS Device Name: /dev/sdj | ||
+ | Device Type: Disk | ||
+ | Serial Number: ZJV0FC790000R8168TJ9 | ||
+ | Status: Ready | ||
+ | SES Enclosure: Target 20, LUN 0 | ||
+ | SES Slot: 23 | ||
+ | |||
+ | SSD: No | ||
+ | Capacity: 10.91 TB | ||
+ | Sector Size: 4096 B | ||
+ | T10-PI: Disabled | ||
+ | (Types 1 and 2 supported) | ||
+ | |||
+ | ==================================================================== | ||
+ | SAS Protocol Information | ||
+ | ==================================================================== | ||
+ | Initiator Flags: None | ||
+ | Target Flags: SSP | ||
+ | Negotiated Rate: 12 Gb/s | ||
+ | SAS Depth: 1 | ||
+ | Slot Number: 20 | ||
+ | SAS Port ID: 0 | ||
+ | Topology: Expander | ||
+ | Expander PHY ID: 16 | ||
+ | |||
+ | ==================================================================== | ||
+ | Supported Vital Product Data Pages | ||
+ | ==================================================================== | ||
+ | Device Identification: Supported | ||
+ | Extended Inquiry Data: Supported | ||
+ | Power Condition: Supported | ||
+ | Unit Serial Number: Supported | ||
+ | |||
+ | ATA Information: Unsupported | ||
+ | Block Device Characteristics: Supported | ||
+ | Block Limits: Supported | ||
+ | Logical Block Provisioning: Supported | ||
+ | |||
+ | ==================================================================== | ||
+ | Block Device Characteristics Information | ||
+ | ==================================================================== | ||
+ | Medium Rotation Rate: 7200 | ||
+ | Form Factor: 3.5 in. | ||
+ | Background Operation Control: Unsupported | ||
+ | |||
+ | ==================================================================== | ||
+ | Supported Log Pages | ||
+ | ==================================================================== | ||
+ | Informational Exceptions: Supported | ||
+ | Protocol Specific Port: Supported | ||
+ | Self Test Results: Supported | ||
+ | Temperature: Supported | ||
+ | |||
+ | ==================================================================== | ||
+ | Temperature Information | ||
+ | ==================================================================== | ||
+ | Current Temperature: 40 C | ||
+ | Reference Temperature: 60 C | ||
+ | |||
+ | ==================================================================== | ||
+ | Mode Parameters | ||
+ | ==================================================================== | ||
+ | Write Caching: Enabled (Default) | ||
+ | Read Ahead: Enabled (Default) | ||
+ | |||
+ | IT Nexus Loss Time: 53.255 s (Default) | ||
+ | Initiator Response Timeout: 53.255 s (Default) | ||
+ | Reject To Open Limit: Vendor Specific (Default) | ||
+ | Maximum Allowed XFER RDY: Unlimited (Read-Only) | ||
+ | |||
+ | Transport Layer Retries: Disabled (Read-Only) | ||
+ | |||
+ | This was another disk, and I ran the format under "time". Note that the tool only polls once every 5 minutes, so the time could be up to 5 minutes off, but that is small relative to the total formatting time. | ||
+ | |||
+ | $ time sudo /root/SeaChest/Linux/Non-RAID/centos-7_x86_64/SeaChest_Format_x86_64-redhat-linux --protectionType 0 --formatUnit 4096 --confirm this-will-erase-data --poll -d /dev/sg16 | ||
+ | ========================================================================================== | ||
+ | SeaChest_Format - Seagate drive utilities - NVMe Enabled | ||
+ | Copyright (c) 2014-2021 Seagate Technology LLC and/or its Affiliates, All Rights Reserved | ||
+ | SeaChest_Format Version: 2.3.1-2_2_3 X86_64 | ||
+ | Build Date: Jun 17 2021 | ||
+ | Today: Sat Jun 3 13:56:50 2023 User: root | ||
+ | ========================================================================================== | ||
+ | |||
+ | /dev/sg16 - ST12000NM0027 - ZJV2AB070000C906Q6E2 - SCSI | ||
+ | Format Unit | ||
+ | Performing SCSI drive format. | ||
+ | Depending on the format request, this could take minutes to hours or days. | ||
+ | Do not remove power or attempt other access as interrupting it may make | ||
+ | the drive unusable or require performing this command again!! | ||
+ | |||
+ | Progress will be updated every 5 minutes | ||
+ | Percent Complete: 99.99% | ||
+ | Percent Complete: 100.00% | ||
+ | Format Unit was Successful! | ||
+ | |||
+ | |||
+ | real 1945m3.067s | ||
+ | user 0m0.032s | ||
+ | sys 0m0.081s | ||
+ | |||
+ | 32.4166666667 hours | ||
+ | 32 hours 25 min. | ||
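The conversion from the "real" line to hours and minutes can be scripted. A sketch, assuming the <code>NNNmSS.SSSs</code> format that the shell's time printed above (the function name <code>real_to_hm</code> is made up for illustration):

```shell
# real_to_hm REAL: convert a `time` value such as 1945m3.067s
# into whole hours and minutes (seconds are dropped).
real_to_hm() {
    echo "$1" | awk -F'[ms]' '{
        t = $1 * 60 + $2                      # total seconds
        printf "%d h %d min\n", t / 3600, (t % 3600) / 60
    }'
}
# Usage:
#   real_to_hm 1945m3.067s
```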
+ | |||
+ | Averaged across all the disks, the format took about 34 hours. I thought the time might depend on the power-on hours or on errors the format found on the disks. | ||
+ | | ||
+ | This command shows all grown defects (ones that developed after the factory) on the disks. I could find no correlation between power-on hours and grown defects. Note that a low-level format checks each sector and remaps it if it is unreadable. | ||
+ | |||
+ | for i in `seq 2 25` ; do SG="/dev/sg$i" ; echo $SG ; ./SeaChest_SMART_x86_64-redhat-linux -q --showSCSIDefects g -d $SG; done | ||
+ | /dev/sg2 | ||
+ | ===SCSI Defect List=== | ||
+ | List includes grown defects | ||
+ | NOTE: At this time, reported defects are for the entire device, not a single logical unit | ||
+ | ---Physical Sector Format--- | ||
+ | Cylinder Head Sector | ||
+ | No Defects Found | ||
+ | |||
+ | /dev/sg3 | ||
+ | ===SCSI Defect List=== | ||
+ | List includes grown defects | ||
+ | NOTE: At this time, reported defects are for the entire device, not a single logical unit | ||
+ | ---Physical Sector Format--- | ||
+ | Cylinder Head Sector | ||
+ | No Defects Found | ||
+ | |||
+ | /dev/sg4 | ||
+ | ===SCSI Defect List=== | ||
+ | List includes grown defects | ||
+ | NOTE: At this time, reported defects are for the entire device, not a single logical unit | ||
+ | ---Physical Sector Format--- | ||
+ | Cylinder Head Sector | ||
+ | No Defects Found | ||
+ | |||
+ | /dev/sg5 | ||
+ | ===SCSI Defect List=== | ||
+ | List includes grown defects | ||
+ | NOTE: At this time, reported defects are for the entire device, not a single logical unit | ||
+ | ---Physical Sector Format--- | ||
+ | Cylinder Head Sector | ||
+ | No Defects Found | ||
+ | |||
+ | /dev/sg6 | ||
+ | ===SCSI Defect List=== | ||
+ | List includes grown defects | ||
+ | NOTE: At this time, reported defects are for the entire device, not a single logical unit | ||
+ | ---Physical Sector Format--- | ||
+ | Cylinder Head Sector | ||
+ | No Defects Found | ||
+ | |||
+ | /dev/sg7 | ||
+ | ===SCSI Defect List=== | ||
+ | List includes grown defects | ||
+ | NOTE: At this time, reported defects are for the entire device, not a single logical unit | ||
+ | ---Physical Sector Format--- | ||
+ | Cylinder Head Sector | ||
+ | '''402794 11 208''' | ||
+ | |||
+ | /dev/sg8 | ||
+ | ===SCSI Defect List=== | ||
+ | List includes grown defects | ||
+ | NOTE: At this time, reported defects are for the entire device, not a single logical unit | ||
+ | ---Physical Sector Format--- | ||
+ | Cylinder Head Sector | ||
+ | '''224641 8 283''' | ||
+ | |||
+ | /dev/sg9 | ||
+ | ===SCSI Defect List=== | ||
+ | List includes grown defects | ||
+ | NOTE: At this time, reported defects are for the entire device, not a single logical unit | ||
+ | ---Physical Sector Format--- | ||
+ | Cylinder Head Sector | ||
+ | No Defects Found | ||
+ | |||
+ | /dev/sg10 | ||
+ | Reading Defects not supported on this device or unsupported defect list format was given. | ||
+ | |||
+ | /dev/sg11 | ||
+ | ===SCSI Defect List=== | ||
+ | List includes grown defects | ||
+ | NOTE: At this time, reported defects are for the entire device, not a single logical unit | ||
+ | ---Physical Sector Format--- | ||
+ | Cylinder Head Sector | ||
+ | No Defects Found | ||
+ | |||
+ | /dev/sg12 | ||
+ | ===SCSI Defect List=== | ||
+ | List includes grown defects | ||
+ | NOTE: At this time, reported defects are for the entire device, not a single logical unit | ||
+ | ---Physical Sector Format--- | ||
+ | Cylinder Head Sector | ||
+ | No Defects Found | ||
+ | |||
+ | /dev/sg13 | ||
+ | ===SCSI Defect List=== | ||
+ | List includes grown defects | ||
+ | NOTE: At this time, reported defects are for the entire device, not a single logical unit | ||
+ | ---Physical Sector Format--- | ||
+ | Cylinder Head Sector | ||
+ | No Defects Found | ||
+ | |||
+ | /dev/sg14 | ||
+ | ===SCSI Defect List=== | ||
+ | List includes grown defects | ||
+ | NOTE: At this time, reported defects are for the entire device, not a single logical unit | ||
+ | ---Physical Sector Format--- | ||
+ | Cylinder Head Sector | ||
+ | No Defects Found | ||
+ | |||
+ | /dev/sg15 | ||
+ | ===SCSI Defect List=== | ||
+ | List includes grown defects | ||
+ | NOTE: At this time, reported defects are for the entire device, not a single logical unit | ||
+ | ---Physical Sector Format--- | ||
+ | Cylinder Head Sector | ||
+ | No Defects Found | ||
+ | |||
+ | /dev/sg16 | ||
+ | ===SCSI Defect List=== | ||
+ | List includes grown defects | ||
+ | NOTE: At this time, reported defects are for the entire device, not a single logical unit | ||
+ | ---Physical Sector Format--- | ||
+ | Cylinder Head Sector | ||
+ | No Defects Found | ||
+ | |||
+ | /dev/sg17 | ||
+ | ===SCSI Defect List=== | ||
+ | List includes grown defects | ||
+ | NOTE: At this time, reported defects are for the entire device, not a single logical unit | ||
+ | ---Physical Sector Format--- | ||
+ | Cylinder Head Sector | ||
+ | No Defects Found | ||
+ | |||
+ | /dev/sg18 | ||
+ | ===SCSI Defect List=== | ||
+ | List includes grown defects | ||
+ | NOTE: At this time, reported defects are for the entire device, not a single logical unit | ||
+ | ---Physical Sector Format--- | ||
+ | Cylinder Head Sector | ||
+ | No Defects Found | ||
+ | |||
+ | /dev/sg19 | ||
+ | ===SCSI Defect List=== | ||
+ | List includes grown defects | ||
+ | NOTE: At this time, reported defects are for the entire device, not a single logical unit | ||
+ | ---Physical Sector Format--- | ||
+ | Cylinder Head Sector | ||
+ | No Defects Found | ||
+ | |||
+ | /dev/sg20 | ||
+ | ===SCSI Defect List=== | ||
+ | List includes grown defects | ||
+ | NOTE: At this time, reported defects are for the entire device, not a single logical unit | ||
+ | ---Physical Sector Format--- | ||
+ | Cylinder Head Sector | ||
+ | No Defects Found | ||
+ | |||
+ | /dev/sg21 | ||
+ | ===SCSI Defect List=== | ||
+ | List includes grown defects | ||
+ | Generation Code: 1 | ||
+ | NOTE: At this time, reported defects are for the entire device, not a single logical unit | ||
+ | ---Physical Sector Format--- | ||
+ | Cylinder Head Sector | ||
+ | No Defects Found | ||
+ | |||
+ | /dev/sg22 | ||
+ | ===SCSI Defect List=== | ||
+ | List includes grown defects | ||
+ | NOTE: At this time, reported defects are for the entire device, not a single logical unit | ||
+ | ---Physical Sector Format--- | ||
+ | Cylinder Head Sector | ||
+ | No Defects Found | ||
+ | |||
+ | /dev/sg23 | ||
+ | ===SCSI Defect List=== | ||
+ | List includes grown defects | ||
+ | Generation Code: 1 | ||
+ | NOTE: At this time, reported defects are for the entire device, not a single logical unit | ||
+ | ---Physical Sector Format--- | ||
+ | Cylinder Head Sector | ||
+ | No Defects Found | ||
+ | |||
+ | /dev/sg24 | ||
+ | ===SCSI Defect List=== | ||
+ | List includes grown defects | ||
+ | Generation Code: 1 | ||
+ | NOTE: At this time, reported defects are for the entire device, not a single logical unit | ||
+ | ---Physical Sector Format--- | ||
+ | Cylinder Head Sector | ||
+ | No Defects Found | ||
+ | |||
+ | /dev/sg25 | ||
+ | ===SCSI Defect List=== | ||
+ | List includes grown defects | ||
+ | NOTE: At this time, reported defects are for the entire device, not a single logical unit | ||
+ | ---Physical Sector Format--- | ||
+ | Cylinder Head Sector | ||
+ | No Defects Found | ||
+ | |||
+ | As there is only one defect on each of the two affected disks, I have no idea whether that is a pattern. In any event, I will make sure they go in different pools or are used as spares. | ||
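With this much output it is easy to miss a drive. A sketch of a filter that lists only the devices with grown defects and how many each has; it assumes the plain output format of the loop above (a <code>/dev/sgN</code> line, then rows of three numbers for cylinder, head and sector), and the function name <code>defect_summary</code> is made up for illustration:

```shell
# defect_summary: read the loop output on stdin and print
# "DEVICE COUNT" for every device that lists at least one defect.
defect_summary() {
    awk '
        /^\/dev\// { dev = $1; next }     # remember the current device
        /^[ \t]*[0-9]+[ \t]+[0-9]+[ \t]+[0-9]+[ \t]*$/ { count[dev]++ }
        END { for (d in count) print d, count[d] }
    ' | sort
}
# Usage (defects.txt is a hypothetical file holding the saved loop output):
#   defect_summary < defects.txt
```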
+ | |||
+ | == Informational Exceptions == | ||
+ | |||
+ | '''MRIE (Method Of Reporting Informational Exceptions) field''' | ||
+ | |||
+ | This defines how the disk handles errors on the SAS level. | ||
+ | |||
+ | [https://www.seagate.com/files/staticfiles/support/docs/manual/Interface%20manuals/100293068j.pdf Page 417, table 391, of this document] shows the following explanations: | ||
+ | |||
+ | {| class="wikitable" | ||
+ | |- | ||
+ | ! MRIE<br /> | ||
+ | ! style="font-weight:bold;" | Description | ||
+ | |- | ||
+ | | 0 | ||
+ | | No reporting of informational exception condition: The device server shall not report information exception conditions. | ||
+ | |- | ||
+ | | 1 | ||
+ | | Asynchronous event reporting: Obsolete | ||
+ | |- | ||
+ | | 2 | ||
+ | | Generate unit attention: <br />The device server shall report informational exception conditions by establishing a unit attention condition (see SAM-5) for the initiator port associated with every I_T nexus, with the additional sense code set to indicate the cause of the informational exception condition.<br /><br />As defined in SAM-5, the command that has the CHECK CONDITION status with the sense key set to UNIT ATTENTION is not processed before the informational exception condition is reported | ||
+ | |- | ||
+ | | 3 | ||
+ | | Conditionally generate recovered error: The device server shall report informational exception conditions, if the reporting of recovered errors is allowed, by returning a CHECK CONDITION status. If the TEST bit is set to zero, the status may be returned after the informational exception condition occurs on any command for which GOOD status or INTERMEDIATE status would have been returned. If the TEST bit is set to one, the status shall be returned on the next command received on any I_T nexus that is normally capable of returning an informational exception condition when the test bit is set to zero. The sense key shall be set to RECOVERED ERROR and the additional sense code shall indicate the cause of the informational exception condition.<br /><br />The command that returns the CHECK CONDITION for the informational exception shall complete without error before any informational exception condition may be reported. | ||
+ | |- | ||
+ | | 4 | ||
+ | | Unconditionally generate recovered error: The device server shall report informational exception conditions, regardless of whether the reporting of recovered errors is allowed, by returning a CHECK CONDITION status. If the TEST bit is set to zero, the status may be returned after the informational exception condition occurs on any command for which GOOD status or INTERMEDIATE status would have been returned. If the TEST bit is set to one, the status shall be returned on the next command received on any I_T nexus that is normally capable of returning an informational exception condition when the TEST bit is set to zero. The sense key shall be set to RECOVERED ERROR and the additional sense code shall indicate the cause of the informational exception condition.<br /><br />The command that returns the CHECK CONDITION for the informational exception shall complete without error before any informational exception condition may be reported. | ||
+ | |- | ||
+ | | 5 | ||
+ | | Generate no sense: The device server shall report informational exception conditions by returning a CHECK CONDITION status. If the TEST bit is set to zero, the status may be returned after the informational exception condition occurs on any command for which GOOD status or INTERMEDIATE status would have been returned. If the TEST bit is set to one, the status shall be returned on the next command received on any I_T nexus that is normally capable of returning an informational exception condition when the TEST bit is set to zero. The sense key shall be set to NO SENSE and the additional sense code shall indicate the cause of the informational exception condition.<br /><br />The command that returns the CHECK CONDITION for the informational exception shall complete without error before any informational exception condition may be reported. | ||
+ | |- | ||
+ | | 6 | ||
+ | | Only report informational exception condition on request: The device server shall preserve the informational exception(s) information. To find out about information exception conditions the application client polls the device server by issuing a REQUEST SENSE command. In the REQUEST SENSE parameter data that contains the sense data, the sense key shall be set to NO SENSE and the additional sense code shall indicate the cause of the informational exception condition. | ||
+ | |- | ||
+ | | 7-B | ||
+ | | Reserved | ||
+ | |- | ||
+ | | C-F | ||
+ | | Vendor specific | ||
+ | |- | ||
+ | | | ||
+ | | | ||
+ | |} | ||
+ | |||
+ | I can find no suggested setting of this for Linux or ZFS use. From my reading I will choose to set this to 4. | ||
+ | |||
+ | There are two ways to set this. | ||
+ | |||
+ | === SeaChest_SMART === | ||
+ | |||
+ | This is the command to set it via SeaChest | ||
+ | |||
+ | # ./SeaChest_SMART_x86_64-redhat-linux -d /dev/sg11 --setMRIE 4 | ||
+ | ========================================================================================== | ||
+ | SeaChest_SMART - Seagate drive utilities - NVMe Enabled | ||
+ | Copyright (c) 2014-2021 Seagate Technology LLC and/or its Affiliates, All Rights Reserved | ||
+ | SeaChest_SMART Version: 2.0.1-2_2_3 X86_64 | ||
+ | Build Date: Jun 17 2021 | ||
+ | Today: Thu Jun 1 21:29:34 2023 User: root | ||
+ | ========================================================================================== | ||
+ | |||
+ | /dev/sg11 - ST12000NM0027 - ZJV2GTFT0000C9069US9 - SCSI | ||
+ | Successfully set MRIE mode to 4 | ||
+ | |||
+ | # ./SeaChest_SMART_x86_64-redhat-linux -d /dev/sg11 -i | ||
+ | ========================================================================================== | ||
+ | SeaChest_SMART - Seagate drive utilities - NVMe Enabled | ||
+ | Copyright (c) 2014-2021 Seagate Technology LLC and/or its Affiliates, All Rights Reserved | ||
+ | SeaChest_SMART Version: 2.0.1-2_2_3 X86_64 | ||
+ | Build Date: Jun 17 2021 | ||
+ | Today: Thu Jun 1 21:29:37 2023 User: root | ||
+ | ========================================================================================== | ||
+ | |||
+ | /dev/sg11 - ST12000NM0027 - ZJV2GTFT0000C9069US9 - SCSI | ||
+ | Vendor ID: SEAGATE | ||
+ | Model Number: ST12000NM0027 | ||
+ | Serial Number: ZJV2GTFT | ||
+ | PCBA Serial Number: 0000C9069US9 | ||
+ | Firmware Revision: E004 | ||
+ | World Wide Name: 5000C500A6F0CC8B | ||
+ | Copyright: Copyright (c) 2020 Seagate All rights reserved | ||
+ | Drive Capacity (TB/TiB): 11.92/10.84 | ||
+ | Temperature Data: | ||
+ | Current Temperature (C): 36 | ||
+ | Highest Temperature (C): Not Reported | ||
+ | Lowest Temperature (C): Not Reported | ||
+ | Power On Time: 2 years 100 days 15 hours 26 minutes | ||
+ | Power On Hours: 19935.43 | ||
+ | MaxLBA: 2909274111 | ||
+ | Native MaxLBA: Not Reported | ||
+ | Logical Sector Size (B): 4096 | ||
+ | Physical Sector Size (B): 4096 | ||
+ | Sector Alignment: 0 | ||
+ | Rotation Rate (RPM): 7200 | ||
+ | Form Factor: 3.5" | ||
+ | Last DST information: | ||
+ | DST has never been run | ||
+ | Long Drive Self Test Time: 19 hours 1 minute | ||
+ | Interface speed: | ||
+ | Port 0 (Current Port) | ||
+ | Max Speed (GB/s): 12.0 | ||
+ | Negotiated Speed (Gb/s): 12.0 | ||
+ | Port 1 | ||
+ | Max Speed (GB/s): 12.0 | ||
+ | Negotiated Speed (Gb/s): Not Reported | ||
+ | Annualized Workload Rate (TB/yr): 12.36 | ||
+ | Total Bytes Read (TB): 23.82 | ||
+ | Total Bytes Written (TB): 4.30 | ||
+ | Encryption Support: Not Supported | ||
+ | Cache Size (MiB): Not Reported | ||
+ | Read Look-Ahead: Enabled | ||
+ | Non-Volatile Cache: Enabled | ||
+ | Write Cache: Enabled | ||
+ | SMART Status: Good | ||
+ | ATA Security Information: Not Supported | ||
+ | Firmware Download Support: Full, Segmented, Deferred | ||
+ | Number of Logical Units: 1 | ||
+ | Specifications Supported: | ||
+ | SPC-4 | ||
+ | SAM-5 | ||
+ | SAS-3 | ||
+ | SPL-3 | ||
+ | SPC-4 | ||
+ | SBC-3 | ||
+ | Features Supported: | ||
+ | Protection Type 1 | ||
+ | Protection Type 2 [Enabled] | ||
+ | Application Client Logging | ||
+ | Self Test | ||
+ | Automatic Write Reassignment [Enabled] | ||
+ | Automatic Read Reassignment [Enabled] | ||
+ | EPC [Enabled] | ||
+ | '''Informational Exceptions [Mode 4]''' | ||
+ | Translate Address | ||
+ | Rebuild Assist | ||
+ | Seagate Remanufacture | ||
+ | Seagate In Drive Diagnostics (IDD) | ||
+ | Format Unit | ||
+ | Fast Format | ||
+ | Sanitize | ||
+ | Adapter Information: | ||
+ | Vendor ID: 117Ch | ||
+ | Product ID: 8072h | ||
+ | Revision: 0006h | ||
+ | |||
+ | === sg utils === | ||
+ | |||
+ | # sdparm /dev/sg11 --set=MRIE=4 | ||
+ | /dev/sg11: SEAGATE ST12000NM0027 E004 | ||
+ | |||
+ | # sdparm /dev/sg11 --get=MRIE | ||
+ | /dev/sg11: SEAGATE ST12000NM0027 E004 | ||
+ | MRIE 4 [cha: y, def: 0, sav: 4] |
Latest revision as of 14:11, 8 September 2024
Notes on ZFS
Home setup
On OS X I'm running a set of 12 TB disks in a raidz2 config. My intent is to migrate to a zpool with special devices in it.
The plan is 20 12 TB disks in 2 raidz2 vdevs, with 3.2 TB SSDs in a mirror as the special vdev. I'll use the M.2 SSD on the server for the ZIL and L2ARC.
This should give about 174.56 TiB of space.
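A rough check of that figure, as a sketch only: it assumes the 20 disks are split into two 10-disk raidz2 vdevs (my assumption, it is what reproduces the number), uses the 10.91 TiB per disk that the drive info output reports, and ignores ZFS metadata and padding overhead. The variable names are just for illustration.

```shell
# Rough usable capacity of the planned layout (assumptions in the text above).
disks=20              # total data-bearing disks in the pool
vdevs=2               # number of raidz2 vdevs (assumed 10 disks each)
parity_per_vdev=2     # raidz2 keeps two parity disks per vdev
tib_per_disk=10.91    # per-disk capacity in TiB, from the SeaChest info output

# Disks left for data once parity is subtracted.
data_disks=$(( disks - vdevs * parity_per_vdev ))

# Shell arithmetic is integer-only, so use awk for the decimal multiply.
usable_tib=$(awk -v n="$data_disks" -v s="$tib_per_disk" 'BEGIN { printf "%.2f", n * s }')

echo "data disks: $data_disks, usable: $usable_tib TiB"
```

Real usable space will come out somewhat lower once raidz padding and metadata are accounted for.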
Block Size Histogram

  block   psize                lsize                asize
   size   Count   Size   Cum.  Count   Size   Cum.  Count   Size   Cum.
    512:   350K   175M   175M   350K   175M   175M      0      0      0
     1K:   348K   413M   589M   348K   413M   589M      0      0      0
     2K:   273K   722M  1.28G   273K   722M  1.28G      0      0      0
     4K:   669K  2.65G  3.93G   221K  1.17G  2.45G      0      0      0
     8K:   925K  8.50G  12.4G   176K  1.91G  4.36G  1.23M  14.7G  14.7G
    16K:   620M  9.69T  9.70T   621M  9.70T  9.70T   621M  14.6T  14.6T
    32K:  1.39M  62.8G  9.76T  82.2K  3.57G  9.70T   410K  19.0G  14.6T
    64K:   548K  47.3G  9.81T  47.2K  4.06G  9.71T  1.58M   153G  14.7T
   128K:   825K   150G  9.95T  1014K   128G  9.83T   699K   133G  14.9T
   256K:  66.3M  16.6T  26.5T  68.4M  17.1T  26.9T  66.6M  20.3T  35.1T
   512K:      0      0  26.5T      0      0  26.9T      0      0  35.1T
     1M:      0      0  26.5T      0      0  26.9T      0      0  35.1T
     2M:      0      0  26.5T      0      0  26.9T      0      0  35.1T
     4M:      0      0  26.5T      0      0  26.9T      0      0  35.1T
     8M:      0      0  26.5T      0      0  26.9T      0      0  35.1T
    16M:      0      0  26.5T      0      0  26.9T      0      0  35.1T
Things to do
- Set write cache (WCE)
- Set MRIE = 4
- Low-level format
- Record address and hours
One-liner to do the first two on every disk:
for i in `seq 2 25` ; do SG="/dev/sg$i" ; echo $SG ;sdparm --set=WCE --save $SG ; sdparm --get=WCE $SG; sdparm --set=MRIE=4 --save $SG; sdparm --get=MRIE $SG; done
Optimization
All disks should have their firmware updated:
./SeaChest_Firmware_x86_64-redhat-linux --downloadFW /root/MobulaExosX12SAS-STD-5xxE-E004.LOD -d /dev/sg7
All disks should be 4k sectors. The spinning disks should be long formatted to detect bad blocks.
./SeaChest_Lite_x86_64-redhat-linux --setSectorSize 4096 --confirm this-will-erase-data -d /dev/sg8
Write cache should be enabled:
# sdparm --get=WCE /dev/sg5
    /dev/sg5: SEAGATE ST12000NM0027 E004
WCE 0 [cha: y, def: 1, sav: 0]
# sdparm --set=WCE --save /dev/sg5
    /dev/sg5: SEAGATE ST12000NM0027 E004
# sdparm --get=WCE --save /dev/sg5
    /dev/sg5: SEAGATE ST12000NM0027 E004
WCE 1 [cha: y, def: 1, sav: 1]
ashift = 13 (2^13 = 8192 bytes per I/O)
recordsize 256K
compression lz4
casesensitivity insensitive
special_small_blocks 128K

zdb -Lbbb PoolName

zpool create -f -o ashift=13 -O normalization=formD \
  -O compression=lz4 -O atime=off -O recordsize=256k -O special_small_blocks=128k BigPool \
  raidz2 /dev/sdk /dev/sdu /dev/sdo /dev/sdq /dev/sds /dev/sdb /dev/sdc /dev/sdd \
  raidz2 /dev/sdl /dev/sdm /dev/sdn /dev/sde /dev/sdp /dev/sdf /dev/sdg /dev/sdi \
  special mirror /dev/sdt /dev/sdw /dev/sdv \
  spare /dev/sdh /dev/sdx /dev/sdr /dev/sdj

zpool add ZfsMediaPool log /dev/disk5s3
zpool add ZfsMediaPool cache /dev/disk5s4
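For reference, ashift is the base-2 logarithm of the smallest I/O size ZFS will issue to a vdev. A small sketch of the arithmetic for the values above (the variable names are mine, not ZFS properties):

```shell
# ashift is a power-of-two exponent: allocation size = 2^ashift bytes.
ashift=13
recordsize_kib=256

# 1 << 13 = 8192 bytes per allocation unit.
sector_bytes=$(( 1 << ashift ))

# How many allocation units one full 256K record spans.
sectors_per_record=$(( recordsize_kib * 1024 / sector_bytes ))

echo "ashift=$ashift -> $sector_bytes bytes; $sectors_per_record units per ${recordsize_kib}K record"
```

With these numbers a full record is 32 allocation units, before any raidz parity is added.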
https://github.com/openzfs/zfs/discussions/12769
Disk Notes
I've run into an issue where the disk size ("Max LBA") differs between drives of the same model, even after formatting:
0:17:0 SEAGATE ST12000NM0027 E004 Disk 10.91 TB 50:00:C5:00:A6:F0:A0:79 sdn sg15 SN:ZJV2GV4B0000C908373F
0:18:0 SEAGATE ST12000NM0027 E004 Disk 10.84 TB 50:00:C5:00:A6:F0:CC:89 sdj sg11 SN:ZJV2GTFT0000C9069US9
It is strange that the sizes differ. I ran the info command on both:
SeaChest_Basics_x86_64-redhat-linux -d /dev/sg11 -i
SeaChest_Basics_x86_64-redhat-linux -d /dev/sg15 -i
This shows the following differences:
Drive Capacity (TB/TiB): 12.00/10.91      Drive Capacity (TB/TiB): 11.92/10.84
Protection Type 2                         Protection Type 2 [Enabled]
Informational Exceptions [Mode 4]         Informational Exceptions [Mode 0]
sg_readcap -l /dev/sg15
Read Capacity results:
   Protection: prot_en=0, p_type=0, p_i_exponent=0
   Logical block provisioning: lbpme=0, lbprz=0
   Last LBA=2929721343 (0xae9fffff), Number of logical blocks=2929721344
   Logical block length=4096 bytes
   Logical blocks per physical block exponent=0
   Lowest aligned LBA=0
Hence:
   Device size: 12000138625024 bytes, 11444224.0 MiB, 12000.14 GB, 12.00 TB

# sg_readcap -l /dev/sg11
Read Capacity results:
   Protection: prot_en=1, p_type=1, p_i_exponent=0 [type 2 protection]
   Logical block provisioning: lbpme=0, lbprz=0
   Last LBA=2909274111 (0xad67ffff), Number of logical blocks=2909274112
   Logical block length=4096 bytes
   Logical blocks per physical block exponent=0
   Lowest aligned LBA=0
Hence:
   Device size: 11916386762752 bytes, 11364352.0 MiB, 11916.39 GB, 11.92 TB
Thus the smaller drive has something called Protection Type 2 enabled. I had no idea what this was. Some searching turned up this website:
Not knowing what this was, I then went down a seemingly never-ending spiral of T10 Protection Information [PDF] standards. It's pretty neat: as I understand it, the disk controller formats the platters to 520-byte sectors instead of the more traditional 512-byte sectors. These 8 extra bytes per sector are there for the controller to make sure that the data written to that sector is the same data that is read from it, sort of like data verification. The disk controller can then present the system (HBA controller or RAID card) with the normal 512 bytes of data per sector, and any SCSI-compatible controller should be able to read and write to it just fine.
Essentially this is a holdover from 520-byte sectors, with the 8 extra bytes used for a checksum/parity. This isn't needed with ZFS, which checksums data itself. It also costs 79,872 MiB (78 GiB) of space on this disk!
Note: I'm not sure the missing space is entirely due to this. It could be something the OEM does differently for these disks, as they were originally 520-byte disks.
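The size gap can be checked directly from the two sg_readcap results above. This is just the arithmetic, with the block counts copied from that output; the variable names are mine.

```shell
# Logical block counts reported by sg_readcap for the two drives.
block_bytes=4096
blocks_pi0=2929721344   # sg15, no protection information
blocks_pi2=2909274112   # sg11, type 2 protection enabled

# Device size = number of logical blocks * logical block length.
size_pi0=$(( blocks_pi0 * block_bytes ))
size_pi2=$(( blocks_pi2 * block_bytes ))

# Capacity given up on the protected drive, in bytes, MiB and GiB.
lost_bytes=$(( size_pi0 - size_pi2 ))
lost_mib=$(( lost_bytes / 1024 / 1024 ))
lost_gib=$(( lost_mib / 1024 ))

echo "unprotected: $size_pi0 bytes, protected: $size_pi2 bytes"
echo "difference: $lost_bytes bytes = $lost_mib MiB = $lost_gib GiB"
```

Both computed sizes match the "Device size" lines that sg_readcap prints, and the difference works out to exactly 78 GiB.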
I tried
# SeaChest_Basics_x86_64-redhat-linux -d /dev/sg11 --restoreMaxLBA
And it didn't increase the LBA.
So let's look at the supported formats for this:
# ./SeaChest_Format_x86_64-redhat-linux -d /dev/sg11 --showSupportedFormats
==========================================================================================
 SeaChest_Format - Seagate drive utilities - NVMe Enabled
 Copyright (c) 2014-2021 Seagate Technology LLC and/or its Affiliates, All Rights Reserved
 SeaChest_Format Version: 2.3.1-2_2_3 X86_64
 Build Date: Jun 17 2021
 Today: Thu Jun 1 23:15:34 2023 User: root
==========================================================================================

/dev/sg12 - ST12000NM0027 - ZJV1HW2T0000C8496S1T - SCSI

Supported Logical Block Sizes and Protection Types:
---------------------------------------------------
  * - current device format
PI Key:
  Y - protection type supported at specified block size
  N - protection type not supported at specified block size
  ? - unable to determine support for protection type at specified block size
Relative performance key:
  N/A - relative performance not available.
  Best
  Better
  Good
  Degraded
--------------------------------------------------------------------------------
 Logical Block Size  PI-0  PI-1  PI-2  PI-3  Relative Performance  Metadata Size
--------------------------------------------------------------------------------
                512     Y     ?     ?     N                   N/A            N/A
                520     Y     ?     ?     N                   N/A            N/A
                528     Y     ?     ?     N                   N/A            N/A
*              4096     Y     ?     ?     N                   N/A            N/A
               4112     Y     ?     ?     N                   N/A            N/A
               4160     Y     ?     ?     N                   N/A            N/A
--------------------------------------------------------------------------------
NOTE: Device is not capable of showing all sizes it supports. Only common
      sizes are listed. Please consult the product manual for all supported combinations.
NOTE: This device supports protection information (PI) (a.k.a. End to End protection).
      Type 0 - No protection beyond transport protocol
      Type 1 - Logical Block Guard and Logical Block Reference Tag
      Type 2 - Logical Block Guard and Logical Block Reference Tag (except first block)
               32byte read/write CDBs allowed
      Not all forms of PI are supported on all sector sizes unless otherwise indicated
      in the device product manual.
NOTE: This device supports Fast Format. Fast format is not instantaneous and is used for
      switching between 5xx and 4xxx sector sizes. A fast format may take a few minutes or longer
      but may take longer depending on the size of the drive. Fast format support does not necessarily
      mean switching sector sizes AND changing PI at the same time is supported. In most cases,
      a switch of PI type will require a full device format.
      Fast format mode 1 is typically used to switch from 512 to 4096 block sizes with the current PI scheme.
Well that was a bust. But from reading the last thing, it looks like we'll have to try a format of the disk (again)
# ./SeaChest_Format_x86_64-redhat-linux --protectionType 0 --formatUnit 4096 --confirm this-will-erase-data --poll -d /dev/sg11
==========================================================================================
 SeaChest_Format - Seagate drive utilities - NVMe Enabled
 Copyright (c) 2014-2021 Seagate Technology LLC and/or its Affiliates, All Rights Reserved
 SeaChest_Format Version: 2.3.1-2_2_3 X86_64
 Build Date: Jun 17 2021
 Today: Thu Jun 1 23:34:40 2023 User: root
==========================================================================================

/dev/sg11 - ST12000NM0027 - ZJV2GTFT0000C9069US9 - SCSI
Format Unit
Performing SCSI drive format.
Depending on the format request, this could take minutes to hours or days.
Do not remove power or attempt other access as interrupting it may make
the drive unusable or require performing this command again!!
Progress will be updated every 5 minutes
    Percent Complete: 0.00%
After about 40 hours the low-level format ended. I was able to access the disk directly without needing to power cycle it, as I'd needed to do in the past. I'm unsure whether that will always be the case.
Here's the output from the info command. Note that this reset the MRIE to 0! I suspect this was because the disk was being formatted while I was working with the other disks, so I couldn't set the MRIE on it at the time.
# ./SeaChest_Basics_x86_64-redhat-linux -d /dev/sg11 -i
==========================================================================================
 SeaChest_Basics - Seagate drive utilities - NVMe Enabled
 Copyright (c) 2014-2021 Seagate Technology LLC and/or its Affiliates, All Rights Reserved
 SeaChest_Basics Version: 3.1.0-2_2_3 X86_64
 Build Date: Jun 17 2021
 Today: Sun Jun 4 16:00:45 2023 User: root
==========================================================================================

/dev/sg11 - ST12000NM0027 - ZJV0FC790000R8168TJ9 - SCSI
    Vendor ID: SEAGATE
    Model Number: ST12000NM0027
    Serial Number: ZJV0FC79
    PCBA Serial Number: 0000R8168TJ9
    Firmware Revision: E004
    World Wide Name: 5000C500953FDF87
    Copyright: Copyright (c) 2020 Seagate All rights reserved
    Drive Capacity (TB/TiB): 12.00/10.91
    Temperature Data:
        Current Temperature (C): 40
        Highest Temperature (C): Not Reported
        Lowest Temperature (C): Not Reported
    Power On Time: 3 years 217 days 23 hours 26 minutes
    Power On Hours: 31511.43
    MaxLBA: 2929721343
    Native MaxLBA: Not Reported
    Logical Sector Size (B): 4096
    Physical Sector Size (B): 4096
    Sector Alignment: 0
    Rotation Rate (RPM): 7200
    Form Factor: 3.5"
    Last DST information:
        DST has never been run
    Long Drive Self Test Time: 19 hours 8 minutes
    Interface speed:
        Port 0 (Current Port)
            Max Speed (GB/s): 12.0
            Negotiated Speed (Gb/s): 12.0
        Port 1
            Max Speed (GB/s): 12.0
            Negotiated Speed (Gb/s): Not Reported
    Annualized Workload Rate (TB/yr): 10.31
    Total Bytes Read (TB): 27.92
    Total Bytes Written (TB): 9.18
    Encryption Support: Not Supported
    Cache Size (MiB): Not Reported
    Read Look-Ahead: Enabled
    Non-Volatile Cache: Enabled
    Write Cache: Enabled
    SMART Status: Good
    ATA Security Information: Not Supported
    Firmware Download Support: Full, Segmented, Deferred
    Number of Logical Units: 1
    Specifications Supported:
        SPC-4
        SAM-5
        SAS-3
        SPL-3
        SPC-4
        SBC-3
    Features Supported:
        Protection Type 1
        Protection Type 2
        Application Client Logging
        Self Test
        Automatic Write Reassignment [Enabled]
        Automatic Read Reassignment [Enabled]
        EPC [Enabled]
        Informational Exceptions [Mode 0]
        Translate Address
        Rebuild Assist
        Seagate Remanufacture
        Seagate In Drive Diagnostics (IDD)
        Format Unit
        Fast Format
        Sanitize
    Adapter Information:
        Vendor ID: 117Ch
        Product ID: 8072h
        Revision: 0006h
Looking at the size and blocks now, it matches the others, so this worked!
# sg_readcap -l /dev/sg11
Read Capacity results:
   Protection: prot_en=0, p_type=0, p_i_exponent=0
   Logical block provisioning: lbpme=0, lbprz=0
   Last LBA=2929721343 (0xae9fffff), Number of logical blocks=2929721344
   Logical block length=4096 bytes
   Logical blocks per physical block exponent=0
   Lowest aligned LBA=0
Hence:
   Device size: 12000138625024 bytes, 11444224.0 MiB, 12000.14 GB, 12.00 TB
root@Lab-ASL:~# atdevinfo -i all
However this confirmed the MRIE is reset to 0 and the write cache is still enabled. That's good.
# sdparm /dev/sg11 --get=MRIE
    /dev/sg11: SEAGATE ST12000NM0027 E004
MRIE 0 [cha: y, def: 0, sav: 0]
# sdparm --get=WCE /dev/sg11
    /dev/sg11: SEAGATE ST12000NM0027 E004
WCE 1 [cha: y, def: 1, sav: 1]

Below is the info from the ATTO utilities (HBA) for this disk.

********************************************************************
Target 3 Unit 0 (Channel 1, Port 0)
********************************************************************
Bus:Target:Unit: 0:3:0
OS Target ID: 25
Vendor: SEAGATE
Product: ST12000NM0027
Firmware Revision: E004
Port Address: 50:00:C5:00:95:3F:DF:85
Node Address: N/A
OS Device Name: /dev/sdj
Device Type: Disk
Serial Number: ZJV0FC790000R8168TJ9
Status: Ready
SES Enclosure: Target 20, LUN 0
SES Slot: 23
SSD: No
Capacity: 10.91 TB
Sector Size: 4096 B
T10-PI: Disabled (Types 1 and 2 supported)
====================================================================
SAS Protocol Information
====================================================================
Initiator Flags: None
Target Flags: SSP
Negotiated Rate: 12 Gb/s
SAS Depth: 1
Slot Number: 20
SAS Port ID: 0
Topology: Expander
Expander PHY ID: 16
====================================================================
Supported Vital Product Data Pages
====================================================================
Device Identification: Supported
Extended Inquiry Data: Supported
Power Condition: Supported
Unit Serial Number: Supported
ATA Information: Unsupported
Block Device Characteristics: Supported
Block Limits: Supported
Logical Block Provisioning: Supported
====================================================================
Block Device Characteristics Information
====================================================================
Medium Rotation Rate: 7200
Form Factor: 3.5 in.
Background Operation Control: Unsupported
====================================================================
Supported Log Pages
====================================================================
Informational Exceptions: Supported
Protocol Specific Port: Supported
Self Test Results: Supported
Temperature: Supported
====================================================================
Temperature Information
====================================================================
Current Temperature: 40 C
Reference Temperature: 60 C
====================================================================
Mode Parameters
====================================================================
Write Caching: Enabled (Default)
Read Ahead: Enabled (Default)
IT Nexus Loss Time: 53.255 s (Default)
Initiator Response Timeout: 53.255 s (Default)
Reject To Open Limit: Vendor Specific (Default)
Maximum Allowed XFER RDY: Unlimited (Read-Only)
Transport Layer Retries: Disabled (Read-Only)
This was another disk, and I ran the format under time. Note that the tool only polls once every 5 minutes, so the measured time could be up to 5 minutes off, but that is not a large amount relative to the actual formatting time.
$ time sudo /root/SeaChest/Linux/Non-RAID/centos-7_x86_64/SeaChest_Format_x86_64-redhat-linux --protectionType 0 --formatUnit 4096 --confirm this-will-erase-data --poll -d /dev/sg16
==========================================================================================
 SeaChest_Format - Seagate drive utilities - NVMe Enabled
 Copyright (c) 2014-2021 Seagate Technology LLC and/or its Affiliates, All Rights Reserved
 SeaChest_Format Version: 2.3.1-2_2_3 X86_64
 Build Date: Jun 17 2021
 Today: Sat Jun 3 13:56:50 2023 User: root
==========================================================================================

/dev/sg16 - ST12000NM0027 - ZJV2AB070000C906Q6E2 - SCSI
Format Unit
Performing SCSI drive format.
Depending on the format request, this could take minutes to hours or days.
Do not remove power or attempt other access as interrupting it may make
the drive unusable or require performing this command again!!
Progress will be updated every 5 minutes
    Percent Complete: 99.99%
    Percent Complete: 100.00%
Format Unit was Successful!

real 1945m3.067s
user 0m0.032s
sys 0m0.081s

32.4166666667 hours
32 hours 25 min.
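The hours figure comes straight from the minutes part of the "real" line. A small sketch of that conversion, using the value from the run above (variable names are illustrative, and the leftover seconds are ignored, as in my hand calculation):

```shell
# "real" value as printed by time for the sg16 format run.
real="1945m3.067s"

# Keep only the whole minutes before the "m".
minutes=${real%%m*}

# Split into hours and leftover minutes with integer arithmetic.
hours=$(( minutes / 60 ))
rem_minutes=$(( minutes % 60 ))

echo "$minutes minutes = $hours hours $rem_minutes min"
```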
I averaged all my times and found it was 34 hours across all the disks. I thought this may have been affected by the hours or errors on the disks that the format found.
This command showed all grown defects (those that were not present from the factory) on the disks. I could find no correlation between power-on hours and grown defects. Note that low-level formatting will check each sector and remap it if the sector is unreadable.
for i in `seq 2 25` ; do SG="/dev/sg$i" ; echo $SG ; ./SeaChest_SMART_x86_64-redhat-linux -q --showSCSIDefects g -d $SG; done
/dev/sg2
===SCSI Defect List===
    List includes grown defects
    NOTE: At this time, reported defects are for the entire device, not a single logical unit
---Physical Sector Format---
    Cylinder Head Sector
    No Defects Found
/dev/sg3
===SCSI Defect List===
    List includes grown defects
    NOTE: At this time, reported defects are for the entire device, not a single logical unit
---Physical Sector Format---
    Cylinder Head Sector
    No Defects Found
/dev/sg4
===SCSI Defect List===
    List includes grown defects
    NOTE: At this time, reported defects are for the entire device, not a single logical unit
---Physical Sector Format---
    Cylinder Head Sector
    No Defects Found
/dev/sg5
===SCSI Defect List===
    List includes grown defects
    NOTE: At this time, reported defects are for the entire device, not a single logical unit
---Physical Sector Format---
    Cylinder Head Sector
    No Defects Found
/dev/sg6
===SCSI Defect List===
    List includes grown defects
    NOTE: At this time, reported defects are for the entire device, not a single logical unit
---Physical Sector Format---
    Cylinder Head Sector
    No Defects Found
/dev/sg7
===SCSI Defect List===
    List includes grown defects
    NOTE: At this time, reported defects are for the entire device, not a single logical unit
---Physical Sector Format---
    Cylinder Head Sector
    402794 11 208
/dev/sg8
===SCSI Defect List===
    List includes grown defects
    NOTE: At this time, reported defects are for the entire device, not a single logical unit
---Physical Sector Format---
    Cylinder Head Sector
    224641 8 283
/dev/sg9
===SCSI Defect List===
    List includes grown defects
    NOTE: At this time, reported defects are for the entire device, not a single logical unit
---Physical Sector Format---
    Cylinder Head Sector
    No Defects Found
/dev/sg10
Reading Defects not supported on this device or unsupported defect list format was given.
/dev/sg11
===SCSI Defect List===
    List includes grown defects
    NOTE: At this time, reported defects are for the entire device, not a single logical unit
---Physical Sector Format---
    Cylinder Head Sector
    No Defects Found
/dev/sg12
===SCSI Defect List===
    List includes grown defects
    NOTE: At this time, reported defects are for the entire device, not a single logical unit
---Physical Sector Format---
    Cylinder Head Sector
    No Defects Found
/dev/sg13
===SCSI Defect List===
    List includes grown defects
    NOTE: At this time, reported defects are for the entire device, not a single logical unit
---Physical Sector Format---
    Cylinder Head Sector
    No Defects Found
/dev/sg14
===SCSI Defect List===
    List includes grown defects
    NOTE: At this time, reported defects are for the entire device, not a single logical unit
---Physical Sector Format---
    Cylinder Head Sector
    No Defects Found
/dev/sg15
===SCSI Defect List===
    List includes grown defects
    NOTE: At this time, reported defects are for the entire device, not a single logical unit
---Physical Sector Format---
    Cylinder Head Sector
    No Defects Found
/dev/sg16
===SCSI Defect List===
    List includes grown defects
    NOTE: At this time, reported defects are for the entire device, not a single logical unit
---Physical Sector Format---
    Cylinder Head Sector
    No Defects Found
/dev/sg17
===SCSI Defect List===
    List includes grown defects
    NOTE: At this time, reported defects are for the entire device, not a single logical unit
---Physical Sector Format---
    Cylinder Head Sector
    No Defects Found
/dev/sg18
===SCSI Defect List===
    List includes grown defects
    NOTE: At this time, reported defects are for the entire device, not a single logical unit
---Physical Sector Format---
    Cylinder Head Sector
    No Defects Found
/dev/sg19
===SCSI Defect List===
    List includes grown defects
    NOTE: At this time, reported defects are for the entire device, not a single logical unit
---Physical Sector Format---
    Cylinder Head Sector
    No Defects Found
/dev/sg20
===SCSI Defect List===
    List includes grown defects
    NOTE: At this time, reported defects are for the entire device, not a single logical unit
---Physical Sector Format---
    Cylinder Head Sector
    No Defects Found
/dev/sg21
===SCSI Defect List===
    List includes grown defects
    Generation Code: 1
    NOTE: At this time, reported defects are for the entire device, not a single logical unit
---Physical Sector Format---
    Cylinder Head Sector
    No Defects Found
/dev/sg22
===SCSI Defect List===
    List includes grown defects
    NOTE: At this time, reported defects are for the entire device, not a single logical unit
---Physical Sector Format---
    Cylinder Head Sector
    No Defects Found
/dev/sg23
===SCSI Defect List===
    List includes grown defects
    Generation Code: 1
    NOTE: At this time, reported defects are for the entire device, not a single logical unit
---Physical Sector Format---
    Cylinder Head Sector
    No Defects Found
/dev/sg24
===SCSI Defect List===
    List includes grown defects
    Generation Code: 1
    NOTE: At this time, reported defects are for the entire device, not a single logical unit
---Physical Sector Format---
    Cylinder Head Sector
    No Defects Found
/dev/sg25
===SCSI Defect List===
    List includes grown defects
    NOTE: At this time, reported defects are for the entire device, not a single logical unit
---Physical Sector Format---
    Cylinder Head Sector
    No Defects Found
As there is only one grown defect on each of the two affected disks (sg7 and sg8), I have no idea if that's a pattern. I will ensure they are in different pools or used as spares in any event.
Informational Exceptions
MRIE (Method Of Reporting Informational Exceptions) field
This defines how the disk handles errors on the SAS level.
Page 417, table 391, of this document shows the following explanations:
MRIE | Description |
---|---|
0 | No reporting of informational exception condition: The device server shall not report information exception conditions. |
1 | Asynchronous event reporting: Obsolete |
2 | Generate unit attention: The device server shall report informational exception conditions by establishing a unit attention condition (see SAM-5) for the initiator port associated with every I_T nexus, with the additional sense code set to indicate the cause of the informational exception condition. As defined in SAM-5, the command that has the CHECK CONDITION status with the sense key set to UNIT ATTENTION is not processed before the informational exception condition is reported |
3 | Conditionally generate recovered error: The device server shall report informational exception conditions, if the reporting of recovered errors is allowed, by returning a CHECK CONDITION status. If the TEST bit is set to zero, the status may be returned after the informational exception condition occurs on any command for which GOOD status or INTERMEDIATE status would have been returned. If the TEST bit is set to one, the status shall be returned on the next command received on any I_T nexus that is normally capable of returning an informational exception condition when the test bit is set to zero. The sense key shall be set to RECOVERED ERROR and the additional sense code shall indicate the cause of the informational exception condition. The command that returns the CHECK CONDITION for the informational exception shall complete without error before any informational exception condition may be reported. |
4 | Unconditionally generate recovered error: The device server shall report informational exception conditions, regardless of whether the reporting of recovered errors is allowed, by returning a CHECK CONDITION status. If the TEST bit is set to zero, the status may be returned after the informational exception condition occurs on any command for which GOOD status or INTERMEDIATE status would have been returned. If the TEST bit is set to one, the status shall be returned on the next command received on any I_T nexus that is normally capable of returning an informational exception condition when the TEST bit is set to zero. The sense key shall be set to RECOVERED ERROR and the additional sense code shall indicate the cause of the informational exception condition. The command that returns the CHECK CONDITION for the informational exception shall complete without error before any informational exception condition may be reported. |
5 | Generate no sense: The device server shall report informational exception conditions by returning a CHECK CONDITION status. If the TEST bit is set to zero, the status may be returned after the informational exception condition occurs on any command for which GOOD status or INTERMEDIATE status would have been returned. If the TEST bit is set to one, the status shall be returned on the next command received on any I_T nexus that is normally capable of returning an informational exception condition when the TEST bit is set to zero. The sense key shall be set to NO SENSE and the additional sense code shall indicate the cause of the informational exception condition. The command that returns the CHECK CONDITION for the informational exception shall complete without error before any informational exception condition may be reported. |
6 | Only report informational exception condition on request: The device server shall preserve the informational exception(s) information. To find out about information exception conditions the application client polls the device server by issuing a REQUEST SENSE command. In the REQUEST SENSE parameter data that contains the sense data, the sense key shall be set to NO SENSE and the additional sense code shall indicate the cause of the informational exception condition. |
7-B | Reserved |
C-F | Vendor specific |
I can find no suggested setting of this for Linux or ZFS use. From my reading I will choose to set this to 4.
There are two ways to set this.
SeaChest_SMART
This is the command to set it via SeaChest
# ./SeaChest_SMART_x86_64-redhat-linux -d /dev/sg11 --setMRIE 4
==========================================================================================
 SeaChest_SMART - Seagate drive utilities - NVMe Enabled
 Copyright (c) 2014-2021 Seagate Technology LLC and/or its Affiliates, All Rights Reserved
 SeaChest_SMART Version: 2.0.1-2_2_3 X86_64
 Build Date: Jun 17 2021
 Today: Thu Jun 1 21:29:34 2023 User: root
==========================================================================================

/dev/sg11 - ST12000NM0027 - ZJV2GTFT0000C9069US9 - SCSI
Successfully set MRIE mode to 4

# ./SeaChest_SMART_x86_64-redhat-linux -d /dev/sg11 -i
==========================================================================================
 SeaChest_SMART - Seagate drive utilities - NVMe Enabled
 Copyright (c) 2014-2021 Seagate Technology LLC and/or its Affiliates, All Rights Reserved
 SeaChest_SMART Version: 2.0.1-2_2_3 X86_64
 Build Date: Jun 17 2021
 Today: Thu Jun 1 21:29:37 2023 User: root
==========================================================================================

/dev/sg11 - ST12000NM0027 - ZJV2GTFT0000C9069US9 - SCSI
    Vendor ID: SEAGATE
    Model Number: ST12000NM0027
    Serial Number: ZJV2GTFT
    PCBA Serial Number: 0000C9069US9
    Firmware Revision: E004
    World Wide Name: 5000C500A6F0CC8B
    Copyright: Copyright (c) 2020 Seagate All rights reserved
    Drive Capacity (TB/TiB): 11.92/10.84
    Temperature Data:
        Current Temperature (C): 36
        Highest Temperature (C): Not Reported
        Lowest Temperature (C): Not Reported
    Power On Time: 2 years 100 days 15 hours 26 minutes
    Power On Hours: 19935.43
    MaxLBA: 2909274111
    Native MaxLBA: Not Reported
    Logical Sector Size (B): 4096
    Physical Sector Size (B): 4096
    Sector Alignment: 0
    Rotation Rate (RPM): 7200
    Form Factor: 3.5"
    Last DST information:
        DST has never been run
    Long Drive Self Test Time: 19 hours 1 minute
    Interface speed:
        Port 0 (Current Port)
            Max Speed (GB/s): 12.0
            Negotiated Speed (Gb/s): 12.0
        Port 1
            Max Speed (GB/s): 12.0
            Negotiated Speed (Gb/s): Not Reported
    Annualized Workload Rate (TB/yr): 12.36
    Total Bytes Read (TB): 23.82
    Total Bytes Written (TB): 4.30
    Encryption Support: Not Supported
    Cache Size (MiB): Not Reported
    Read Look-Ahead: Enabled
    Non-Volatile Cache: Enabled
    Write Cache: Enabled
    SMART Status: Good
    ATA Security Information: Not Supported
    Firmware Download Support: Full, Segmented, Deferred
    Number of Logical Units: 1
    Specifications Supported:
        SPC-4
        SAM-5
        SAS-3
        SPL-3
        SPC-4
        SBC-3
    Features Supported:
        Protection Type 1
        Protection Type 2 [Enabled]
        Application Client Logging
        Self Test
        Automatic Write Reassignment [Enabled]
        Automatic Read Reassignment [Enabled]
        EPC [Enabled]
        Informational Exceptions [Mode 4]
        Translate Address
        Rebuild Assist
        Seagate Remanufacture
        Seagate In Drive Diagnostics (IDD)
        Format Unit
        Fast Format
        Sanitize
    Adapter Information:
        Vendor ID: 117Ch
        Product ID: 8072h
        Revision: 0006h
sg utils
# sdparm /dev/sg11 --set=MRIE=4
    /dev/sg11: SEAGATE ST12000NM0027 E004

# sdparm /dev/sg11 --get=MRIE
    /dev/sg11: SEAGATE ST12000NM0027 E004
MRIE 4 [cha: y, def: 0, sav: 4]
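When checking many disks it helps to pull the current and saved values out of that sdparm line instead of reading each one by eye. Here is a minimal sketch: the function name parse_sdparm_field is made up, and it only assumes the "NAME value [cha: ..., def: ..., sav: N]" layout shown in the output above. It is fed a pasted sample here rather than a live disk.

```shell
# Print "<current> <saved>" for one field from sdparm --get output on stdin.
# $1 is the field name, e.g. MRIE or WCE.
parse_sdparm_field() {
  awk -v f="$1" '$1 == f {
    cur = $2                    # second column is the current value
    sav = $NF                   # last column looks like "4]"
    gsub(/[^0-9]/, "", sav)     # keep only the digits of the saved value
    print cur, sav
  }'
}

# Sample output copied from the sdparm run above.
sample='    /dev/sg11: SEAGATE ST12000NM0027 E004
MRIE 4 [cha: y, def: 0, sav: 4]'

result=$(printf '%s\n' "$sample" | parse_sdparm_field MRIE)
current=${result% *}
saved=${result#* }

echo "MRIE current=$current saved=$saved"
```

On a real system the sample would be replaced by piping sdparm --get=MRIE for each /dev/sgN into the function. If current and saved differ, the setting was probably applied without --save.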