Memo: Investigating an ESP corruption problem on FreeBSD

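This is an investigation log for the following problem. An EFI System Partition (ESP) on an NVMe SSD works fine at first. At some later, unidentified point it becomes faulty and is no longer recognized as an ESP.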

diskinfo

$ diskinfo -v /dev/nvd0
/dev/nvd0
        4096            # sectorsize
        960197124096    # mediasize in bytes (894G)
        234423126       # mediasize in sectors
        0               # stripesize
        0               # stripeoffset
        SAMSUNG MZQLB960HAJR-00007      # Disk descr.
        S437NY0KCxxxxx  # Disk ident.
        Yes             # TRIM/UNMAP support
        0               # Rotation rate in RPM

$ sudo diskinfo -v /dev/nvme0ns1
/dev/nvme0ns1
        4096            # sectorsize
        960197124096    # mediasize in bytes (894G)
        234423126       # mediasize in sectors
        0               # stripesize
        0               # stripeoffset
        No              # TRIM/UNMAP support
        Unknown         # Rotation rate in RPM

$ sudo diskinfo -v /dev/nvd0p1
/dev/nvd0p1
        4096            # sectorsize
        536870912       # mediasize in bytes (512M)
        131072          # mediasize in sectors
        0               # stripesize
        24576           # stripeoffset
        8               # Cylinders according to firmware.
        255             # Heads according to firmware.
        63              # Sectors according to firmware.
        SAMSUNG MZQLB960HAJR-00007      # Disk descr.
        S437NY0KCxxxxx  # Disk ident.
        Yes             # TRIM/UNMAP support
        0               # Rotation rate in RPM
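The diskinfo figures above are internally consistent, and shell arithmetic can confirm this. The values below are copied by hand from the output. The sketch assumes that stripeoffset is the partition's byte offset. That reading fits here because stripesize is 0, and 24576 / 4096 = 6 matches the start LBA that gpart reports for partition 1.

```sh
#!/bin/sh
# Cross-check the diskinfo -v figures above (values copied by hand).
sectorsize=4096
disk_sectors=234423126
esp_sectors=131072
esp_stripeoffset=24576

# mediasize in bytes = mediasize in sectors * sectorsize
disk_bytes=$((disk_sectors * sectorsize))
esp_bytes=$((esp_sectors * sectorsize))
# Assumption: with stripesize 0, stripeoffset is the partition's byte offset,
# so dividing by the sector size gives its starting LBA.
esp_start_lba=$((esp_stripeoffset / sectorsize))

echo "disk bytes:    $disk_bytes"
echo "ESP bytes:     $esp_bytes"
echo "ESP start LBA: $esp_start_lba"
```

The ESP comes out at exactly 536870912 bytes (512 MiB), and the whole namespace comes out at 960197124096 bytes, both matching the reported mediasize.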

S.M.A.R.T.

$ sudo smartctl -a /dev/nvme0ns1
smartctl 7.0 2018-12-30 r4883 [FreeBSD 12.1-RELEASE-p3 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       SAMSUNG MZQLB960HAJR-00007
Serial Number:                      S437NY0KCxxxxx
Firmware Version:                   EDA5202Q
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 960,197,124,096 [960 GB]
Unallocated NVM Capacity:           0
Controller ID:                      4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          960,197,124,096 [960 GB]
Namespace 1 Utilization:            201,605,705,728 [201 GB]
Namespace 1 Formatted LBA Size:     4096
Local Time is:                      Sun Oct 18 11:31:24 2020 JST
Firmware Updates (0x17):            3 Slots, Slot 1 R/O, no Reset required
Optional Admin Commands (0x000f):   Security Format Frmw_DL NS_Mngmt
Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     87 Celsius
Critical Comp. Temp. Threshold:     88 Celsius
Namespace 1 Features (0x02):        NA_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    10.60W       -        -    0  0  0  0        0       0

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 -     512       0         0
 1 +    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        43 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    2,313,153 [1.18 TB]
Data Units Written:                 22,720,897 [11.6 TB]
Host Read Commands:                 57,512,115
Host Write Commands:                627,513,504
Controller Busy Time:               306
Power Cycles:                       418
Power On Hours:                     10,471
Unsafe Shutdowns:                   395
Media and Data Integrity Errors:    0
Error Information Log Entries:      6
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               43 Celsius
Temperature Sensor 2:               47 Celsius
Temperature Sensor 3:               54 Celsius
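Among the counters above, Unsafe Shutdowns (395) is close to Power Cycles (418). A small awk filter can pull these two counters out of saved `smartctl -a` output so they can be compared across runs. This is a minimal sketch. The function name smart_counters is hypothetical, and the filter assumes the "Label:   value" layout shown above.

```sh
#!/bin/sh
# smart_counters: read `smartctl -a` text on stdin and print the power-cycle
# and unsafe-shutdown counters plus the share of unsafe shutdowns.
# Assumes the "Label:   value" layout of smartctl 7.0 NVMe output.
smart_counters() {
    awk -F: '
        # Strip spaces and thousands separators from the value field
        function num(s) { gsub(/[ ,]/, "", s); return s + 0 }
        $1 == "Power Cycles"     { cycles = num($2) }
        $1 == "Unsafe Shutdowns" { unsafe = num($2) }
        END {
            printf "power_cycles=%d unsafe_shutdowns=%d", cycles, unsafe
            if (cycles > 0) printf " unsafe_pct=%.1f", 100 * unsafe / cycles
            printf "\n"
        }'
}
```

For example, `sudo smartctl -a /dev/nvme0ns1 | smart_counters`. With the values above, this should report roughly 94.5% of power cycles as unsafe shutdowns.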

gpart

> gpart show nvd0
=>        6  234423115  nvd0  GPT  (894G)
          6     131072     1  efi  (512M)
     131078   26214400     2  freebsd-zfs  (100G)
   26345478    4194304     3  freebsd-zfs  (16G)
   30539782    4194304     4  freebsd-zfs  (16G)
   34734086    2097152     5  freebsd-zfs  (8.0G)
   36831238  162529280     6  freebsd-zfs  (620G)
  199360518   23592960     7  freebsd-zfs  (90G)
  222953478    7864320     8  freebsd-zfs  (30G)
  230817798    3605323        - free -  (14G)
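On a 4096-byte-sector disk, the GPT structures occupy fewer sectors than on a 512-byte-sector disk. This is why the first partition starts at LBA 6 here rather than LBA 34. The sketch below works out the usable range. It assumes the default table of 128 entries of 128 bytes each. The result agrees with gpart: 6 + 234423115 - 1 = 234423120, and the free area also ends at 230817798 + 3605323 - 1 = 234423120.

```sh
#!/bin/sh
# Usable LBA range of a GPT disk (values from the output above).
# Assumes the default GPT table of 128 entries x 128 bytes.
sectorsize=4096
total_sectors=234423126
entries=128
entry_size=128

# Sectors occupied by one copy of the partition entry array (rounded up)
table_sectors=$(( (entries * entry_size + sectorsize - 1) / sectorsize ))
# LBA 0 = protective MBR, LBA 1 = primary header, then the primary table
first_usable=$(( 2 + table_sectors ))
# The last LBA holds the backup header, preceded by the backup table
last_usable=$(( total_sectors - 1 - 1 - table_sectors ))

echo "table sectors: $table_sectors"
echo "first usable:  $first_usable"
echo "last usable:   $last_usable"
```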

Zero-filling the ESP

$ sudo dd if=/dev/zero of=/dev/nvd0p1 bs=4096
dd: /dev/nvd0p1: end of device
131073+0 records in
131072+0 records out
536870912 bytes transferred in 4.088874 secs (131300437 bytes/sec)

$ sudo dd if=/dev/nvd0p1 of=/usr/home/Decomo/nvd0p1_esp_zero_filled.raw bs=4096
131072+0 records in
131072+0 records out
536870912 bytes transferred in 6.493961 secs (82672331 bytes/sec)

$ cat ~/nvd0p1_esp_zero_filled.raw | tr -d '\0' | read -n 1 || echo "All zeroes."
All zeroes.
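The "end of device" message from the first dd is harmless: dd simply ran off the end of the partition after writing all 131072 blocks. Passing an explicit count avoids it. Below is a minimal sketch. The helper name zero_fill is hypothetical, and the sector size and media size are taken from the diskinfo -v output above.

```sh
#!/bin/sh
# zero_fill TARGET SECTORSIZE MEDIASIZE
# Writes exactly MEDIASIZE bytes of zeros in SECTORSIZE blocks, so dd stops at
# the end of the partition instead of running into "end of device".
zero_fill() {
    target=$1
    ss=$2
    size=$3
    if [ $((size % ss)) -ne 0 ]; then
        echo "zero_fill: $size is not a multiple of $ss" >&2
        return 1
    fi
    dd if=/dev/zero of="$target" bs="$ss" count=$((size / ss))
}
```

Run as root, `zero_fill /dev/nvd0p1 4096 536870912` writes 131072 blocks, the same number shown in "131072+0 records out" above.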

Creating the ESP

$ sudo newfs_msdos -F 32 -S 4096 -c 1 nvd0p1
/dev/nvd0p1: 130812 sectors in 130812 FAT32 clusters (4096 bytes/cluster)
BytesPerSec=4096 SecPerClust=1 ResSectors=4 FATs=2 Media=0xf0 SecPerTrack=63 Heads=255 HiddenSecs=0 HugeSectors=131072 FATsecs=128 RootCluster=2 FSInfo=1 Backup=2

$ mkdir /tmp/esp
$ sudo mount -t msdosfs /dev/nvd0p1 /tmp/esp
$ mkdir -p /tmp/esp/efi/boot
$ cp /boot/boot1.efi /tmp/esp/efi/boot/BOOTx64.efi
$ echo 'BOOTx64.efi' >> /tmp/esp/efi/boot/startup.nsh
$ sync
$ sync
$ sync
$ sudo umount /tmp/esp

$ dd if=/dev/nvd0p1 of=/usr/home/Decomo/nvd0p1_esp_created.raw bs=4096
131072+0 records in
131072+0 records out
536870912 bytes transferred in 6.256422 secs (85811175 bytes/sec)
$ cat ~/nvd0p1_esp_created.raw | tr -d '\0' | read -n 1 || echo "All zeroes."
tr: Illegal byte sequence
All zeroes.
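The `tr | read -n 1` check depends on the locale. "Illegal byte sequence" most likely means tr was decoding the binary image as multibyte text and gave up. Forcing `LC_ALL=C` makes tr treat the input as raw bytes. The sketch below is a locale-independent variant that counts the bytes left after deleting NULs. The function name is_all_zero is hypothetical.

```sh
#!/bin/sh
# is_all_zero FILE: succeed (exit 0) only if FILE contains nothing but NUL bytes.
# LC_ALL=C makes tr work on raw bytes, so binary data cannot trigger
# "Illegal byte sequence"; wc -c counts whatever survives the deletion.
is_all_zero() {
    [ -r "$1" ] || return 2
    left=$(LC_ALL=C tr -d '\000' < "$1" | wc -c | tr -d ' ')
    [ "$left" -eq 0 ]
}
```

Usage: `if is_all_zero ~/nvd0p1_esp_created.raw; then echo "All zeroes."; else echo "Non-zero data present."; fi`.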

SHA-1 of the images

$ sha1 ~/nvd0p1_esp_zero_filled.raw ~/nvd0p1_esp_created.raw
SHA1 (/home/Decomo/nvd0p1_esp_zero_filled.raw) = 5b088492c9f4778f409b7ae61477dec124c99033
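SHA1 (/home/Decomo/nvd0p1_esp_created.raw) = c17a73829890d552305af69bdc1ab2321e1f081d

The two hashes differ, so the created image is not all zeroes. The "All zeroes." printed for it above is therefore a false positive. Most likely, tr aborted with "Illegal byte sequence" before writing anything, which left read with no input.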
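A hash only says whether two images differ. Comparing them sector by sector shows which parts of the partition newfs_msdos and the file copy touched. Once the ESP breaks, the same comparison would show which parts changed. Below is a minimal sketch built on `cmp -l`. The helper name diff_sectors is hypothetical, and the 4096-byte default matches the sector size reported by diskinfo.

```sh
#!/bin/sh
# diff_sectors OLD NEW [SECTORSIZE]: print the index of every sector
# (default 4096 bytes) that differs between two images of equal size.
diff_sectors() {
    ss=${3:-4096}
    # cmp -l prints "offset byte1 byte2" for each differing byte, with a
    # 1-based decimal offset; convert it to a 0-based sector index and
    # collapse runs of the same sector (offsets arrive in ascending order).
    cmp -l "$1" "$2" | awk -v ss="$ss" '{ print int(($1 - 1) / ss) }' | uniq
}
```

For example, `diff_sectors ~/nvd0p1_esp_zero_filled.raw ~/nvd0p1_esp_created.raw` lists the sectors that were written while the file system was created.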

Image after the first reboot

$ sudo dd if=/dev/nvd0p1 of=/usr/home/Decomo/nvd0p1_esp_reboot_20201018.raw bs=4096
131072+0 records in
131072+0 records out
536870912 bytes transferred in 6.461511 secs (83087514 bytes/sec)

$ sha1 ~/nvd0p1_esp_reboot_20201018.raw
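SHA1 (/home/Decomo/nvd0p1_esp_reboot_20201018.raw) = c17a73829890d552305af69bdc1ab2321e1f081d

This hash is identical to the hash of the freshly created image, so the ESP was unchanged after the first reboot.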
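The failure only appears at some later point. One way to catch the moment it happens is to dump the ESP after every reboot and compare its hash against the value recorded for the freshly created image. Below is a minimal sketch. The helper names image_sha1 and check_image are hypothetical. The sketch uses FreeBSD's `sha1 -q` when available and falls back to `sha1sum` otherwise.

```sh
#!/bin/sh
# image_sha1 FILE: print only the SHA-1 hex digest of FILE.
image_sha1() {
    if command -v sha1 >/dev/null 2>&1; then
        sha1 -q "$1"                      # FreeBSD md5(1) family
    else
        sha1sum "$1" | cut -d ' ' -f 1    # GNU coreutils fallback
    fi
}

# check_image IMAGE EXPECTED_SHA1: report whether IMAGE still matches the
# known-good hash; returns 1 if it changed, 2 if IMAGE is unreadable.
check_image() {
    [ -r "$1" ] || return 2
    actual=$(image_sha1 "$1")
    if [ "$actual" = "$2" ]; then
        echo "unchanged: $1"
    else
        echo "CHANGED: $1 ($actual)"
        return 1
    fi
}
```

After each reboot, one could run `sudo dd if=/dev/nvd0p1 of="$HOME/nvd0p1_esp_check.raw" bs=4096`, followed by `check_image "$HOME/nvd0p1_esp_check.raw" c17a73829890d552305af69bdc1ab2321e1f081d`. The output file name is only an example.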