RAID6-2016: Difference between revisions

From OrgaMon Wiki

Revision as of 15:07, 28 December 2017

  • Server "raib2" in the Ubstadt data center
  • RAID 6 built from 8x 2 TB disks; I bought 9 hard disks for it:
    • 2 TB capacity - 3.5" form factor - SATA 6 Gb/s - 7200 rpm
    • 3 different manufacturers, so that the disks are less likely to fail at the same time

Serial numbers

  • WD Purple
    • S/N: WCC4M0SC7C9R
    • S/N: WCC4M0SC7AR1
    • S/N: WCC4M0XEZ7CH
  • SEAGATE BARRACUDA
    • S/N: Z4Z2W81E
    • S/N: Z4Z32SNR
    • S/N: Z4Z2XNWC
  • Toshiba DT01ACA
    • S/N: X5RAD3XGSTZ5
    • S/N: X5RAD2GGSTZ5
    • S/N: Y5GHNDBTSTZ5
  • That is 18 TB in total, costing 684.93 € altogether (as of Feb 2016).
  • I use 8 of the disks in the RAID and keep one disk on the side, just in case
    • in storage: Toshiba DT01ACA S/N: X5RAD3XGSTZ5
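
The capacity and cost figures above can be checked with a little arithmetic. This is only a sketch: the constants are copied from the list, and the function name `raid6_usable_tb` is made up for illustration. It relies on the general RAID 6 property that two disks' worth of space go to parity.

```python
# Figures copied from the purchase list above.
DISKS_BOUGHT = 9
DISKS_IN_RAID = 8
DISK_TB = 2
TOTAL_PRICE_EUR = 684.93


def raid6_usable_tb(disks, disk_tb):
    """Usable capacity of a RAID 6: two disks' worth of space hold parity."""
    if disks < 4:
        raise ValueError("RAID 6 needs at least 4 disks")
    return (disks - 2) * disk_tb


raw_tb = DISKS_BOUGHT * DISK_TB                       # everything that was bought
usable_tb = raid6_usable_tb(DISKS_IN_RAID, DISK_TB)   # what the array offers
price_per_raw_tb = TOTAL_PRICE_EUR / raw_tb

print(f"raw capacity bought:    {raw_tb} TB")
print(f"usable RAID 6 capacity: {usable_tb} TB")
print(f"price per raw TB:       {price_per_raw_tb:.2f} EUR")
```

So of the 18 TB bought, 12 TB end up usable in the array, 4 TB go to parity and 2 TB sit in storage as the cold spare.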

Block A

Block B

HDD - Cage

Setup

Production

Incident of 28.12.2017

[243680.637402] aacraid: Host adapter abort request (0,2,3,0)
[243691.068772] sd 0:2:3:0: [sdi] tag#1 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[243691.068778] sd 0:2:3:0: [sdi] tag#1 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[243691.068786] blk_update_request: I/O error, dev sdi, sector 2064
[243691.068788] md: super_written gets error=-5
[243691.068793] md/raid:md127: Disk failure on sdi1, disabling device.
                md/raid:md127: Operation continuing on 7 devices.
[243801.115324] aacraid: Host adapter abort request timed out
[243801.115334] aacraid: Host adapter abort request (0,2,3,0)
[243801.115384] aacraid: Host adapter reset request. SCSI hang ?
[243921.593220] aacraid: Host adapter reset request timed out
[243921.593230] sd 0:2:3:0: Device offlined - not ready after error recovery
[243921.593233] sd 0:2:3:0: Device offlined - not ready after error recovery
[243921.593248] sd 0:2:3:0: [sdi] tag#8 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[243921.593252] sd 0:2:3:0: [sdi] tag#8 CDB: Read(10) 28 00 04 a0 c4 00 00 02 00 00
[243921.593256] blk_update_request: I/O error, dev sdi, sector 77644800
[243921.593289] sd 0:2:3:0: [sdi] tag#11 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[243921.593292] sd 0:2:3:0: [sdi] tag#11 CDB: Read(10) 28 00 04 a0 c6 00 00 02 00 00
[243921.593294] blk_update_request: I/O error, dev sdi, sector 77645312
[416403.254386] hrtimer: interrupt took 29227 ns
[853039.443372] sd 0:2:3:0: rejecting I/O to offline device
[853039.443402] sd 0:2:3:0: rejecting I/O to offline device
[853039.443411] sd 0:2:3:0: rejecting I/O to offline device
[853039.443418] sd 0:2:3:0: rejecting I/O to offline device
[853039.443426] sd 0:2:3:0: rejecting I/O to offline device
[853039.443433] sd 0:2:3:0: rejecting I/O to offline device
[853039.443440] sd 0:2:3:0: rejecting I/O to offline device
[853039.443448] sd 0:2:3:0: rejecting I/O to offline device
[853039.443455] sd 0:2:3:0: rejecting I/O to offline device
[853039.443633] sd 0:2:3:0: rejecting I/O to offline device
[853039.443646] sd 0:2:3:0: rejecting I/O to offline device
[853039.443653] sd 0:2:3:0: rejecting I/O to offline device
[853039.443660] sd 0:2:3:0: rejecting I/O to offline device
[853039.443667] sd 0:2:3:0: rejecting I/O to offline device
[853039.443674] sd 0:2:3:0: rejecting I/O to offline device
[853039.443681] sd 0:2:3:0: rejecting I/O to offline device
[853039.443687] sd 0:2:3:0: rejecting I/O to offline device
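
Two kinds of messages in this kernel log matter most: the md line that kicks a member out of the array, and the block-layer I/O errors that come with it. The following is a rough sketch of how one might pull both out of saved `dmesg` output. The function name `summarize` and both regular expressions are my own, written against the message wording seen in this log. Other kernel versions may phrase these lines differently.

```python
import re

# "md/raid:md127: Disk failure on sdi1, disabling device."
FAILURE_RE = re.compile(
    r"md/raid:(?P<array>\w+): Disk failure on (?P<member>\w+), disabling device"
)
# "blk_update_request: I/O error, dev sdi, sector 2064"
IO_ERROR_RE = re.compile(r"I/O error, dev (?P<dev>\w+), sector (?P<sector>\d+)")


def summarize(log_text):
    """Collect failed md members and block I/O errors from kernel log text."""
    failed = [(m["array"], m["member"]) for m in FAILURE_RE.finditer(log_text)]
    errors = [(m["dev"], int(m["sector"])) for m in IO_ERROR_RE.finditer(log_text)]
    return {"failed_members": failed, "io_errors": errors}


# A few lines copied from the log of this incident.
SAMPLE = """\
[243691.068786] blk_update_request: I/O error, dev sdi, sector 2064
[243691.068788] md: super_written gets error=-5
[243691.068793] md/raid:md127: Disk failure on sdi1, disabling device.
[243921.593256] blk_update_request: I/O error, dev sdi, sector 77644800
[243921.593294] blk_update_request: I/O error, dev sdi, sector 77645312
"""

result = summarize(SAMPLE)
print(result["failed_members"])
print(result["io_errors"])
```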

  • I wanted to find out the serial ID of the disk, but hwinfo --disk only reported the following for it:

28: IDE 23.0: 10600 Disk

 [Created at block.245]
 Unique ID: ipPt.uEhVIzZ7wdA
 Parent ID: B35A.VPIkJrtnW73
 SysFS ID: /class/block/sdi
 SysFS BusID: 0:2:3:0
 SysFS Device Link: /devices/pci0000:00/0000:00:01.1/0000:02:00.0/host0/target0:2:3/0:2:3:0
 Hardware Class: disk
 Model: "WDC WD20PURX-64P"
 Vendor: "WDC"
 Device: "WD20PURX-64P"
 Revision: "0A80"
 Driver: "aacraid", "sd"
 Driver Modules: "aacraid", "sd_mod"
 Device File: /dev/sdi
 Device Files: /dev/sdi, /dev/disk/by-id/scsi-330000d170092e908, /dev/disk/by-id/scsi-SATA_WDC_WD20PURX-64P_WD-WCC4M0XEZ7CH, /dev/disk/by-id/wwn-0x30000d170092e908, /dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:3:0
 Device Number: block 8:128-8:143
 Drive status: no medium
 Config Status: cfg=new, avail=yes, need=no, active=unknown
 Attached to: #15 (Serial Attached SCSI controller)
  • but it should have shown:

28: IDE 23.0: 10600 Disk

 [Created at block.245]
 Unique ID: ipPt.dZvPpEVVaL9
 Parent ID: B35A.VPIkJrtnW73
 SysFS ID: /class/block/sdi
 SysFS BusID: 0:2:3:0
 SysFS Device Link: /devices/pci0000:00/0000:00:01.1/0000:02:00.0/host0/target0:2:3/0:2:3:0
 Hardware Class: disk
 Model: "WDC WD20PURX-64P"
 Vendor: "WDC"
 Device: "WD20PURX-64P"
 Revision: "0A80"
 Serial ID: "WD-WCC4M0XEZ7CH"
 Driver: "aacraid", "sd"
 Driver Modules: "aacraid", "sd_mod"
 Device File: /dev/sdi
 Device Files: /dev/sdi, /dev/disk/by-id/scsi-330000d170092e908, /dev/disk/by-id/scsi-SATA_WDC_WD20PURX-64P_WD-WCC4M0XEZ7CH, /dev/disk/by-id/wwn-0x30000d170092e908, /dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:3:0
 Device Number: block 8:128-8:143
 Geometry (Logical): CHS 243201/255/63
 Size: 3907029168 sectors a 512 bytes
 Capacity: 1863 GB (2000398934016 bytes)
 Config Status: cfg=new, avail=yes, need=no, active=unknown
 Attached to: #15 (Serial Attached SCSI controller)
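
Even in the output of the offline disk, where the "Serial ID" line is missing, the "Device Files" line still has a /dev/disk/by-id link that ends in the serial number. Here is a small sketch for extracting it. It assumes the link follows the pattern `scsi-SATA_<model>_<serial>` seen in the output above and that the serial itself contains no underscore. The function name `serial_from_device_files` is made up for this example.

```python
import re

# Assumed naming pattern: scsi-SATA_<model>_<serial>, serial without "_".
BY_ID_RE = re.compile(
    r"/dev/disk/by-id/scsi-SATA_(?P<model>.+)_(?P<serial>[^_/,\s]+)"
)


def serial_from_device_files(device_files):
    """Return the serial from a hwinfo "Device Files" value, or None."""
    for entry in device_files.split(","):
        match = BY_ID_RE.fullmatch(entry.strip())
        if match:
            return match["serial"]
    return None


# "Device Files" value reported for the failed disk.
FAILED_DISK = (
    "/dev/sdi, /dev/disk/by-id/scsi-330000d170092e908, "
    "/dev/disk/by-id/scsi-SATA_WDC_WD20PURX-64P_WD-WCC4M0XEZ7CH, "
    "/dev/disk/by-id/wwn-0x30000d170092e908, "
    "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:3:0"
)

print(serial_from_device_files(FAILED_DISK))
```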
  • so I tracked down the disk "WD-WCC4M0XEZ7CH", pulled it, and replaced it with:

28: IDE 23.0: 10600 Disk

 [Created at block.245]
 Unique ID: ipPt.IyRYgsTsxUD
 Parent ID: B35A.VPIkJrtnW73
 SysFS ID: /class/block/sdi
 SysFS BusID: 0:2:3:0
 SysFS Device Link: /devices/pci0000:00/0000:00:01.1/0000:02:00.0/host0/target0:2:3/0:2:3:0
 Hardware Class: disk
 Model: "TOSHIBA DT01ACA2"
 Vendor: "TOSHIBA"
 Device: "DT01ACA2"
 Revision: "ABB0"
 Serial ID: "X5RAD3XGS"
 Driver: "aacraid", "sd"
 Driver Modules: "aacraid", "sd_mod"
 Device File: /dev/sdi
 Device Files: /dev/sdi, /dev/disk/by-id/scsi-330000d170092e908, /dev/disk/by-id/scsi-SATA_TOSHIBA_DT01ACA2_X5RAD3XGS, /dev/disk/by-id/wwn-0x30000d170092e908, /dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:3:0
 Device Number: block 8:128-8:143
 Geometry (Logical): CHS 243201/255/63
 Size: 3907029168 sectors a 512 bytes
 Capacity: 1863 GB (2000398934016 bytes)
 Config Status: cfg=new, avail=yes, need=no, active=unknown
 Attached to: #15 (Serial Attached SCSI controller)
  • I check the status of the array:

raib2:~ # mdadm --detail /dev/md127
/dev/md127:
           Version : 1.2
     Creation Time : Fri Oct 28 11:41:55 2016
        Raid Level : raid6
        Array Size : 11720294400 (11177.34 GiB 12001.58 GB)
     Used Dev Size : 1953382400 (1862.89 GiB 2000.26 GB)
      Raid Devices : 8
     Total Devices : 7
       Persistence : Superblock is persistent
     Intent Bitmap : Internal
       Update Time : Thu Dec 28 14:39:27 2017
             State : clean, degraded
    Active Devices : 7
   Working Devices : 7
    Failed Devices : 0
     Spare Devices : 0
            Layout : left-symmetric
        Chunk Size : 512K
Consistency Policy : bitmap
              Name : raib2:0  (local to host raib2)
              UUID : 500aa0db:5aca5187:5617c3ff:dc97c2c4
            Events : 10316
   Number   Major   Minor   RaidDevice State
      0       8       17        0      active sync   /dev/sdb1
      1       8       33        1      active sync   /dev/sdc1
      2       8       49        2      active sync   /dev/sdd1
      3       8       65        3      active sync   /dev/sde1
      -       0        0        4      removed
      5       8      113        5      active sync   /dev/sdh1
      6       8       97        6      active sync   /dev/sdg1
      7       8       81        7      active sync   /dev/sdf1
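
The device table can also be read by a script, for example to find out which RAID slot is empty. This sketch takes the column layout shown above (Number, Major, Minor, RaidDevice, State) as given. The function name `removed_slots` is hypothetical. It also checks the reported sizes: with 8 RAID devices and two of them used for parity, the array size should be six times the used device size.

```python
def removed_slots(detail_text):
    """Return the RaidDevice numbers whose state is "removed"."""
    slots = []
    for line in detail_text.splitlines():
        fields = line.split()
        # Columns: Number Major Minor RaidDevice State [device]
        if len(fields) >= 5 and fields[3].isdigit() and fields[4] == "removed":
            slots.append(int(fields[3]))
    return slots


# Device table copied from the mdadm --detail output above.
DEVICE_TABLE = """\
   Number   Major   Minor   RaidDevice State
      0       8       17        0      active sync   /dev/sdb1
      3       8       65        3      active sync   /dev/sde1
      -       0        0        4      removed
      5       8      113        5      active sync   /dev/sdh1
"""

print(removed_slots(DEVICE_TABLE))

# Size check with the numbers mdadm reported.
RAID_DEVICES = 8
USED_DEV_SIZE = 1953382400
ARRAY_SIZE = 11720294400
print((RAID_DEVICES - 2) * USED_DEV_SIZE == ARRAY_SIZE)
```

Slot 4 is the one that /dev/sdi1 used to occupy, so this is where the replacement disk will be rebuilt.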
 
  • so the defective device is now definitely "removed"
  • in that case, adding a spare is enough:
 mdadm /dev/md127 --add-spare /dev/sdi1
 
  • the command above starts the rebuild automatically, because a device is "missing"; once the rebuild has finished, the new disk automatically becomes a full "U" device in the array!
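
The "U" refers to the member flags in /proc/mdstat, where each slot appears as "U" (up) or "_" (missing), preceded by a counter such as [8/7]. As a sketch, such a status line could be decoded as follows. The sample line was not captured on raib2. I built it by hand from the mdadm values above, so treat it as hypothetical. The function name `member_status` is made up as well.

```python
import re

# Matches e.g. "[8/7] [UUUU_UUU]": configured devices / active devices, flags.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def member_status(mdstat_line):
    """Decode the "[n/m] [UU_U]" part of a /proc/mdstat status line."""
    match = STATUS_RE.search(mdstat_line)
    if match is None:
        return None
    flags = match.group(3)
    return {
        "raid_devices": int(match.group(1)),
        "active": int(match.group(2)),
        "missing_slots": [i for i, flag in enumerate(flags) if flag == "_"],
    }


# Hypothetical line, constructed to match the degraded array described above.
SAMPLE_LINE = (
    "11720294400 blocks super 1.2 level 6, 512k chunk, "
    "algorithm 2 [8/7] [UUUU_UUU]"
)

status = member_status(SAMPLE_LINE)
print(status)
```

Once the rebuild is complete, the same line should show [8/8] and eight "U" flags, and `missing_slots` would come back empty.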