RAID6-2016: Difference between revisions
Revision as of 28 December 2017, 15:07
- Server "raib2" in the Ubstadt data center
- RAID 6 built from 8x 2 TB disks; I bought 9 hard disks for it:
- 2 TB capacity - 3.5" form factor - SATA 6 Gb/s - 7200 rpm
- 3 different manufacturers, to increase the variance in when the disks fail
Serial numbers
- WD Purple
- S/N: WCC4M0SC7C9R
- S/N: WCC4M0SC7AR1
- S/N: WCC4M0XEZ7CH
- SEAGATE BARRACUDA
- S/N: Z4Z2W81E
- S/N: Z4Z32SNR
- S/N: Z4Z2XNWC
- Toshiba DT01ACA
- S/N: X5RAD3XGSTZ5
- S/N: X5RAD2GGSTZ5
- S/N: Y5GHNDBTSTZ5
- That is 18 TB in total, which cost 684.93 € altogether (as of Feb 2016).
- 8 of the disks go into the RAID; one disk is kept aside just in case
- in storage: Toshiba DT01ACA S/N: X5RAD3XGSTZ5
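The figures above can be checked with a few lines of integer arithmetic. This is only a sketch: the variable names are made up for illustration, the price is kept in cents so that plain shell arithmetic suffices, and the per-disk size in KiB is the "Used Dev Size" that mdadm reports further down this page.

```shell
#!/bin/sh
# Capacity and cost arithmetic for the purchase described above.
disks_bought=9          # from the text
disks_in_raid=8         # from the text
disk_tb=2               # nominal capacity per disk in TB
price_cents=68493       # 684.93 EUR, in cents to stay in integer arithmetic
used_dev_kib=1953382400 # "Used Dev Size" from the mdadm output on this page

raw_tb=$((disks_bought * disk_tb))
# RAID 6 spends two disks' worth of space on parity, so (n - 2) disks hold data.
usable_tb=$(( (disks_in_raid - 2) * disk_tb ))
array_kib=$(( (disks_in_raid - 2) * used_dev_kib ))
cents_per_tb=$((price_cents / raw_tb))   # integer division, rounds down

echo "raw capacity bought: ${raw_tb} TB"
echo "usable RAID 6 capacity: ${usable_tb} TB"
echo "expected array size: ${array_kib} KiB"
printf 'price per raw TB: %d.%02d EUR\n' $((cents_per_tb / 100)) $((cents_per_tb % 100))
```

The expected array size comes out as 11720294400 KiB, which matches the "Array Size" shown by mdadm --detail below.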
Block A
Block B
HDD - Cage
Setup and assembly
Production
Incident of 28 December 2017
[243680.637402] aacraid: Host adapter abort request (0,2,3,0)
[243691.068772] sd 0:2:3:0: [sdi] tag#1 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[243691.068778] sd 0:2:3:0: [sdi] tag#1 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[243691.068786] blk_update_request: I/O error, dev sdi, sector 2064
[243691.068788] md: super_written gets error=-5
[243691.068793] md/raid:md127: Disk failure on sdi1, disabling device.
md/raid:md127: Operation continuing on 7 devices.
[243801.115324] aacraid: Host adapter abort request timed out
[243801.115334] aacraid: Host adapter abort request (0,2,3,0)
[243801.115384] aacraid: Host adapter reset request. SCSI hang ?
[243921.593220] aacraid: Host adapter reset request timed out
[243921.593230] sd 0:2:3:0: Device offlined - not ready after error recovery
[243921.593233] sd 0:2:3:0: Device offlined - not ready after error recovery
[243921.593248] sd 0:2:3:0: [sdi] tag#8 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[243921.593252] sd 0:2:3:0: [sdi] tag#8 CDB: Read(10) 28 00 04 a0 c4 00 00 02 00 00
[243921.593256] blk_update_request: I/O error, dev sdi, sector 77644800
[243921.593289] sd 0:2:3:0: [sdi] tag#11 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[243921.593292] sd 0:2:3:0: [sdi] tag#11 CDB: Read(10) 28 00 04 a0 c6 00 00 02 00 00
[243921.593294] blk_update_request: I/O error, dev sdi, sector 77645312
[416403.254386] hrtimer: interrupt took 29227 ns
[853039.443372] sd 0:2:3:0: rejecting I/O to offline device
[853039.443402] sd 0:2:3:0: rejecting I/O to offline device
[853039.443411] sd 0:2:3:0: rejecting I/O to offline device
[853039.443418] sd 0:2:3:0: rejecting I/O to offline device
[853039.443426] sd 0:2:3:0: rejecting I/O to offline device
[853039.443433] sd 0:2:3:0: rejecting I/O to offline device
[853039.443440] sd 0:2:3:0: rejecting I/O to offline device
[853039.443448] sd 0:2:3:0: rejecting I/O to offline device
[853039.443455] sd 0:2:3:0: rejecting I/O to offline device
[853039.443633] sd 0:2:3:0: rejecting I/O to offline device
[853039.443646] sd 0:2:3:0: rejecting I/O to offline device
[853039.443653] sd 0:2:3:0: rejecting I/O to offline device
[853039.443660] sd 0:2:3:0: rejecting I/O to offline device
[853039.443667] sd 0:2:3:0: rejecting I/O to offline device
[853039.443674] sd 0:2:3:0: rejecting I/O to offline device
[853039.443681] sd 0:2:3:0: rejecting I/O to offline device
[853039.443687] sd 0:2:3:0: rejecting I/O to offline device
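The line that matters in a log like this is the md "Disk failure" message, which names both the array and the member that was kicked out. A minimal sketch for pulling that pair out of a kernel log follows; the function name failed_md_members is made up for illustration, and the pattern assumes the exact message wording seen above, which may differ between kernel versions.

```shell
#!/bin/sh
# Read kernel log text on stdin and print "<array> <member>" for every
# md disk-failure message, e.g. "md127 sdi1".
failed_md_members() {
    sed -n 's|.*md/raid:\([^:]*\): Disk failure on \([^,]*\), disabling device\..*|\1 \2|p'
}

# Possible usage on a live system (reading the kernel log may need root):
#   dmesg | failed_md_members
```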
- I wanted to find out the serial ID of the disk, but hwinfo --disk only returned the following for it:
28: IDE 23.0: 10600 Disk
  [Created at block.245]
  Unique ID: ipPt.uEhVIzZ7wdA
  Parent ID: B35A.VPIkJrtnW73
  SysFS ID: /class/block/sdi
  SysFS BusID: 0:2:3:0
  SysFS Device Link: /devices/pci0000:00/0000:00:01.1/0000:02:00.0/host0/target0:2:3/0:2:3:0
  Hardware Class: disk
  Model: "WDC WD20PURX-64P"
  Vendor: "WDC"
  Device: "WD20PURX-64P"
  Revision: "0A80"
  Driver: "aacraid", "sd"
  Driver Modules: "aacraid", "sd_mod"
  Device File: /dev/sdi
  Device Files: /dev/sdi, /dev/disk/by-id/scsi-330000d170092e908, /dev/disk/by-id/scsi-SATA_WDC_WD20PURX-64P_WD-WCC4M0XEZ7CH, /dev/disk/by-id/wwn-0x30000d170092e908, /dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:3:0
  Device Number: block 8:128-8:143
  Drive status: no medium
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #15 (Serial Attached SCSI controller)
- what it should have shown instead:
28: IDE 23.0: 10600 Disk
  [Created at block.245]
  Unique ID: ipPt.dZvPpEVVaL9
  Parent ID: B35A.VPIkJrtnW73
  SysFS ID: /class/block/sdi
  SysFS BusID: 0:2:3:0
  SysFS Device Link: /devices/pci0000:00/0000:00:01.1/0000:02:00.0/host0/target0:2:3/0:2:3:0
  Hardware Class: disk
  Model: "WDC WD20PURX-64P"
  Vendor: "WDC"
  Device: "WD20PURX-64P"
  Revision: "0A80"
  Serial ID: "WD-WCC4M0XEZ7CH"
  Driver: "aacraid", "sd"
  Driver Modules: "aacraid", "sd_mod"
  Device File: /dev/sdi
  Device Files: /dev/sdi, /dev/disk/by-id/scsi-330000d170092e908, /dev/disk/by-id/scsi-SATA_WDC_WD20PURX-64P_WD-WCC4M0XEZ7CH, /dev/disk/by-id/wwn-0x30000d170092e908, /dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:3:0
  Device Number: block 8:128-8:143
  Geometry (Logical): CHS 243201/255/63
  Size: 3907029168 sectors a 512 bytes
  Capacity: 1863 GB (2000398934016 bytes)
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #15 (Serial Attached SCSI controller)
- so I pull the disk "WD-WCC4M0XEZ7CH" and replace it with:
28: IDE 23.0: 10600 Disk
  [Created at block.245]
  Unique ID: ipPt.IyRYgsTsxUD
  Parent ID: B35A.VPIkJrtnW73
  SysFS ID: /class/block/sdi
  SysFS BusID: 0:2:3:0
  SysFS Device Link: /devices/pci0000:00/0000:00:01.1/0000:02:00.0/host0/target0:2:3/0:2:3:0
  Hardware Class: disk
  Model: "TOSHIBA DT01ACA2"
  Vendor: "TOSHIBA"
  Device: "DT01ACA2"
  Revision: "ABB0"
  Serial ID: "X5RAD3XGS"
  Driver: "aacraid", "sd"
  Driver Modules: "aacraid", "sd_mod"
  Device File: /dev/sdi
  Device Files: /dev/sdi, /dev/disk/by-id/scsi-330000d170092e908, /dev/disk/by-id/scsi-SATA_TOSHIBA_DT01ACA2_X5RAD3XGS, /dev/disk/by-id/wwn-0x30000d170092e908, /dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:3:0
  Device Number: block 8:128-8:143
  Geometry (Logical): CHS 243201/255/63
  Size: 3907029168 sectors a 512 bytes
  Capacity: 1863 GB (2000398934016 bytes)
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #15 (Serial Attached SCSI controller)
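Even when hwinfo no longer prints a "Serial ID" line for an offlined disk, the "Device Files" list above still carries the serial inside the /dev/disk/by-id/scsi-SATA_... name. The following sketch recovers it from such a name. The function names are made up for illustration, and it assumes that the serial is the part after the last underscore and contains no underscore itself. Note also that the serial reported this way can be shorter than the one printed on the label (X5RAD3XGS here versus X5RAD3XGSTZ5 in the list of serial numbers).

```shell
#!/bin/sh
# Print the serial embedded in a by-id name such as
# scsi-SATA_WDC_WD20PURX-64P_WD-WCC4M0XEZ7CH (text after the last "_").
serial_from_by_id_name() {
    name=${1##*/}              # drop a leading /dev/disk/by-id/ if present
    printf '%s\n' "${name##*_}"
}

# Print the serials of all scsi-SATA_* links that point at /dev/<dev>.
# Assumes the links still exist for the disk, as in the hwinfo output above.
serials_for_device() {
    dev=$1                     # e.g. sdi
    for link in /dev/disk/by-id/scsi-SATA_*; do
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "/dev/$dev" ]; then
            serial_from_by_id_name "$link"
        fi
    done
}

# Possible usage:
#   serials_for_device sdi
```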
- I check the status of the array:
raib2:~ # mdadm --detail /dev/md127
/dev/md127:
           Version : 1.2
     Creation Time : Fri Oct 28 11:41:55 2016
        Raid Level : raid6
        Array Size : 11720294400 (11177.34 GiB 12001.58 GB)
     Used Dev Size : 1953382400 (1862.89 GiB 2000.26 GB)
      Raid Devices : 8
     Total Devices : 7
       Persistence : Superblock is persistent
     Intent Bitmap : Internal
       Update Time : Thu Dec 28 14:39:27 2017
             State : clean, degraded
    Active Devices : 7
   Working Devices : 7
    Failed Devices : 0
     Spare Devices : 0
            Layout : left-symmetric
        Chunk Size : 512K
Consistency Policy : bitmap
              Name : raib2:0 (local to host raib2)
              UUID : 500aa0db:5aca5187:5617c3ff:dc97c2c4
            Events : 10316
    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       3       8       65        3      active sync   /dev/sde1
       -       0        0        4      removed
       5       8      113        5      active sync   /dev/sdh1
       6       8       97        6      active sync   /dev/sdg1
       7       8       81        7      active sync   /dev/sdf1
- so the defective device is now definitely "removed"
- in that case, adding a spare is all that is needed:
mdadm /dev/md127 --add-spare /dev/sdi1
- the command above starts a rebuild automatically because a device is "missing"; once the rebuild has finished, the new disk is automatically included as a full "U" device
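Whether the rebuild has finished can be read from the member status field in /proc/mdstat, assuming the "U" above refers to that field (for example [UUUU_UUU] while one member is missing). The sketch below counts the missing members of one array. The function name missing_members is made up for illustration, and it assumes the usual layout in which the status field is on the line after the array's header line.

```shell
#!/bin/sh
# Read /proc/mdstat-style text on stdin and print how many members of the
# given array are missing, i.e. the number of "_" in its [UU_U...] field.
missing_members() {
    array=$1
    awk -v a="$array" '
        $1 == a { in_array = 1; next }       # header line, e.g. "md127 : active raid6 ..."
        in_array && match($0, /\[[U_]+\]/) { # status field on a following line
            s = substr($0, RSTART, RLENGTH)
            print gsub(/_/, "", s)           # gsub returns the number of "_" replaced
            exit
        }
    '
}

# Possible usage: poll once a minute until no member is missing.
#   while [ "$(missing_members md127 < /proc/mdstat)" != "0" ]; do sleep 60; done
```

If the array name does not appear in the input, the function prints nothing, so the polling loop in the comment would never end; a real script should handle that case.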