Recently a customer had to replace a disk in a ReadyData 5200. The new disk came online but was not made available for rebuilding the RAID array, so I had to contact L3 support for further help. The fix they supplied was easier than I expected; below are the commands the engineer used.
Log on to the ReadyData device over SSH as the user root, using the current admin password.
Once you are logged in, you see the following:
Last login: Tue Sep 20 15:01:26 2016 from computername.lan.local.
root@readydata:~#
To see the current status of the volume, use the following command:
root@readydata:~# zpool status
  pool: VOLUME01
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
  scan: scrub canceled on Tue Oct 11 09:09:36 2016
config:
        NAME                       STATE     READ WRITE CKSUM
        VOLUME01                   DEGRADED     0     0     0
          raidz1-0                 DEGRADED     0     0     0
            c0t5000C5004D0B838Bd0  ONLINE       0     0     0
            c0t5000C50079ECD105d0  ONLINE       0     0     0
            c0t5000C50079B4212Bd0  ONLINE       0     0     0
            c0t5000C50079DBDC1Cd0  ONLINE       0     0     0
            1002758886951245814    UNAVAIL      0     0     0  was /dev/dsk/c0t5000C50079DDBB70d0s0
            c0t5000C50079EC938Ad0  ONLINE       0     0     0
errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: resilvered 275M in 0h0m with 0 errors on Thu Apr 30 15:39:10 2015
config:
        NAME                           STATE     READ WRITE CKSUM
        rpool                          ONLINE       0     0     0
          /VOLUME01/._system/VOLUME01  ONLINE       0     0     0
errors: No known data errors
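On a pool with many disks it is easy to misread the long ID of the failed member. As a rough sketch, the filter below pulls that ID out of the status text. The function name find_unavail is my own, and it assumes awk is available on the appliance and that each vdev is printed on its own line, as zpool status normally does.

```shell
#!/bin/sh
# Hypothetical helper: print the name/ID of every pool member that
# `zpool status` reports as UNAVAIL. It reads the status text on stdin,
# so it can be tried on a saved copy of the output first.
find_unavail() {
    # Column 1 is the vdev name, column 2 its state. A "replacing-N"
    # line is a container vdev, not a disk, so it is skipped.
    awk '$2 == "UNAVAIL" && $1 !~ /^replacing-/ { print $1 }'
}

# Possible use on the appliance (assumes awk is installed there):
#   zpool status VOLUME01 | find_unavail
```

Fed the output above, this should print 1002758886951245814, which is the first value the replace command needs.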
The next step is to find the device name of the new drive, which is not yet available for rebuilding. You can do this with the following command:
root@readydata:~# get_disk_info -h
Device: c0t5000C5007A9BBB38d0
Channel: 4
Controller: 0
Model: ST3000NM0033-9ZM
Serial: Z1Z80QGD
Firmware: SN04
Class: SATA
RPM: 7200
Sector size: 512
Sectors: 5860533168
Pool:
PoolType:
PoolState: 0
PoolHostId: 0
ATA Error Count: 0
SMART Data:
  Reallocated Sectors: 0
  Spin Retry Count: 0
  End-to-End Errors: 0
  Command Timeouts: 0
  Current Pending Sector Count: 0
  Uncorrectable Sector Count: 0
  Temperature: 29
  Start/Stop Count: 2
  Power-On Hours: 448
  Power Cycle Count: 2
  Load Cycle Count: 20
Find the disk you need: it is the one whose Pool field is empty, meaning it is not assigned to a pool.
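If the device lists several disks, scanning for the empty Pool field by eye gets tedious. The sketch below is one way to filter for it. Note the assumptions: find_unassigned is a name I made up, and I am guessing that get_disk_info prints one "Key: value" field per line and starts each disk record with a Device: line. Check that against your own output before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: list devices whose "Pool:" field is empty.
# Assumed input layout (not confirmed by the vendor): one "Key: value"
# pair per line, each disk record starting with a "Device:" line.
find_unassigned() {
    awk '
        # Remember the device name of the record being read.
        $1 == "Device:" { dev = $2 }
        # "Pool:" with nothing after it means the disk is in no pool.
        $1 == "Pool:" && NF == 1 && dev != "" { print dev }
    '
}

# Possible use on the appliance (assumes awk is installed there):
#   get_disk_info -h | find_unassigned
```

For the disk shown above this would print c0t5000C5007A9BBB38d0.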
To add the disk to the RAID set for rebuilding, use the following command:
root@Readydata:~# zpool replace VOLUME01 <ID of the unavailable disk> <device name of the new disk>
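With the values from this article filled in, the command would look like the output of the sketch below. The status output further down shows exactly these two entries under replacing-4. The sketch only prints the command so that it can be checked before anything is changed; replace_cmd is my own helper name, not part of ZFS.

```shell
#!/bin/sh
# Values taken from the zpool status and get_disk_info output in this article.
POOL="VOLUME01"
OLD_ID="1002758886951245814"       # the UNAVAIL entry in zpool status
NEW_DEV="c0t5000C5007A9BBB38d0"    # the new disk with the empty Pool field

# Hypothetical helper: build the replace command without running it.
replace_cmd() {
    printf 'zpool replace %s %s %s\n' "$1" "$2" "$3"
}

# Dry run: print the command and compare both IDs with the zpool status
# output before typing it in as root.
replace_cmd "$POOL" "$OLD_ID" "$NEW_DEV"
```

Printing first is a deliberate choice: swapping the two arguments or mistyping the ID is the most likely mistake here, and a dry run makes it visible.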
Once you have done this, check the status of the pool again; you will see that the RAID set has started to rebuild (resilver).
root@Readydata:~# zpool status
  pool: VOLUME01
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Oct 11 10:25:38 2016
        90.4M scanned out of 13.2T at 2.66M/s, 1444h26m to go
        15.0M resilvered, 0.00% done
config:
        NAME                         STATE     READ WRITE CKSUM
        VOLUME01                     DEGRADED     0     0     0
          raidz1-0                   DEGRADED     0     0     0
            c0t5000C5004D0B838Bd0    ONLINE       0     0     0
            c0t5000C50079ECD105d0    ONLINE       0     0     0
            c0t5000C50079B4212Bd0    ONLINE       0     0     0
            c0t5000C50079DBDC1Cd0    ONLINE       0     0     0
            replacing-4              UNAVAIL      0     0     0
              1002758886951245814    UNAVAIL      0     0     0  was /dev/dsk/c0t5000C50079DDBB70d0s0
              c0t5000C5007A9BBB38d0  ONLINE       0     0     0  (resilvering)
            c0t5000C50079EC938Ad0    ONLINE       0     0     0
errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: resilvered 275M in 0h0m with 0 errors on Thu Apr 30 15:39:10 2015
config:
        NAME                           STATE     READ WRITE CKSUM
        rpool                          ONLINE       0     0     0
          /VOLUME01/._system/VOLUME01  ONLINE       0     0     0
errors: No known data errors
You can now see in the ReadyData dashboard (web page) that the rebuild has started. On the command line you can see how long it will take.
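To keep an eye on the rebuild without reading the whole status each time, the progress lines can be condensed into one. This is only a sketch: resilver_progress is a made-up name, and it assumes the scan section is worded as in the output above ("... to go" and "... % done"); other ZFS versions may word it differently.

```shell
#!/bin/sh
# Hypothetical helper: condense the resilver progress from `zpool status`
# text (read on stdin) into a single line.
# Assumes the wording shown in this article:
#   "90.4M scanned out of 13.2T at 2.66M/s, 1444h26m to go"
#   "15.0M resilvered, 0.00% done"
resilver_progress() {
    awk '
        # The estimate is the third field from the end: "<eta> to go".
        /to go/  { eta = $(NF-2) }
        # The percentage is the field before the final word "done".
        /% done/ { print $(NF-1), "done,", eta, "to go" }
    '
}

# Possible use on the appliance, polling once a minute:
#   while sleep 60; do zpool status VOLUME01 | resilver_progress; done
```

The estimate tends to be wildly pessimistic in the first minutes, as the 1444 hours above suggest, so it is worth checking again once the resilver has been running for a while.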
This can take a while. If you are not sure about what you are doing, contact the vendor's support.
Good article, and a slight update for the latest version of the ReadyDATA O/S (1.4.5). Swapping an unsigned disk now causes the pool to start the rebuild process automatically, which is good. However, it runs into a ZFS bug: the resilver process runs and completes, but never removes the old disk from the pool, so the pool stays in a DEGRADED state. To resolve this, run zpool status and look for the “replacing-x” entry, which will give you output similar to this:
replacing-3 DEGRADED 0 0 0
17990103347270912345 UNAVAIL 0 0 0 was /dev/dsk/c0t50014EE263247AF12345
c0t50014EE264FDBA12a3 ONLINE 0 0 0
To get ZFS to complete the process (and return the pool to an ONLINE state), issue the following command:
zpool detach <pool name> <ID of the UNAVAIL disk>
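As a rough sketch of how the arguments could be read off the status output, the filter below looks for the stale UNAVAIL member under a replacing-x entry and prints the matching detach command rather than running it. The name detach_cmds is my own, and it assumes each replacing-x entry is followed directly by its two members (old disk and new disk), as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status` text on stdin and print a
# "zpool detach" command for each stale UNAVAIL disk left under a
# replacing-N vdev. Nothing is executed; the output is for review.
detach_cmds() {
    awk '
        # Track which pool the following lines belong to.
        $1 == "pool:" { pool = $2; left = 0 }
        # A replacing-N vdev is followed by its two members.
        $1 ~ /^replacing-/ { left = 2; next }
        left > 0 {
            if ($2 == "UNAVAIL")
                printf "zpool detach %s %s\n", pool, $1
            left--
        }
    '
}

# Possible use on the appliance (assumes awk is installed there):
#   zpool status | detach_cmds
```

With the example output above this would print a detach command for 17990103347270912345, prefixed with whatever pool name the status reports. Only run it once the resilver has actually finished.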