I need the data that is stored on my My Cloud.
It is a 3TB My Cloud. It powers on, but I cannot connect to the NAS.
root@SERVER-XXXX:/data# smartctl -a /dev/sde
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-11-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WCC4N3ACXXXX
LU WWN Device Id: 5 0014ee 261ea070b
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Aug 26 17:07:18 2022 KST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (38580) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 387) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x703d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       13054
  3 Spin_Up_Time            0x0027   179   177   021    Pre-fail  Always       -       6050
  4 Start_Stop_Count        0x0032   001   001   000    Old_age   Always       -       125453
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       16
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   020   020   000    Old_age   Always       -       58743
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       201
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       170
193 Load_Cycle_Count        0x0032   159   159   000    Old_age   Always       -       125540
194 Temperature_Celsius     0x0022   118   094   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   187   187   000    Old_age   Always       -       13
197 Current_Pending_Sector  0x0032   199   199   000    Old_age   Always       -       753
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 1717 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1717 occurred at disk power-on lifetime: 58729 hours (2447 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 10 c4 1b e0  Error: UNC 8 sectors at LBA = 0x001bc410 = 1819664

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 10 c4 1b e0 08   1d+06:23:39.128  READ DMA
  ec 00 00 00 00 00 a0 08   1d+06:23:39.120  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08   1d+06:23:39.106  SET FEATURES [Set transfer mode]

Error 1716 occurred at disk power-on lifetime: 58729 hours (2447 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 10 c4 1b e0  Error: UNC 8 sectors at LBA = 0x001bc410 = 1819664

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 10 c4 1b e0 08   1d+06:23:35.254  READ DMA
  ec 00 00 00 00 00 a0 08   1d+06:23:35.246  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08   1d+06:23:35.232  SET FEATURES [Set transfer mode]

Error 1715 occurred at disk power-on lifetime: 58729 hours (2447 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 10 c4 1b e0  Error: UNC 8 sectors at LBA = 0x001bc410 = 1819664

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 10 c4 1b e0 08   1d+06:23:31.325  READ DMA
  ec 00 00 00 00 00 a0 08   1d+06:23:31.317  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08   1d+06:23:31.303  SET FEATURES [Set transfer mode]

Error 1714 occurred at disk power-on lifetime: 58729 hours (2447 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 10 c4 1b e0  Error: UNC 8 sectors at LBA = 0x001bc410 = 1819664

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 10 c4 1b e0 08   1d+06:23:27.529  READ DMA
  ec 00 00 00 00 00 a0 08   1d+06:23:27.521  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08   1d+06:23:27.506  SET FEATURES [Set transfer mode]

Error 1713 occurred at disk power-on lifetime: 58729 hours (2447 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 10 c4 1b e0  Error: UNC 8 sectors at LBA = 0x001bc410 = 1819664

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 10 c4 1b e0 08   1d+06:23:23.588  READ DMA
  ec 00 00 00 00 00 a0 08   1d+06:23:23.580  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08   1d+06:23:23.566  SET FEATURES [Set transfer mode]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     14131         -
# 2  Short offline       Completed without error       00%         3         -
# 3  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@SERVER-XXXX:/data#
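This is the hard disk's SMART information. The disk has been powered on (Power_On_Hours) for about 58,700 hours (roughly 6.7 years), it has more than 700 pending bad sectors (Current_Pending_Sector), and it has logged uncorrectable read errors (UNC). The NAS cannot be reached because the bad sectors prevent it from reading files it needs at boot, so it never finishes booting.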
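The reading of the SMART table above can be reproduced with a few lines of Python. This is only a minimal sketch: the watch list `WATCHED` and the helper names are illustrative assumptions rather than any official rule, and the sample rows are copied from the report above.

```python
import re

# Attributes whose non-zero raw value usually deserves a closer look.
# This watch list is an illustrative assumption, not an official threshold.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

# One attribute row: ID, name, flag, value, worst, thresh, type, updated,
# when_failed, raw value.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from smartctl attribute rows."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[m.group(2)] = int(m.group(9))
    return attrs


def flag_attributes(attrs):
    """Return the watched attributes whose raw value is above zero."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}


SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       16
  9 Power_On_Hours          0x0032   020   020   000    Old_age   Always       -       58743
197 Current_Pending_Sector  0x0032   199   199   000    Old_age   Always       -       753
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
"""

attrs = parse_smart_attributes(SAMPLE)
print(flag_attributes(attrs))
# Power-on time converted from hours to years.
print(round(attrs["Power_On_Hours"] / 24 / 365.25, 1), "years")
```

Note that the overall health test still reports PASSED in the report above, which is why looking at the individual raw values is worthwhile.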
Time required: 24 hours
Difficulty: ★★★★★
Take the hard disk out of the NAS and copy the data.
Fortunately, the file system mounts normally. About 730GB of the disk is in use. The files are copied with rsync.
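When a hard disk has bad sectors, copying files tends to go poorly. If the copy is extremely slow, or few files copy successfully, it is best to stop immediately and hand the disk to a professional recovery service. A disk like that is in a serious state that software alone (rsync or any other program that normally copies files quickly and easily) cannot handle.
When a copy goes poorly, there are cases where it is safe to keep trying and cases where every further attempt damages the disk a little more. Ordinary users find it hard to tell the two apart. If you keep using a disk that software alone cannot copy, recovery becomes harder even for a professional service.
Fortunately, the disk from this My Cloud is in a state that does not need a professional recovery service. All but a few files were copied. The job took one day.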
- Mount the hard disk.
- Copy with rsync.
- Copy again to an external hard disk.
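The three steps above can be sketched as a small Python helper that only builds the command lines and prints them, without running anything. The exact rsync options used in this job are not shown in the post, so `-a`, `--stats` and `--log-file` are assumptions, and the paths `/data/recovered`, `/mnt/external` and `/data/rsync.log` are hypothetical.

```python
import shlex


def build_recovery_commands(device, mount_point, staging_dir, external_dir, log_file):
    """Return argv lists for: read-only mount, copy off the disk, copy to external."""
    # Step 1: mount read-only so nothing is written to the failing disk.
    mount_cmd = ["mount", "-o", "ro", device, mount_point]
    # Step 2: copy from the failing disk to a healthy staging area.
    rsync_from_disk = [
        "rsync",
        "-a",                           # archive mode: recurse and keep metadata
        "--stats",                      # report transfer statistics at the end
        f"--log-file={log_file}",       # timestamped per-file log
        mount_point.rstrip("/") + "/",  # trailing slash: copy the contents
        staging_dir,
    ]
    # Step 3: copy the recovered files to the external disk for delivery.
    rsync_to_external = [
        "rsync",
        "-a",
        "--stats",
        staging_dir.rstrip("/") + "/",
        external_dir,
    ]
    return [mount_cmd, rsync_from_disk, rsync_to_external]


commands = build_recovery_commands(
    "/dev/sde4",        # data partition of the failing disk
    "/data/wd3TB",      # mount point
    "/data/recovered",  # hypothetical staging directory
    "/mnt/external",    # hypothetical external disk mount point
    "/data/rsync.log",  # hypothetical log file
)
for argv in commands:
    print(shlex.join(argv))
```

Copying to a staging area first means the failing disk is read only once; the second copy runs entirely between healthy disks.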
root@SERVER-XXXX:~# fdisk -l /dev/sde
Disk /dev/sde: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Disk model: WDC WD30EFRX-68E
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 54A3EFFD-6959-4A61-B03A-056BE10F6995

Device       Start        End    Sectors  Size Type
/dev/sde1  1032192    5031935    3999744  1.9G Linux RAID
/dev/sde2  5031936    9031679    3999744  1.9G Linux RAID
/dev/sde3    30720    1032191    1001472  489M Microsoft basic data
/dev/sde4  9428992 5860532223 5851103232  2.7T Microsoft basic data
/dev/sde5  9031680    9226239     194560   95M Microsoft basic data
/dev/sde6  9226240    9422847     196608   96M Microsoft basic data
/dev/sde7  9422848    9424895       2048    1M Microsoft basic data
/dev/sde8  9424896    9428991       4096    2M Microsoft basic data

Partition table entries are not in disk order.
root@SERVER-XXXX:~# mount -o ro /dev/sde4 /data/wd3TB
root@SERVER-XXXX:/data# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            1.9G     0  1.9G   0% /dev
tmpfs           381M  6.4M  375M   2% /run
/dev/nvme0n1p2  113G  6.3G  101G   6% /
tmpfs           1.9G     0  1.9G   0% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/nvme0n1p1  511M  5.2M  506M   1% /boot/efi
/dev/md0         11T   61M   11T   1% /data
tmpfs           381M  5.7M  376M   2% /run/user/1000
/dev/sde4       2.7T  732G  2.0T  28% /data/wd3TB
root@SERVER-XXXX:/data#
The largest partition is the space that WD My Cloud uses as its volume. 732GB of it is in use.
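Picking the largest partition can also be done mechanically. The following is a rough sketch that parses partition rows in the `fdisk -l` layout shown above; the function name `largest_partition` is made up for illustration, and the 512-byte sector size is taken from the "Units" line of the listing.

```python
import re

# A partition row from `fdisk -l`: device, start, end, sectors, size, type.
PART = re.compile(r"^(/dev/\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(.+)$")


def largest_partition(fdisk_text, sector_size=512):
    """Return (device, size_in_bytes) for the partition with the most sectors."""
    rows = []
    for line in fdisk_text.splitlines():
        m = PART.match(line.strip())
        if m:
            rows.append((m.group(1), int(m.group(4)) * sector_size))
    return max(rows, key=lambda row: row[1])


SAMPLE = """\
/dev/sde1  1032192    5031935    3999744  1.9G Linux RAID
/dev/sde3    30720    1032191    1001472  489M Microsoft basic data
/dev/sde4  9428992 5860532223 5851103232  2.7T Microsoft basic data
"""

device, size_bytes = largest_partition(SAMPLE)
print(device, round(size_bytes / 2**40, 2), "TiB")
```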
2022/08/26 20:27:13 [4333] Number of files: 208,508 (reg: 201,078, dir: 7,430)
2022/08/26 20:27:13 [4333] Number of created files: 208,508 (reg: 201,078, dir: 7,430)
2022/08/26 20:27:13 [4333] Number of deleted files: 0
2022/08/26 20:27:13 [4333] Number of regular files transferred: 201,105
2022/08/26 20:27:13 [4333] Total file size: 509,861,471,073 bytes
2022/08/26 20:27:13 [4333] Total transferred file size: 517,543,755,110 bytes
2022/08/26 20:27:13 [4333] Literal data: 517,543,755,110 bytes
2022/08/26 20:27:13 [4333] Matched data: 0 bytes
2022/08/26 20:27:13 [4333] File list size: 14,807,736
2022/08/26 20:27:13 [4333] File list generation time: 0.001 seconds
2022/08/26 20:27:13 [4333] File list transfer time: 0.000 seconds
2022/08/26 20:27:13 [4333] Total bytes sent: 517,686,361,274
2022/08/26 20:27:13 [4333] Total bytes received: 3,886,614
2022/08/26 20:27:13 [4333] sent 517,686,361,274 bytes received 3,886,614 bytes 51,056,782.67 bytes/sec
2022/08/26 20:27:13 [4333] total size is 509,861,471,073 speedup is 0.98
2022/08/26 20:27:13 [4333] rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1207) [sender=3.1.3]

2022/08/26 21:49:43 [4899] Number of files: 24,579 (reg: 23,538, dir: 1,041)
2022/08/26 21:49:43 [4899] Number of created files: 24,578 (reg: 23,538, dir: 1,040)
2022/08/26 21:49:43 [4899] Number of deleted files: 0
2022/08/26 21:49:43 [4899] Number of regular files transferred: 23,538
2022/08/26 21:49:43 [4899] Total file size: 274,624,953,389 bytes
2022/08/26 21:49:43 [4899] Total transferred file size: 274,624,953,389 bytes
2022/08/26 21:49:43 [4899] Literal data: 274,624,953,389 bytes
2022/08/26 21:49:43 [4899] Matched data: 0 bytes
2022/08/26 21:49:43 [4899] File list size: 2,293,225
2022/08/26 21:49:43 [4899] File list generation time: 0.001 seconds
2022/08/26 21:49:43 [4899] File list transfer time: 0.000 seconds
2022/08/26 21:49:43 [4899] Total bytes sent: 274,694,185,027
2022/08/26 21:49:43 [4899] Total bytes received: 454,477
2022/08/26 21:49:43 [4899] sent 274,694,185,027 bytes received 454,477 bytes 97,669,205.16 bytes/sec
2022/08/26 21:49:43 [4899] total size is 274,624,953,389 speedup is 1.00
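As a sanity check, the "Total file size" figures from the two rsync summaries above can be added up and compared with the 732G that `df -h` reported as used. The sketch below assumes the two runs covered different sets of files, which the logs suggest but do not state outright; the helper name `total_file_size` is illustrative.

```python
import re

TOTAL = re.compile(r"Total file size: ([\d,]+) bytes")


def total_file_size(log_text):
    """Extract the 'Total file size' value (in bytes) from an rsync summary."""
    m = TOTAL.search(log_text)
    return int(m.group(1).replace(",", ""))


runs = [
    "2022/08/26 20:27:13 [4333] Total file size: 509,861,471,073 bytes",
    "2022/08/26 21:49:43 [4899] Total file size: 274,624,953,389 bytes",
]

total_bytes = sum(total_file_size(run) for run in runs)
# df -h reports sizes in powers of 1024, so convert to GiB for comparison.
print(total_bytes, "bytes =", round(total_bytes / 2**30, 1), "GiB")
```

The sum comes to a little under the 732G used on the volume, which is consistent with nearly everything having been copied.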
The files copied from the My Cloud are placed on an external hard disk and delivered to the customer.
If you are curious ↓↓↓↓
① 2022/08/26 19:48:12 [4333] >f+++++++++ dir_1/dir_2/file_1.zip
② 2022/08/26 19:48:12 [4333] rsync: read errors mapping "/data/wd3TB/shares/dir_1/dir_2/file_1.zip": Input/output error (5)
③ 2022/08/26 19:48:53 [4333] WARNING: dir_1/dir_2/file_1.zip failed verification -- update discarded (will try again).
④ 2022/08/26 19:49:37 [4333] >f+++++++++ dir_1/dir_2/file_1.zip

① 2022/08/26 18:28:52 [4333] >f+++++++++ dir_1/dir_2/file_2.pdf
② 2022/08/26 18:28:52 [4333] rsync: read errors mapping "/data/wd3TB/shares/dir_1/dir_2/file_2.pdf": Input/output error (5)
③ 2022/08/26 18:31:10 [4333] WARNING: dir_1/dir_2/file_2.pdf failed verification -- update discarded (will try again).
④ 2022/08/26 18:31:37 [4333] >f+++++++++ dir_1/dir_2/file_2.pdf
⑤ 2022/08/26 18:31:37 [4333] rsync: read errors mapping "/data/wd3TB/shares/dir_1/dir_2/file_2.pdf": Input/output error (5)
⑥ 2022/08/26 18:33:26 [4333] ERROR: dir_1/dir_2/file_2.pdf failed verification -- update discarded.
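When rsync fails to copy a file, it tries the copy again. Sometimes the retry succeeds and sometimes it still fails, and the log shows which happened. The first excerpt above is a retry that succeeded; the second is one that failed.
- ① The file is transferred.
- ② The source file cannot be read.
- ③ The copy fails, but rsync plans to retry (will try again).
- ④ Shortly afterwards the file is transferred again.
When the retry succeeds, the log ends here. When the retry fails, entries ⑤ and ⑥ follow.
- ⑤ The source file cannot be read this time either.
- ⑥ The copy fails. There is no (will try again) here, which seems to mean rsync will not retry. No later transfer of the same file appears in the log, and the log level is ERROR rather than WARNING.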
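The WARNING/ERROR distinction described above makes it possible to list the affected files automatically. Here is a minimal sketch, assuming the log has the same layout as the excerpts above; a file that has a WARNING but no later ERROR is treated as recovered on retry, which is an inference from the log rather than something rsync states directly.

```python
import re

# Matches lines such as:
# 2022/08/26 18:33:26 [4333] ERROR: path failed verification -- update discarded.
LINE = re.compile(
    r"^\S+ \S+ \[\d+\] (WARNING|ERROR): (.+?) failed verification -- update discarded"
)


def classify_failures(log_text):
    """Return (recovered_on_retry, failed_for_good) as sorted lists of paths."""
    retried, failed = set(), set()
    for line in log_text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        level, path = m.groups()
        if level == "WARNING":
            retried.add(path)   # first failure, rsync will try again
        else:
            failed.add(path)    # retry also failed, no further attempt
    return sorted(retried - failed), sorted(failed)


SAMPLE = """\
2022/08/26 18:31:10 [4333] WARNING: dir_1/dir_2/file_2.pdf failed verification -- update discarded (will try again).
2022/08/26 18:33:26 [4333] ERROR: dir_1/dir_2/file_2.pdf failed verification -- update discarded.
2022/08/26 19:48:53 [4333] WARNING: dir_1/dir_2/file_1.zip failed verification -- update discarded (will try again).
2022/08/26 19:49:37 [4333] >f+++++++++ dir_1/dir_2/file_1.zip
"""

recovered, lost = classify_failures(SAMPLE)
print("recovered on retry:", recovered)
print("not copied:", lost)
```

The second list is the set of files worth mentioning to the customer as not recovered.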
root@SERVER-XXXX:~# smartctl -A /dev/sde
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-11-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       15406
  3 Spin_Up_Time            0x0027   178   177   021    Pre-fail  Always       -       6075
  4 Start_Stop_Count        0x0032   001   001   000    Old_age   Always       -       125454
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       19
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   020   020   000    Old_age   Always       -       58748
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       202
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       170
193 Load_Cycle_Count        0x0032   159   159   000    Old_age   Always       -       125545
194 Temperature_Celsius     0x0022   120   094   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   184   184   000    Old_age   Always       -       16
197 Current_Pending_Sector  0x0032   198   198   000    Old_age   Always       -       1085
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

root@SERVER-XXXX:~#
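After the copy, the bad-sector count has gone up. Copying the files meant reading about 30% of the 3TB disk (the roughly 730GB that was in use), which amounts to scanning that area, so it is only natural that the number of bad sectors rose, whether they were newly created or newly discovered.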
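To put numbers on that increase, the raw values from the two smartctl reports can be compared directly. The sketch below hard-codes the four attributes that changed meaningfully; the values are copied from the reports before and after the copy, and the helper name `smart_delta` is illustrative.

```python
# Raw values before the copy (first smartctl report).
before = {
    "Raw_Read_Error_Rate": 13054,
    "Reallocated_Sector_Ct": 16,
    "Reallocated_Event_Count": 13,
    "Current_Pending_Sector": 753,
}

# Raw values after the copy (second smartctl report).
after = {
    "Raw_Read_Error_Rate": 15406,
    "Reallocated_Sector_Ct": 19,
    "Reallocated_Event_Count": 16,
    "Current_Pending_Sector": 1085,
}


def smart_delta(before, after):
    """Return {attribute: change} for attributes present in both snapshots."""
    return {
        name: after[name] - before[name]
        for name in before
        if name in after and after[name] != before[name]
    }


for name, change in smart_delta(before, after).items():
    print(f"{name}: {before[name]} -> {after[name]} ({change:+d})")
```

The pending-sector count rose by 332 during roughly five power-on hours of copying, which illustrates why repeated read attempts on a failing disk should be kept to a minimum.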
References
- rsync(1) manpage
https://download.samba.org/pub/rsync/rsync.1
- HOWTO read smartctl reports
https://www.smartmontools.org/wiki/Howto_ReadSmartctlReports_ATA_542.1