====== Recovering my lost data: LVM and RAID ======
So I upgraded Ubuntu 9.10 to Ubuntu 11.10. When the system boots it says it cannot mount ''/store'', my 960GB RAID+LVM file-system. That's the one that holds over 10 years of personal photographs and such. :-(
===== About the file-system: ''store'' =====
There are many layers of indirection between the file-system and the physical storage when using LVM or RAID. When using both, the number of layers can seem excessive. Here's a diagram of the layers involved in my (lost) setup:
digraph G {
node [shape=box]
md0 [label="RAID5 block device: md0"]
sdb1 [label="AutoRAID partition: sdb1"]
sdc1 [label="AutoRAID partition: sdc1"]
sdd1 [label="AutoRAID partition: sdd1"]
sde1 [label="AutoRAID partition: sde1"]
sdf1 [label="AutoRAID partition: sdf1"]
sdb [label="Disk: 320GB SATA: sdb"]
sdc [label="Disk: 320GB SATA: sdc"]
sdd [label="Disk: 320GB SATA: sdd"]
sde [label="Disk: 320GB SATA: sde"]
sdf [label="Disk: 320GB SATA: sdf"]
fs_store [label="EXT4 file-system: store"]
lv_store [label="LVM logical volume: lv_store"]
vg_store [label="LVM volume group: vg_store"]
pv_store [label="LVM physical volume: pv_store"]
sdb1 -> sdb
sdc1 -> sdc
sdd1 -> sdd
sde1 -> sde
sdf1 -> sdf
md0 -> {sdb1 sdc1 sdd1 sde1 sdf1}
pv_store -> md0
vg_store -> pv_store
lv_store -> vg_store
fs_store -> lv_store
}
Note that the RAID block device, ''md0'', is not partitioned. I believe that was a mistake on my part, and a likely reason why Ubuntu 11.10 cannot auto-detect it.
===== Problem statement =====
Since upgrading, system boot is interrupted by an error screen to the effect of "Cannot mount /store" and a prompt to either enter a root shell or skip mounting. From what I can see, the RAID array is detected without problems and is functioning correctly. So the system looks like this:
digraph G {
node [shape=box]
md0 [label="RAID5 block device: md0 (appears to be unformatted)"]
sdb1 [label="AutoRAID partition: sdb1"]
sdc1 [label="AutoRAID partition: sdc1"]
sdd1 [label="AutoRAID partition: sdd1"]
sde1 [label="AutoRAID partition: sde1"]
sdf1 [label="AutoRAID partition: sdf1"]
sdb [label="Disk: 320GB SATA: sdb"]
sdc [label="Disk: 320GB SATA: sdc"]
sdd [label="Disk: 320GB SATA: sdd"]
sde [label="Disk: 320GB SATA: sde"]
sdf [label="Disk: 320GB SATA: sdf"]
sdb1 -> sdb
sdc1 -> sdc
sdd1 -> sdd
sde1 -> sde
sdf1 -> sdf
md0 -> {sdb1 sdc1 sdd1 sde1 sdf1}
}
The RAID (multi-disk) status looks fine to me:
root@ikari:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : active raid5 sdb[1] sde[3] sdc[0] sdd[2] sdf[4](S)
937713408 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
unused devices: <none>
but the resulting 960.2 GB block device contains a single partition of type "Linux RAID autodetect" - which would suggest that it is a //member// of some other multi-disk setup. This, I believe, is human error on my part from when I created the thing...
root@ikari:~# fdisk -l /dev/md127
Disk /dev/md127: 960.2 GB, 960218529792 bytes
255 heads, 63 sectors/track, 116739 cylinders, total 1875426816 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 196608 bytes
Disk identifier: 0xd71c877b
Device Boot Start End Blocks Id System
/dev/md127p1 63 625137344 312568641 fd Linux RAID autodetect
Partition 1 does not start on physical sector boundary.
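As an aside, that type byte lives in the MBR itself and is easy to inspect directly. Here's a minimal sketch (my own illustration, not part of the recovery; ''parse_mbr'' is a made-up helper) that prints the four primary partition entries of a device or image; type ''0xfd'' is "Linux RAID autodetect" and ''0x8e'' is "Linux LVM":
<code python>
# Sketch: dump the four primary MBR partition entries of a block device or image.
# Assumes an MS-DOS style partition table, as fdisk shows above. Run as root for
# real block devices.
import struct

def parse_mbr(path):
    with open(path, "rb") as f:
        mbr = f.read(512)
    if mbr[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature found")
    for i in range(4):
        entry = mbr[446 + 16 * i: 446 + 16 * (i + 1)]
        ptype = struct.unpack("B", entry[4:5])[0]            # partition type byte
        start, sectors = struct.unpack("<II", entry[8:16])   # start LBA, length
        if sectors:
            print("partition %d: type 0x%02x, start sector %d, %d sectors"
                  % (i + 1, ptype, start, sectors))

# parse_mbr("/dev/md127")   # would report type 0xfd for md127p1
</code>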
====== Recovery strategy ======
- Create a disk image of ''md127'', partition table and all (this required buying an external 2TB USB drive)
- Use LVM snapshots on top of this disk image to make (and quickly roll back) experimental changes
===== Formatting the external USB drive =====
So I created a single "Linux LVM" partition on the 2TB disk, turned it into a single 1.8TB physical volume, and created a single 1.8TB volume group containing it. On this I created a 1TB logical volume called ''lv_scratch'' and copied the contents of ''md127'' to it (e.g. ''dd if=/dev/md127 of=/dev/vg_scratch/lv_scratch''). Once the copy was made, I created a snapshot of ''lv_scratch'', which I imaginatively called ''snap''.
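(For the record, the copy itself is nothing clever - a rough Python equivalent of that ''dd'' invocation would be the sketch below. I actually just used ''dd''.)
<code python>
# Sketch: copy the whole RAID block device onto the scratch logical volume in
# 4 MiB chunks - roughly "dd if=/dev/md127 of=/dev/vg_scratch/lv_scratch bs=4M".
# Run as root, and triple-check the destination path: this overwrites it.
import shutil

SRC = "/dev/md127"
DST = "/dev/vg_scratch/lv_scratch"

with open(SRC, "rb") as src, open(DST, "wb") as dst:
    shutil.copyfileobj(src, dst, length=4 * 1024 * 1024)
</code>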
LVM snapshots are interesting creatures. As the name suggests, the snapshot (named ''snap'') holds the state of ''lv_scratch'' as it was when I created it. I can still read and write ''lv_scratch'', but the contents of ''snap'' will not change. This is ideal for making consistent backups. The snapshot works by copy-on-write (COW): just before a block of ''lv_scratch'' is overwritten, its original contents are copied into the snapshot's COW volume, and only then is ''lv_scratch'' updated in place. Reads of ''snap'' consult the COW volume first - on a hit the preserved block is returned, otherwise the (unchanged) block is read straight from ''lv_scratch''. When the snapshot is deleted, the COW volume is simply discarded; ''lv_scratch'' keeps its changes, which were written in place all along. Makes sense if you are used to copy-on-write behaviour.
Now here is where things get interesting. The snapshot, ''snap'', does not have to be read-only: you can create it read-write. Doing so gives you a very cheap copy of ''lv_scratch'', and any changes you make to the snapshot are stored in its COW table. You can discard those changes simply by deleting the snapshot. Ideal for my situation: I want to experiment with the partition table, various file-system recovery tools and so on. I let these loose on the snapshot, and if things go bad I delete and recreate the snapshot and try again.
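To make that experiment-and-roll-back cycle concrete, here's a hedged sketch of how the snapshot could be recreated around each experiment by shelling out to the LVM tools (''recreate_snapshot'' is my own helper, and the 200G COW size is an arbitrary example, not what I actually used):
<code python>
# Sketch: throw away the current snapshot (and all experimental changes with it),
# then take a fresh read-write snapshot of the pristine image.
import subprocess

VG, ORIGIN, SNAP = "vg_scratch", "lv_scratch", "snap"

def recreate_snapshot(cow_size="200G"):
    subprocess.call(["lvremove", "-f", "%s/%s" % (VG, SNAP)])
    subprocess.check_call(["lvcreate", "--snapshot", "--size", cow_size,
                           "--name", SNAP, "%s/%s" % (VG, ORIGIN)])

# recreate_snapshot()
# ... run fsck / parted / etc. against /dev/vg_scratch/snap ...
# recreate_snapshot()   # experiment went badly? start over from the pristine copy
</code>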
digraph G {
sdj [label="2TB USB disk: sdj"]
sdj1 [label="Linux LVM partition: sdj1"]
pv_scratch [label="LVM physical volume"]
vg_scratch [label="LVM volume group: vg_scratch"]
lv_scratch [label="LVM logical volume: lv_scratch"]
snap [label="LVM logical volume: snap"]
lv_scratch -> vg_scratch -> pv_scratch -> sdj1 -> sdj
snap -> vg_scratch
snap -> lv_scratch [style="dashed",arrowhead="none"]
}
root@ikari:~# fdisk -l /dev/sdj
Disk /dev/sdj: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000f0222
Device Boot Start End Blocks Id System
/dev/sdj1 2048 3907028991 1953513472 8e Linux LVM
root@ikari:~# pvdisplay
/dev/dm-0: read failed after 0 of 4096 at 0: Input/output error
/dev/dm-1: read failed after 0 of 4096 at 0: Input/output error
/dev/dm-2: read failed after 0 of 4096 at 0: Input/output error
/dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
--- Physical volume ---
PV Name /dev/sdj1
VG Name vg_scratch
PV Size 1.82 TiB / not usable 4.00 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 476931
Free PE 86787
Allocated PE 390144
PV UUID nrf9cQ-Asfz-Y2x2-SDoT-3ppu-mpEC-Fnuf8Z
root@ikari:~# vgdisplay
/dev/dm-0: read failed after 0 of 4096 at 0: Input/output error
/dev/dm-1: read failed after 0 of 4096 at 0: Input/output error
/dev/dm-2: read failed after 0 of 4096 at 0: Input/output error
/dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
--- Volume group ---
VG Name vg_scratch
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 13
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 2
Open LV 0
Max PV 0
Cur PV 1
Act PV 1
VG Size 1.82 TiB
PE Size 4.00 MiB
Total PE 476931
Alloc PE / Size 390144 / 1.49 TiB
Free PE / Size 86787 / 339.01 GiB
VG UUID Lk7UZP-48xF-vBPi-6g8F-sXlF-qyzy-pQNKgq
root@ikari:~# lvdisplay
/dev/dm-0: read failed after 0 of 4096 at 0: Input/output error
/dev/dm-1: read failed after 0 of 4096 at 0: Input/output error
/dev/dm-2: read failed after 0 of 4096 at 0: Input/output error
/dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
--- Logical volume ---
LV Name /dev/vg_scratch/lv_scratch
VG Name vg_scratch
LV UUID aFBpgv-gqcd-jjLU-c7xO-Jyeb-2R0t-HpEF84
LV Write Access read/write
LV snapshot status source of
/dev/vg_scratch/snap [active]
LV Status available
# open 0
LV Size 1.00 TiB
Current LE 262144
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:1
--- Logical volume ---
LV Name /dev/vg_scratch/snap
VG Name vg_scratch
LV UUID OvOsQ7-uACi-xJVZ-vseu-fKEc-F73h-CmSalH
LV Write Access read/write
LV snapshot status active destination for /dev/vg_scratch/lv_scratch
LV Status available
# open 0
LV Size 1.00 TiB
Current LE 262144
COW-table size 500.00 GiB
COW-table LE 128000
Allocated to snapshot 0.00%
Snapshot chunk size 4.00 KiB
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:3
So here's the goal I'm aiming for on my external storage:
digraph G {
sdj [label="2TB USB disk: sdj"]
sdj1 [label="Linux LVM partition: sdj1"]
pv_scratch [label="LVM physical volume"]
vg_scratch [label="LVM volume group: vg_scratch"]
lv_scratch [label="LVM logical volume: lv_scratch"]
snap [label="LVM logical volume: snap"]
lv_scratch -> vg_scratch -> pv_scratch -> sdj1 -> sdj
snap -> vg_scratch
snap -> lv_scratch [style="dashed",arrowhead="none"]
node [shape=box]
fs_store [label="EXT4 file-system: store"]
lv_store [label="LVM logical volume: lv_store"]
vg_store [label="LVM volume group: vg_store"]
pv_store [label="LVM physical volume: pv_store"]
pv_store -> snap
vg_store -> pv_store
lv_store -> vg_store
fs_store -> lv_store
}
====== Recognising the nested LVM volumes ======
Since it's been a few days and several reboots since I last worked on this, I'll start by plugging the USB drive in.
root@ikari:~# dmesg
[ 479.180019] usb 2-5: new high speed USB device number 7 using ehci_hcd
[ 479.313228] scsi13 : usb-storage 2-5:1.0
[ 480.312605] scsi 13:0:0:0: Direct-Access Seagate Desktop 0130 PQ: 0 ANSI: 4
[ 480.336633] sd 13:0:0:0: Attached scsi generic sg10 type 0
[ 480.337029] sd 13:0:0:0: [sdi] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[ 480.337671] sd 13:0:0:0: [sdi] Write Protect is off
[ 480.337671] sd 13:0:0:0: [sdi] Mode Sense: 2f 08 00 00
[ 480.340027] sd 13:0:0:0: [sdi] No Caching mode page present
[ 480.340027] sd 13:0:0:0: [sdi] Assuming drive cache: write through
[ 480.341806] sd 13:0:0:0: [sdi] No Caching mode page present
[ 480.341811] sd 13:0:0:0: [sdi] Assuming drive cache: write through
[ 480.357290] sdi: sdi1
[ 480.359346] sd 13:0:0:0: [sdi] No Caching mode page present
[ 480.359350] sd 13:0:0:0: [sdi] Assuming drive cache: write through
[ 480.359354] sd 13:0:0:0: [sdi] Attached SCSI disk
The (outer) LVM PVs are automatically detected, and their VGs + LVs are subsequently detected:
root@ikari:~# pvs
PV VG Fmt Attr PSize PFree
/dev/sdi1 vg_scratch lvm2 a- 1.82t 339.01g
root@ikari:~# vgs
VG #PV #LV #SN Attr VSize VFree
vg_scratch 1 2 1 wz--n- 1.82t 339.01g
root@ikari:~# lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
lv_scratch vg_scratch owi-a- 1.00t
snap vg_scratch swi-a- 500.00g lv_scratch 0.00
Somewhere on the ''snap'' logical volume is my nested LVM. I used ''xxd /dev/vg_scratch/snap | less'' and searched for ''LVM2''. The first hit was a false positive (it appeared to have stripes of NULs written across it), but the second hit looked plausible:
8018600: 0000 0000 0000 0000 0000 0000 0000 0000 ................
8018610: 0000 0000 0000 0000 0000 0000 0000 0000 ................
8018620: 0000 0000 0000 0000 0000 0000 0000 0000 ................
8018630: 0000 0000 0000 0000 0000 0000 0000 0000 ................
8018640: 0000 0000 0000 0000 0000 0000 0000 0000 ................
8018650: 0000 0000 0000 0000 0000 0000 0000 0000 ................
8018660: 0000 0000 0000 0000 0000 0000 0000 0000 ................
8018670: 0000 0000 0000 0000 0000 0000 0000 0000 ................
8018680: 0000 0000 0000 0000 0000 0000 0000 0000 ................
8018690: 0000 0000 0000 0000 0000 0000 0000 0000 ................
80186a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
80186b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
80186c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
80186d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
80186e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
80186f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
8018700: 0000 0000 0000 0000 0000 0000 0000 0000 ................
8018710: 0000 0000 0000 0000 0000 0000 0000 0000 ................
8018720: 0000 0000 0000 0000 0000 0000 0000 0000 ................
8018730: 0000 0000 0000 0000 0000 0000 0000 0000 ................
8018740: 0000 0000 0000 0000 0000 0000 0000 0000 ................
8018750: 0000 0000 0000 0000 0000 0000 0000 0000 ................
8018760: 0000 0000 0000 0000 0000 0000 0000 0000 ................
8018770: 0000 0000 0000 0000 0000 0000 0000 0000 ................
8018780: 0000 0000 0000 0000 0000 0000 0000 0000 ................
8018790: 0000 0000 0000 0000 0000 0000 0000 0000 ................
80187a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
80187b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
80187c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
80187d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
80187e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
80187f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
8018800: 4c41 4245 4c4f 4e45 0100 0000 0000 0000 LABELONE........
8018810: 9148 4053 2000 0000 4c56 4d32 2030 3031 .H@S ...LVM2 001
8018820: 5341 7536 6e32 7578 474c 5148 6743 5351 SAu6n2uxGLQHgCSQ
8018830: 6b56 6b5a 655a 4c78 7874 314b 7652 6a31 kVkZeZLxxt1KvRj1
8018840: 00f8 0391 df00 0000 0000 0300 0000 0000 ................
8018850: 0000 0000 0000 0000 0000 0000 0000 0000 ................
8018860: 0000 0000 0000 0000 0010 0000 0000 0000 ................
8018870: 00f0 0200 0000 0000 0000 0000 0000 0000 ................
8018880: 0000 0000 0000 0000 0000 0000 0000 0000 ................
8018890: 0000 0000 0000 0000 0000 0000 0000 0000 ................
80188a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
I know from using ''xxd'' to examine a correct and functioning LVM2 physical volume (the PV behind ''vg_scratch'', as it happens) that the ''LABELONE'' signature should sit at offset 0x200, with the ''LVM2 001'' text at 0x218. So I'll create a loopback device at offset 0x8018800 - 0x200 = 0x8018600, which puts the label exactly where LVM expects it:
root@ikari:~# losetup /dev/loop0 /dev/vg_scratch/snap --offset $((0x8018600))
root@ikari:~# lvmdiskscan
/dev/ram0 [ 64.00 MiB]
/dev/loop0 [ 894.15 GiB] LVM physical volume
/dev/dm-0 [ 186.27 GiB]
/dev/ram1 [ 64.00 MiB]
/dev/sda1 [ 294.09 GiB]
/dev/dm-1 [ 894.27 GiB]
/dev/ram2 [ 64.00 MiB]
/dev/dm-2 [ 894.27 GiB]
/dev/ram3 [ 64.00 MiB]
/dev/dm-3 [ 894.27 GiB]
/dev/ram4 [ 64.00 MiB]
/dev/dm-4 [ 782.47 GiB]
/dev/ram5 [ 64.00 MiB]
/dev/sda5 [ 4.00 GiB]
/dev/dm-5 [ 715.38 GiB]
/dev/ram6 [ 64.00 MiB]
/dev/ram7 [ 64.00 MiB]
/dev/ram8 [ 64.00 MiB]
/dev/ram9 [ 64.00 MiB]
/dev/ram10 [ 64.00 MiB]
/dev/ram11 [ 64.00 MiB]
/dev/ram12 [ 64.00 MiB]
/dev/ram13 [ 64.00 MiB]
/dev/ram14 [ 64.00 MiB]
/dev/ram15 [ 64.00 MiB]
/dev/sdb1 [ 1.82 TiB] LVM physical volume
0 disks
24 partitions
0 LVM physical volume whole disks
2 LVM physical volumes
root@ikari:~# pvs
PV VG Fmt Attr PSize PFree
/dev/loop0 store_vg lvm2 a- 894.25g 178.88g
/dev/sdb1 vg_scratch lvm2 a- 1.82t 0
If ''lvmdiskscan'' doesn't work, you could try using ''partprobe'' to tell the kernel to rescan its partition tables.
root@ikari:~# lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
store_lv store_vg -wi-a- 715.38g
home_zfs vg_scratch -wi-a- 186.27g
lv_scratch vg_scratch owi-a- 894.27g
snap vg_scratch swi-ao 782.47g lv_scratch 0.00
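For reference, the manual ''xxd'' search and offset arithmetic can be automated. Here's a minimal sketch (my own illustration, not a polished tool; ''find_pv_labels'' is a made-up name) that scans a device for the LVM2 label - the ''LABELONE'' signature, which normally sits at offset 0x200 of a PV - and prints the ''losetup'' offset that would line it up:
<code python>
# Sketch: find LVM2 physical-volume labels in a raw device or image and print the
# losetup offset needed to expose each one as a well-formed PV.
import sys

LABEL = b"LABELONE"          # normally at byte 0x200 of a physical volume
CHUNK = 64 * 1024 * 1024     # scan in 64 MiB windows

def find_pv_labels(path):
    with open(path, "rb") as f:
        pos, tail = 0, b""
        while True:
            data = f.read(CHUNK)
            if not data:
                break
            window = tail + data
            idx = window.find(LABEL)
            while idx != -1:
                hit = pos - len(tail) + idx
                print("LABELONE at 0x%x -> losetup /dev/loopN %s --offset %d"
                      % (hit, path, hit - 0x200))
                idx = window.find(LABEL, idx + 1)
            tail = window[-(len(LABEL) - 1):]   # overlap, so hits spanning chunks aren't missed
            pos += len(data)

if __name__ == "__main__":
    find_pv_labels(sys.argv[1])   # e.g. /dev/vg_scratch/snap
</code>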
====== Finding the ext4 file-system ======
OK, so now I can read and write the nested ''store'' LVM volumes that housed my ''ext4'' file-system. But the file-system isn't where it's supposed to be, apparently:
root@ikari:~# file -Ls /dev/store_vg/store_lv
/dev/store_vg/store_lv: data
Time to play with a hex editor again, using a valid ''ext4'' file-system header (from my current Ubuntu installation, ''/dev/sda1'') as a reference.
After a bit of searching through ''store_lv'' I find the characteristic ''53ef'' magic in the expected column, followed by the name of my file-system, ''store''. Looks good:
# xxd /dev/store_vg/store_lv | less
05c9db0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
05c9dc0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
05c9dd0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
05c9de0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
05c9df0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
05c9e00: 00c0 fc06 c017 f90d c9da b200 0c2a d00a .............*..
05c9e10: c604 f906 0000 0000 0200 0000 0200 0000 ................
05c9e20: 0080 0000 0080 0000 0040 0000 9d17 aa4d .........@.....M
05c9e30: 9d17 aa4d 0200 1b00 53ef 0100 0100 0000 ...M....S.......
05c9e40: 4c2c a24d 004e ed00 0000 0000 0100 0000 L,.M.N..........
05c9e50: 0000 0000 0b00 0000 8000 0000 3400 0000 ............4...
05c9e60: 0600 0000 0300 0000 9495 5e9b 7d7e 41da ..........^.}~A.
05c9e70: a595 6af4 4848 b5c6 7374 6f72 6500 0000 ..j.HH..store...
05c9e80: 0000 0000 0000 0000 0000 0000 0000 0000 ................
05c9e90: 0000 0000 0000 0000 0000 0000 0000 0000 ................
05c9ea0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
05c9eb0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
05c9ec0: 0000 0000 0000 0000 0000 0000 0000 c803 ................
05c9ed0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
05c9ee0: 0800 0000 0000 0000 0000 0000 9a4a cd41 .............J.A
05c9ef0: 88ff 4759 ac42 d083 8b3f 3b0d 0201 0000 ..GY.B...?;.....
05c9f00: 0000 0000 0000 0000 cf4a 9f46 0906 0000 .........J.F....
05c9f10: 0a06 0000 0b06 0000 0c06 0000 0d06 0000 ................
By comparing this to my valid file-system, I can see that the ''53ef'' line should be at offset 0x430:
# xxd /dev/sda1 | less
0000380: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000390: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00003a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00003b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00003c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00003d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00003e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00003f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000400: 0020 2601 005d 9804 73d1 3a00 8366 bb02 . &..]..s.:..f..
0000410: d8da 2001 0000 0000 0200 0000 0200 0000 .. .............
0000420: 0080 0000 0080 0000 0020 0000 b153 264f ......... ...S&O
0000430: d9bf f44e 0d00 2200 53ef 0100 0100 0000 ...N..".S.......
0000440: 98b4 f44e 004e ed00 0000 0000 0100 0000 ...N.N..........
0000450: 0000 0000 0b00 0000 0001 0000 3c00 0000 ............<...
0000460: 4602 0000 7b00 0000 89bb eae1 b864 492d F...{........dI-
0000470: a3d6 2bc9 5336 151e 0000 0000 0000 0000 ..+.S6..........
0000480: 0000 0000 0000 0000 2f00 1eae 7c13 0000 ......../...|...
0000490: 0000 c099 98a1 0188 ffff 985d 698f 0188 ...........]i...
00004a0: ffff 307a 87a2 0188 ffff 307a 87a2 0188 ..0z......0z....
00004b0: ffff 585c 698f 0188 ffff 645d 1681 ffff ..X\i.....d]....
00004c0: ffff c000 588f 0188 0000 0000 0000 ed03 ....X...........
00004d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00004e0: 0800 0000 0000 0000 b60a 4000 5547 eae9 ..........@.UG..
00004f0: 2f3c 45ee 9913 70fb 20b7 7395 0101 0000 /<E...p. .s.....
So let's create another loopback device, this time with an offset of 0x5c9e30 - 0x430 = 0x5C9A00:
# losetup /dev/loop1 /dev/store_vg/store_lv --offset $((0x5C9A00))
For the record, that makes the current overall loopback settings:
# losetup -a
/dev/loop0: [0005]:12783 (/dev/mapper/vg_scratch-snap), offset 134317568
/dev/loop1: [0005]:40851 (/dev/mapper/store_vg-store_lv), offset 6068736
So is the ''ext4'' file-system visible?
# file -Ls /dev/loop1
/dev/loop1: Linux rev 1.0 ext3 filesystem data, UUID=94955e9b-7d7e-41da-a595-6af44848b5c6, volume name "store" (needs journal recovery) (large files)
Success!
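The same trick can be scripted: the ''0xEF53'' magic sits 56 bytes into a superblock, and the primary superblock starts 1024 bytes into the file-system, so a hit for a primary superblock at offset N implies the file-system starts at N - 1024. Here's a rough sketch of that idea (a much-simplified cousin of the script mentioned in the update below; it naively treats every hit as a primary superblock, which backup copies will violate):
<code python>
# Sketch: scan a device or image for ext2/3/4 superblocks by their 0xEF53 magic
# and print the file-system origin each hit would imply. Only superblocks that
# start on a 512-byte boundary are found - good enough for a first pass.
import sys

def find_ext_origins(path, step=512):
    with open(path, "rb") as f:
        pos = 0
        while True:
            block = f.read(step)
            if len(block) < 138:
                break
            # s_magic is at byte 56 of the superblock, stored little-endian (53 ef).
            if block[56:58] == b"\x53\xef":
                label = block[120:136].rstrip(b"\x00")   # s_volume_name
                print("superblock at 0x%x: label %r, implied origin %d"
                      % (pos, label, pos - 1024))
            pos += step

if __name__ == "__main__":
    find_ext_origins(sys.argv[1])   # e.g. /dev/store_vg/store_lv
</code>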
**Update**: I ended up writing a Python script, [[https://github.com/meermanr/ext3_recovery|find_ext3.py]], to help me locate ''ext3'' superblocks and automatically check their validity using ''dumpe2fs''. This showed that the most common file-system origin is actually //before// the start of the LVM logical volume:
root@skuld:/home/meermanr/projects/find_ext3# cut -d' ' -f9- store_vg-store_lv.find_ext3.log | sort | uniq -c | sort -rn | head
17 origin -134282240
16 origin -134514176
1 origin 8382976
1 origin 8268288
1 origin 8256000
1 origin 8215040
1 origin 8145408
1 origin 8133120
1 origin 8043008
1 origin 8030720
From this I conclude that my original (working) installation was not actually using the Logical Volume! This may explain why updating Ubuntu to a version which has LVM support by default made my system unable to find the file-system.
So from this point on I'll ignore the Logical Volumes in my disk image (''store_vg/store_lv''), and instead look for a file-system in the raw image.
====== Getting my data back ======
Obviously I tried mounting it first, to no avail:
root@ikari:/tmp# mount /dev/loop1 /tmp/store
mount: wrong fs type, bad option, bad superblock on /dev/loop1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
Since this is all sitting on top of the LVM snapshot I made earlier (''/dev/vg_scratch/snap'') I am happy to try various tools that modify the drive contents, such as ''fsck''.
First attempt, no joy:
# fsck.ext4 -y /dev/loop1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Group descriptors look bad... trying backup blocks...
fsck.ext4: Bad magic number in super-block when using the backup blocks
fsck.ext4: going back to original superblock
Error reading block 3226742528 (Invalid argument). Ignore error? yes
Force rewrite? yes
Superblock has an invalid journal (inode 8).
Clear? yes
*** ext3 journal has been deleted - filesystem is now ext2 only ***
The filesystem size (according to the superblock) is 234428352 blocks
The physical size of the device is 187529782 blocks
Either the superblock or the partition table is likely to be corrupt!
Abort? yes
Error writing block 3226742528 (Invalid argument). Ignore error? yes
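Checking a candidate's validity with ''dumpe2fs'' - which is what ''find_ext3.py'' does, as mentioned in the update above - boils down to something like this sketch (my own simplification; ''candidate_is_valid'' is a made-up helper): attach a temporary loop device at the candidate origin, ask ''dumpe2fs -h'' whether it can parse a superblock there, then detach again.
<code python>
# Sketch: validate a candidate file-system origin (a byte offset into the device)
# by exposing it via a temporary loop device and running "dumpe2fs -h" on it.
# A zero exit status means dumpe2fs could parse the superblock. Run as root.
import os, subprocess

def candidate_is_valid(device, origin):
    loopdev = subprocess.check_output(
        ["losetup", "--find", "--show", "--offset", str(origin), device]).strip()
    try:
        with open(os.devnull, "w") as devnull:
            return subprocess.call(["dumpe2fs", "-h", loopdev],
                                   stdout=devnull, stderr=devnull) == 0
    finally:
        subprocess.call(["losetup", "-d", loopdev])

# candidate_is_valid("/dev/vg_scratch/snap", 231936)   # one of the origins tallied below
</code>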
Using my [[https://github.com/meermanr/ext3_recovery|find_ext3.py]] script I mapped out the location of all superblocks on my disk in the form of a log file as follows:
# meermanr@Ikari:/home/meermanr/projects/find_ext3 (master)
# head vg_scratch-snap.log
OK /dev/vg_scratch/snap 1024 store #0 4096 kB 32768 bpg, origin 0
BAD /dev/vg_scratch/snap 6358016 store #0 4096 kB 32768 bpg, origin 6356992
BAD /dev/vg_scratch/snap 6403072 store #0 4096 kB 32768 bpg, origin 6402048
BAD /dev/vg_scratch/snap 6444032 store #0 4096 kB 32768 bpg, origin 6443008
BAD /dev/vg_scratch/snap 6456320 store #0 4096 kB 32768 bpg, origin 6455296
BAD /dev/vg_scratch/snap 6525952 store #0 4096 kB 32768 bpg, origin 6524928
BAD /dev/vg_scratch/snap 6562816 store #0 4096 kB 32768 bpg, origin 6561792
BAD /dev/vg_scratch/snap 6640640 store #0 4096 kB 32768 bpg, origin 6639616
BAD /dev/vg_scratch/snap 6652928 store #0 4096 kB 32768 bpg, origin 6651904
BAD /dev/vg_scratch/snap 6722560 store #0 4096 kB 32768 bpg, origin 6721536
Obviously the "OK" superblock at origin 0 isn't actually valid, according to ''fsck''. I suspect I've reformatted this drive a number of times, and unfortunately used the same label (''store'') each time, which doesn't help. So let's take a statistical approach: tally how often each origin offset is mentioned, and investigate the most frequent:
# meermanr@Ikari:/home/meermanr/projects/find_ext3 (master)
# cut -d' ' -f9- ./vg_scratch-snap.log | sort | uniq -c | sort -rn | head
22 origin 231936
19 origin 0
1 origin 880602744320
1 origin 8802304
1 origin 8790016
1 origin 8679424
1 origin 8667136
1 origin 8556544
1 origin 8544256
1 origin 8433664
As above, I attached this to a loopback device and ran ''fsck'' on it. This got a lot further than previous attempts: more text scrolled past, and it sat there for a while writing out a new journal. Ultimately it gave up with the same error as last time.
On the advice of a [[http://forums.gentoo.org/viewtopic-p-3778374.html#3778374|Gentoo forum post]] I ran:
mke2fs -S /dev/loop0
followed by ''fsck''. That was last night. This morning it's still running, pinning one of my CPUs at 100% and consuming so much memory that it has nearly exhausted my system's swap file and nearly consumed all of the snapshot volume's copy-on-write (COW) table!
===== Lowering IO and CPU scheduling priority of fsck =====
# meermanr@Ikari:/home/meermanr/projects/find_ext3 (master)
# ps fo pid,pmem,pcpu,cmd -t 13
PID %MEM %CPU CMD
19540 0.0 0.0 -bash
19666 0.0 0.0 \_ sudo su
19667 0.0 0.0 \_ su
19675 0.0 0.0 \_ bash
20181 0.0 0.0 \_ fsck /dev/loop0 -y
20182 49.4 86.8 \_ fsck.ext2 -y /dev/loop0
To keep my system usable I lowered the IO and CPU priority of ''fsck''. First change the IO scheduling class to "idle" (3) for the hungry process:
root@Ikari:~# ionice -c3 -p 20182
Then raise the "niceness" of the process. Higher values make processes nicer, which means they are more likely to "give way" to other processes. Really it just means the kernel will pre-empt nice processes more often:
root@Ikari:~# renice 10 20182
20182 (process ID) old priority 0, new priority 10
===== Adding more swap to my system =====
# meermanr@Ikari:/home/meermanr/projects/find_ext3 (master)
# free -m
total used free shared buffers cached
Mem: 5969 5523 445 0 325 117
-/+ buffers/cache: 5080 888
Swap: 6234 3477 2757
I was concerned to see that over 50% of my swap was in use. I don't know how long ''fsck'' will take, so adding more swap seems prudent. As luck would have it, I recently added a solid-state drive (SSD) to my system, so I have an unused spinning disk which I'm pretty sure has a swap partition on it that isn't doing anything.
# meermanr@Ikari:/home/meermanr/projects/find_ext3 (master)
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 894.3G 0 loop
sda 8:0 0 93.2G 0 disk
├─sda1 8:1 0 87.2G 0 part /
├─sda2 8:2 0 1K 0 part
└─sda5 8:5 0 6G 0 part [SWAP]
sdb 8:16 0 298.1G 0 disk
├─sdb1 8:17 0 294.1G 0 part
├─sdb2 8:18 0 1K 0 part
└─sdb5 8:21 0 4G 0 part
sr0 11:0 1 1024M 0 rom
sr1 11:1 1 1024M 0 rom
sde 8:64 0 1.8T 0 disk
└─sde1 8:65 0 1.8T 0 part
├─vg_scratch-home_zfs (dm-0) 252:0 0 186.3G 0 lvm
├─vg_scratch-lv_scratch-real (dm-3) 252:3 0 894.3G 0 lvm
│ ├─vg_scratch-lv_scratch (dm-2) 252:2 0 894.3G 0 lvm
│ └─vg_scratch-snap (dm-1) 252:1 0 894.3G 0 lvm
├─vg_scratch-snap-cow (dm-4) 252:4 0 18.6G 0 lvm
│ └─vg_scratch-snap (dm-1) 252:1 0 894.3G 0 lvm
└─vg_scratch-photorec (dm-5) 252:5 0 763.9G 0 lvm
It's probably ''/dev/sdb5'', let's check with ''file'':
# meermanr@Ikari:/home/meermanr/projects/find_ext3 (master)
# file -Ls /dev/sdb5
/dev/sdb5: no read permission
# meermanr@Ikari:/home/meermanr/projects/find_ext3 (master)
# sudo !!
sudo file -Ls /dev/sdb5
[sudo] password for meermanr:
/dev/sdb5: Linux/i386 swap file (new style), version 1 (4K pages), size 1048063 pages, no label, UUID=d0bbff73-a09a-47f6-8387-e27268cdc9fc
Great! Let's enable it!
# meermanr@Ikari:/home/meermanr/projects/find_ext3 (master)
# sudo swapon /dev/sdb5
And verify:
# meermanr@Ikari:/home/meermanr/projects/find_ext3 (master)
# lsblk | grep SWAP
└─sda5 8:5 0 6G 0 part [SWAP]
└─sdb5 8:21 0 4G 0 part [SWAP]
# meermanr@Ikari:/home/meermanr/projects/find_ext3 (master)
# free -m
total used free shared buffers cached
Mem: 5969 5672 297 0 435 138
-/+ buffers/cache: 5098 871
Swap: 10234 3476 6758
===== Extending the snapshot volume =====
# lvdisplay /dev/vg_scratch/snap
--- Logical volume ---
LV Name /dev/vg_scratch/snap
VG Name vg_scratch
LV UUID 4EFJ8Y-bzWT-aif4-MlT9-4234-aS1d-qcipq0
LV Write Access read/write
LV snapshot status active destination for /dev/vg_scratch/lv_scratch
LV Status available
# open 1
LV Size 894.27 GiB
Current LE 228934
COW-table size 9.63 GiB
COW-table LE 2335
Allocated to snapshot 74.40% <-- Do not want!
Snapshot chunk size 4.00 KiB
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 252:1
As it happens, I had not allocated all of the volume group:
root@Ikari:~# vgs
VG #PV #LV #SN Attr VSize VFree
vg_scratch 1 4 1 wz--n- 1.73t 9.3g
So extending the snapshot is easy:
root@Ikari:~# lvextend /dev/vg_scratch/snap --extents +100%FREE
Verify:
# lvdisplay /dev/vg_scratch/snap
--- Logical volume ---
LV Name /dev/vg_scratch/snap
VG Name vg_scratch
LV UUID 4EFJ8Y-bzWT-aif4-MlT9-4234-aS1d-qcipq0
LV Write Access read/write
LV snapshot status active destination for /dev/vg_scratch/lv_scratch
LV Status available
# open 1
LV Size 894.27 GiB
Current LE 228934
COW-table size 18.63 GiB <-- Has increased
COW-table LE 4769 <-- (Same thing, but measured in logical extents)
Allocated to snapshot 39.18% <-- Much better!
Snapshot chunk size 4.00 KiB
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 252:1
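Since a snapshot whose COW table fills up completely is invalidated, it's worth keeping an eye on that percentage while ''fsck'' grinds away. Here's a hedged sketch of a little watchdog (my own illustration, with an arbitrary 90% threshold), polling the same Snap% figure that ''lvs'' reports:
<code python>
# Sketch: warn when the snapshot's copy-on-write table is nearly full.
# "lvs -o snap_percent" reports the same Snap% column shown by plain "lvs" above.
import subprocess, time

def snapshot_usage(lv="vg_scratch/snap"):
    out = subprocess.check_output(
        ["lvs", "--noheadings", "-o", "snap_percent", lv])
    return float(out.decode("ascii").strip())

if __name__ == "__main__":
    while True:
        used = snapshot_usage()
        print("snapshot COW table %.2f%% used" % used)
        if used > 90.0:
            print("WARNING: lvextend the snapshot before it fills up!")
        time.sleep(60)
</code>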
===== ... 2 weeks later =====
It has been two weeks since I started ''fsck'', and it is still running. During this time I've not been able to use my desktop PC for gaming, and so I've decided to hit ^C and move it to another machine. Here are the forensics:
Now:
# meermanr@Ikari:/home/meermanr (master *)
# date
Mon May 27 14:53:34 BST 2013
Size of block device:
root@Ikari:/home/meermanr/projects/find_ext3# python
Python 2.7.3 (default, Aug 1 2012, 05:14:39)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> f = open("/dev/loop0")
>>> f.seek(0, 2)
>>> f.tell()
960218560000
>>> hex(f.tell())
'0xdf917c7600'
>>>
Offsets of ''fsck'' and ''python'':
Every 2.0s: lsof /dev/loop0 Mon May 27 14:56:18 2013
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
fsck.ext2 20182 root 4u BLK 7,0 0x3f903ef000 5941 /dev/loop0
python 23598 root 3r BLK 7,0 0xdf917c7600 5941 /dev/loop0
That's approximately 28%. :-(
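(For the record, that figure is just the ratio of the two offsets ''lsof'' reported:)
<code python>
# fsck's current position on /dev/loop0 versus the device's total size,
# both taken from the lsof and Python outputs above.
done  = 0x3f903ef000
total = 0xdf917c7600
print("%.1f%% of the device scanned" % (100.0 * done / total))   # prints 28.4%
</code>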
STARTED %CPU %MEM RSS CMD
May 13 0.0 0.0 364 su
May 13 0.0 0.0 528 \_ bash
May 13 0.0 0.0 1320 \_ watch lvdisplay /dev/vg_scratch/snap
May 13 0.0 0.0 364 su
May 13 0.0 0.0 536 \_ bash
May 13 0.0 0.0 464 \_ fsck /dev/loop0 -y
May 13 99.1 24.3 1488468 \_ fsck.ext2 -y /dev/loop0
May 14 0.0 0.0 1456 watch lsof /dev/loop0
So ''fsck'' has about 1.4GiB of resident memory (and 4696MiB of virtual memory according to top):
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20182 root 30 10 4696m 1.4g 784 R 94 24.3 19564:51 fsck.ext2 -y /dev/loop0
Total system memory:
# meermanr@Ikari:/home/meermanr (master *)
# free -m
total used free shared buffers cached
Mem: 5969 5036 933 0 87 157
-/+ buffers/cache: 4791 1178
Swap: 10234 4764 5470
Every 2.0s: lvdisplay /dev/vg_scratch/snap Mon May 27 14:54:06 2013
File descriptor 4 (pipe:[72594953]) leaked on lvdisplay invocation. Parent PID 23517: sh
--- Logical volume ---
LV Name /dev/vg_scratch/snap
VG Name vg_scratch
LV UUID 4EFJ8Y-bzWT-aif4-MlT9-4234-aS1d-qcipq0
LV Write Access read/write
LV snapshot status active destination for /dev/vg_scratch/lv_scratch
LV Status available
# open 1
LV Size 894.27 GiB
Current LE 228934
COW-table size 84.89 GiB
COW-table LE 21733
Allocated to snapshot 43.48%
Snapshot chunk size 4.00 KiB
Segments 2
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 252:1
Output from ''fsck'' itself:
File ... (inode #9791282, mod time Thu Oct 5 01:40:26 2006)
has 11143 multiply-claimed block(s), shared with 5 file(s):
... (inode #9791794, mod time Thu Oct 5 01:40:26 2006)
... (inode #4115835, mod time Thu Aug 20 03:31:06 2009)
... (inode #4130006, mod time Mon Nov 29 16:38:10 2010)
... (inode #4784754, mod time Tue Jul 26 06:01:10 2005)
Clone multiply-claimed blocks? yes