Around 18:08 UTC (11:08:09 PDT), our main MySQL database server went down. Luckily, yesterday (Thursday), I had just replaced our whole DB infrastructure with a replicated master/slave setup. It took us 15 minutes to notice that the sites were down, and another 20 minutes to execute a database failover. By 18:50 UTC (11:50 PDT), things were back to normal.
Next steps? We're running off a small instance instead of our new m1.large, so I'll need to copy a few gigabytes of data around. Once that's done, we still need a post-mortem.
What happened? At the moment, the only thing I can say is that the LVM partition that held the data was gone. And when I say gone, I really mean gone:
# pvs
PV       VG    Fmt  Attr PSize   PFree
/dev/sdb mysql lvm2 a-   419.96G 369.96G
# vgs
VG    #PV #LV #SN Attr   VSize   VFree
mysql   1   1   0 wz--n- 419.96G 369.96G
# lvs
LV   VG    Attr   LSize  Origin Snap%  Move Log Copy%
data mysql -wi--- 50.00G
# mount
/dev/sda1 on / type ext3 (rw)
proc on /proc type proc (rw,noexec,nosuid,nodev)
/sys on /sys type sysfs (rw,noexec,nosuid,nodev)
varrun on /var/run type tmpfs (rw,noexec,nosuid,nodev,mode=0755)
varlock on /var/lock type tmpfs (rw,noexec,nosuid,nodev,mode=1777)
udev on /dev type tmpfs (rw,mode=0755)
devshm on /dev/shm type tmpfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
# ls /dev/mapper/
control
As you can see, the logical volume is still present and known to LVM, but there is no device node for it under /dev/mapper, so it can't be mounted. Really strange.
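The attribute string in the lvs output deserves a closer look. According to the lvs(8) man page, the fifth character of the LV attribute field is the activation state, where "a" means active. Here that character is "-". Below is a minimal Python sketch that decodes the first six attribute characters, the format older LVM2 releases print. The helper names decode_lv_attr and parse_lvs_line are hypothetical, and the sketch covers only a subset of the states the man page lists.

```python
# Sketch: decode the lv_attr column printed by `lvs`, following the
# attribute descriptions in the lvs(8) man page. Only the first six
# characters (the format shown by older LVM2 releases) are handled.

# Character 5: activation state ('-' means the LV is not active).
STATE = {
    "a": "active",
    "s": "suspended",
    "I": "invalid snapshot",
    "S": "invalid suspended snapshot",
    "d": "mapped device present without tables",
    "i": "mapped device present with inactive table",
    "-": "not active",
}

# Character 2: permissions.
PERMISSIONS = {"w": "writeable", "r": "read-only"}

# Character 3: allocation policy (upper case means the policy is locked).
ALLOCATION = {"a": "anywhere", "c": "contiguous", "i": "inherited",
              "l": "cling", "n": "normal"}


def decode_lv_attr(attr: str) -> dict:
    """Turn an lv_attr string such as '-wi---' into readable fields."""
    if len(attr) < 6:
        raise ValueError(f"expected at least 6 attribute characters, got {attr!r}")
    return {
        # '-' means none of the special types (mirror, snapshot, ...) applies.
        "volume_type": attr[0],
        "permissions": PERMISSIONS.get(attr[1], attr[1]),
        "allocation": ALLOCATION.get(attr[2].lower(), attr[2]),
        "fixed_minor": attr[3] == "m",
        "state": STATE.get(attr[4], attr[4]),
        "device_open": attr[5] == "o",
    }


def parse_lvs_line(line: str):
    """Split one data row of default `lvs` output into (lv, vg, attributes)."""
    fields = line.split()
    lv, vg, attr = fields[0], fields[1], fields[2]
    return lv, vg, decode_lv_attr(attr)


if __name__ == "__main__":
    # The data row from the lvs output above.
    lv, vg, info = parse_lvs_line("data mysql -wi--- 50.00G")
    print(f"{vg}/{lv}: {info}")
    if info["state"] != "active":
        print(f"{vg}/{lv} is not active, so no /dev/mapper node is expected")
```

If the volume is simply inactive, that would be consistent with /dev/mapper containing only "control". In that case, activating it with "lvchange -ay mysql/data" would be the obvious next thing to try. That is an assumption for the investigation, not a confirmed cause of the crash.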
We are investigating the cause of the crash and will post more information here as appropriate.