Musings and Random Thoughts

Catastrophic Database Crash on the main XLsuite master MySQL instance

Posted by Beausoleil Francois on 08-22-2008

Tags: crash, database, recovery

Around 18:08 UTC (11:08:09 PDT), our main MySQL database server went down. Luckily, just yesterday (Thursday), I had replaced our whole DB infrastructure with a replicated master/slave setup. It took us 15 minutes to notice that the sites were down, and another 20 minutes to execute a database failover. By 18:50 UTC (11:50 PDT), things were back to normal.

Next steps? We're running off a small instance instead of our new m1.large, so I'll need to copy a few gigabytes of data around. Once that's done, we still need a post-mortem.

What happened? At the moment, the only thing I can say is that the LVM partition that held the data was gone. And when I say gone, I really mean gone:

# pvs
  PV         VG    Fmt  Attr PSize   PFree  
  /dev/sdb   mysql lvm2 a-   419.96G 369.96G

# vgs
  VG    #PV #LV #SN Attr   VSize   VFree  
  mysql   1   1   0 wz--n- 419.96G 369.96G

# lvs
  LV   VG    Attr   LSize  Origin Snap%  Move Log Copy% 
  data mysql -wi--- 50.00G                              

# mount
/dev/sda1 on / type ext3 (rw)
proc on /proc type proc (rw,noexec,nosuid,nodev)
/sys on /sys type sysfs (rw,noexec,nosuid,nodev)
varrun on /var/run type tmpfs (rw,noexec,nosuid,nodev,mode=0755)
varlock on /var/lock type tmpfs (rw,noexec,nosuid,nodev,mode=1777)
udev on /dev type tmpfs (rw,mode=0755)
devshm on /dev/shm type tmpfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)

# ls /dev/mapper/
control

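As you can see, the logical volume is still present and known to LVM, but it isn't mountable: the lvs attribute string (-wi---) has no "a" in the state position, so the volume is not active, and /dev/mapper contains no device node for it. Really strange.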
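For anyone reading the lvs output for the first time, here is a minimal sketch of how the Attr column can be checked from a script. It relies on the fifth character of the attribute string being the volume state, with "a" meaning active. The function name lv_state is made up for this example and is not part of LVM.

```shell
#!/bin/sh
# lv_state is a hypothetical helper, not an LVM command.
# It takes an lv_attr string as printed by `lvs` and reports the state.
# The fifth character of lv_attr is the state; "a" means active.
lv_state() {
    state=$(printf '%s' "$1" | cut -c5)
    if [ "$state" = "a" ]; then
        echo active
    else
        echo inactive
    fi
}

lv_state "-wi---"   # the attribute string from the lvs output above
lv_state "-wi-a-"   # what an activated volume would typically show
```

The first call prints "inactive" and the second prints "active". On a live box the attribute string could be fed in from `lvs --noheadings -o lv_attr mysql/data` after stripping the leading whitespace. When a volume reports inactive, the usual first thing to try is `vgchange -ay mysql` followed by another look at /dev/mapper; whether that would have brought the volume back in this case is not something I can say yet.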

We are investigating the cause of the crash and will post more information here as appropriate.
