Critical NFS Bug: Potential Data Loss
There is a critical bug on our AIX NFS servers (gorilla, kong and magilla) serving MCS home directories and MCS project data. This bug is encountered only on our newest Linux desktops, a list of which appears at the bottom of this note.

Specifically, the bug is triggered when a group-writable file is accessed on one of these machines by someone other than its owner. When this occurs, the user gets an error, but more importantly the file is zeroed out (i.e. replaced with an empty file).

If you have used one of the machines listed, you will want to double-check your files to ensure you haven’t lost data due to this bug. If you have, let us know so we can restore it as soon as possible. In the meantime, make sure any network file writes are done from a known safe machine, such as terra, shakey, harley, triumph, elephant, crunch, smash or schwinn. If your desktop machine is not listed below, it is also safe. You can also type “whatami” in a Linux terminal: if the output is linux-debian_3.1-ia32, that machine is safe.

IBM has issued a fix for this bug; however, applying it to our servers would require upgrading their full operating system, and the downtime associated with that would be unacceptable. Instead, we will be taking a fast track to get one of the new Solaris file servers online and migrate all NFS shares over to it. We have confirmed this bug is not present in Solaris.

This is our top-priority task. I can’t give an exact time frame at this point, but I promise a status update tomorrow (6/27). This is a fast-track emergency solution; we’ll deploy a more elegant solution once this fire’s out.

My sincere apologies for this.
If you’ve lost data because of this, we’ll do all we can to get it back.

The list of affected machines:

bucco.mcs.anl.gov
cifaretto.mcs.anl.gov
contra.mcs.anl.gov
csi334378.mcs.anl.gov
darth.mcs.anl.gov
effable.mcs.anl.gov
garth.mcs.anl.gov
gnep.mcs.anl.gov
haines-desktop.mcs.anl.gov
hookshot.mcs.anl.gov
horikawa-cph.mcs.anl.gov
joy-3-06p4-cph.mcs.anl.gov
kant.mcs.anl.gov
kirby.mcs.anl.gov
kschoche-desktop.mcs.anl.gov
likli-desktop.mcs.anl.gov
lucky[0-6].mcs.anl.gov
luigi.mcs.anl.gov
lust-cph.mcs.anl.gov
msulliva-desktop.mcs.anl.gov
nehebkau.mcs.anl.gov
noah.mcs.anl.gov
octagon.mcs.anl.gov
octopus.mcs.anl.gov
opteron-ibm.mcs.anl.gov
paulie.mcs.anl.gov
piano.mcs.anl.gov
redline.mcs.anl.gov
roberts-desktop.mcs.anl.gov
rouxamd64.mcs.anl.gov
seed-linux-1.mcs.anl.gov
smithy.mcs.anl.gov
sson-desktop.mcs.anl.gov
strat.mcs.anl.gov
strength.mcs.anl.gov
wayne.mcs.anl.gov
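
For anyone wanting to double-check their files as suggested above, here is a rough sketch of one way to look for candidates. It is not an official tool. It assumes that a file hit by this bug shows up as a zero-length regular file with the group-write bit set, which follows from the description in this note but has not been verified against the servers. The function name find_suspect_files is made up for this example. Plenty of empty files are legitimately empty, so treat the output as a list to review, not a list of confirmed losses. To be cautious, run it from one of the known safe machines.

```python
import os
import stat
import sys


def find_suspect_files(root):
    """Yield regular files under root that are empty and group-writable.

    Assumption: a file zeroed by the bug appears as a zero-length,
    group-writable regular file. Only file metadata is read (lstat);
    no file is opened or modified.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # unreadable or vanished while scanning; skip it
            if (stat.S_ISREG(info.st_mode)
                    and info.st_size == 0
                    and info.st_mode & stat.S_IWGRP):
                yield path


# Scan only when a directory is given explicitly on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    for suspect in find_suspect_files(sys.argv[1]):
        print(suspect)
```

Saved under a name of your choosing (say, check_empty.py), it would be run as "python check_empty.py /path/to/your/directory" and prints one candidate path per line.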