CELS Virtual Helpdesk
CELS Shared Services Systems Group

Power work in the data center has taken a handful of compute nodes offline for a few hours. They should be back online early this afternoon. The affected machines are: octagon.mcs.anl.gov, cookie.mcs.anl.gov, petsc.mcs.anl.gov, cg.mcs.anl.gov, gnep.mcs.anl.gov, and octopus.mcs.anl.gov. Sorry for the inconvenience. We didn’t believe these machines would be affected by the work; however, we were incorrect. …
The disk migration is finally finished. User home directories are now on their own partition, and the full disk problem has been rectified. There’s currently over 50GB available in user home directories on RDP for any files and programs that need local storage. Thanks for your patience!
rdp.mcs.anl.gov is offline until further notice. The outage window is through 5PM, but I don’t expect it to take that long. I’ll post here when the work is done.
Unfortunately, the home directory migration was not yet successful, so we’re in the same boat we were in before the outage with space being very tight. I’m going to take another crack at it on Sunday, which means from around noon to 5PM you can expect the machine to be unavailable. If anything changes, I’ll …
Those of you who use rdp.mcs.anl.gov (Remote Desktop server for Windows) may have noticed the disk is quite full. I need to migrate users to a new partition to free up space. This, however, requires the machine be offline during the migration. At the moment, the plan is to take the machine offline tomorrow at …
Quick summary: I just got back from the 221 data center (gee, it’s hot outside) having replaced what we suspect are bad power supplies in a Virtual Machine Host server. We isolated the issue to this specific server rebooting without offering any useful information as to why in its logs, coupled with a bad set …
An outage similar to this morning’s is occurring (though limited in scope at the moment, since we know what *won’t* work to bring things back). Stand by…
Addendum. One of the web servers (personal pages and project pages under www.mcs.anl.gov) is still booting up. It has been a while since it was last rebooted, and it’s doing a filesystem check. It should be back within the next hour.
The aforementioned outage is resolved. A disk problem took out our virtual server master around 4 AM, which took out a handful of virtual servers. In trying to resolve that, at around 8 AM we ended up taking out the rest of the virtual machines hosted in that cluster due to dependencies. We’ve fixed the …