Anonymous
23 Dec 2008 18:42:44 UTC
Topic 13732
We have had some problems on ATLAS since the cooling unit failed on Sunday. Einstein@home is currently not running on ATLAS and will be brought back up slowly over the next few days.
I think I remember the story about dying servers, and I believe it affected the older Merlin cluster (> 5 years old), which is made from Dual Athlon MP systems (K7 architecture) in desktop cases. The newer Morgane cluster is made from K8 Opterons in 19" cases IIRC and should not yet show that many signs of aging.
CU
Bikeman
ATLAS issues
After the last power outage at AEI Potsdam a few weeks ago, Merlin was not reactivated. It is actually dead now. I think it had fewer than 50 of its original 180 nodes still running.
Morgane is running well; about half a dozen (of 615) nodes are down with hardware failures, that's all.
[edit]Looks like most of the failed nodes have been repaired; only one seems to be down. Learn more about the AEI clusters at gw.aei.mpg.de.[/edit]
BM
For the curious: Einstein@home has been stopped on ATLAS again. Apparently there is a bug in the current version of Condor (the job scheduler on LSC clusters) that prevents ordinary jobs from being started on a node while Einstein@home is running there.
BTW: at Ganglia you can get an overview of the status of the LSC clusters (including Nemo, Morgane and ATLAS). "nice" usually means Einstein@home.
BM