system news

Problem with /pfs/nobackup filesystem

  • Posted on: 28 April 2016
  • By: admin

Due to problems with the /pfs/nobackup filesystem, the batch systems on both Akka and Abisko have been suspended.

One of the file servers has a broken motherboard, which needs to be replaced.
The ETA for a new motherboard is 2012-04-19.

We will check the other file servers for potential hardware problems before bringing the filesystem back online.

If everything else is OK, we will run the filesystem with reduced capacity in the meantime.

Sat, 2012-04-14 16:26 | Åke Sandgren

Abisko - down for transition to the final system

  • Posted on: 28 April 2016
  • By: admin

The time to transition Abisko from the interim to the final system is here. During the transition, Abisko will be unavailable starting from Wednesday, February 15th at 08:00. Assuming the transition goes smoothly, we expect to have the final system functional early next week.

During the migration from the interim system, we need to make a few changes that are disruptive to the cluster interconnect. The batch nodes will therefore not process any jobs from the above point in time until the transition is complete.

Abisko - final system online

  • Posted on: 28 April 2016
  • By: admin

The final system of Abisko is now available and the queues have been restored. 

We still have some things to fix and tune, so there may be some interruptions over the next couple of weeks.

Not all 318 nodes are available at the moment. The interim system has not yet been upgraded to the new CPUs, and some other nodes are being used for testing.

Some software may still be missing, but we are working on building it.

More information about Abisko can be found here:

GPFS problems (updates here when available)

  • Posted on: 27 April 2016
  • By: admin

We are currently having some problems with GPFS, the parallel filesystem, which we are investigating.

The queues on all clusters have been stopped.

Update 12:46
The problematic disk has been suspended and new data will no longer be written to it.
Old data that is not damaged can still be read.
Data is being migrated away from the problematic disk.
We currently do not have an estimate of how long this will take.

GPFS problem now solved

  • Posted on: 27 April 2016
  • By: admin

The problem with the GPFS parallel filesystem (/pfs/nobackup) has now been solved.
We did not see any indications of lost or damaged files during the recovery process.
However, you are advised to check your files to be on the safe side.

Batch queues have been reactivated.

Fri, 2011-12-02 14:58 | Åke Sandgren

Updated: 2024-11-01, 13:56