system news

Abisko down due to network problems

  • Posted on: 29 April 2016
  • By: admin

Abisko is down due to network problems.

We are investigating the problem.

All queues have been stopped.

*UPDATE 2016-03-09 19:18*
The main ethernet switch for Abisko has died.
We will press our supplier to fix this as soon as possible.

This will not happen until tomorrow (2016-03-10) at the earliest.

*UPDATE 2016-03-09 20:00*
We found a spare power module for the switch and Abisko's ethernet network is now back online.
We will verify system functionality (see other system news about /pfs/nobackup) before bringing the batch queues back on.

/pfs/nobackup problems

  • Posted on: 29 April 2016
  • By: admin

We are currently experiencing some problems with the /pfs/nobackup filesystem. Investigations are ongoing.

All the batch queues have been stopped until further notice.

UPDATE 13:04: Everything should be back to normal again. It seems that it was only a local problem on the Abisko login node. Jobs running through the batch system should not have been affected.

/pfs/nobackup unavailable

  • Posted on: 29 April 2016
  • By: admin

We are experiencing some problems with the /pfs/nobackup filesystem. Investigations are ongoing. At the moment we have no ETA for when /pfs/nobackup will be up again.

All the batch queues have been stopped until further notice.

*UPDATE 2015-12-23 17:10*
The file system is now back online and batch queues have been enabled again.

/pfs/nobackup currently not responding again, ALL batch queues stopped (FIXED AGAIN)

  • Posted on: 28 April 2016
  • By: admin

The /pfs/nobackup filesystem stopped responding again.

We're working on getting it back online.

*UPDATE 2015-10-16 10:05*
The file system is now back online again and the queues are running.

*UPDATE 2015-10-16 12:35*
The file system is non-responsive again. We are trying to get it back online as soon as possible.

*UPDATE 2015-10-16 20:00*
The file system is now back online and the problem has been identified.

Fri, 2015-10-16 08:28 | Åke Sandgren

Power maintenance 2015-09-18 06:30, batch nodes will be drained of running jobs.

  • Posted on: 28 April 2016
  • By: admin

The power company is performing maintenance on the high-voltage feed to the University on Friday, September 18th, at 06:30.

This will cause a total power loss across the whole University, so we need to drain the batch nodes.

During the days leading up to the maintenance window, it is advisable to submit shorter jobs that can finish in the time remaining before the window starts. To allow a little margin, the system will not allow jobs to run after 06:20 on the 18th.
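As a rough illustration of the advice above, the following Python sketch computes the longest run time that still finishes before the 06:20 cutoff. The function names and the D-HH:MM:SS output format are assumptions made for this example; use whatever time-limit notation your batch system expects.

```python
from datetime import datetime, timedelta

# Cutoff taken from the announcement: no jobs run after 06:20 on 2015-09-18.
CUTOFF = datetime(2015, 9, 18, 6, 20)


def max_walltime(now, cutoff=CUTOFF):
    """Return the time left before the cutoff, or None if it has passed.

    Hypothetical helper for illustration only.
    """
    remaining = cutoff - now
    if remaining <= timedelta(0):
        return None
    return remaining


def format_walltime(delta):
    """Format a timedelta as D-HH:MM:SS (an assumed notation)."""
    total = int(delta.total_seconds())
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    left = max_walltime(datetime.now())
    if left is None:
        print("The cutoff has already passed.")
    else:
        print("Longest job that can still finish:", format_walltime(left))
```

A job requesting more time than the printed value would not be able to finish before the window, so it is likely to stay queued until after the maintenance.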

Batch queues stopped due to /pfs/nobackup being out of inodes (files). (Partially fixed)

  • Posted on: 28 April 2016
  • By: admin

We have unfortunately been forced to stop all batch queues on the clusters.

/pfs/nobackup has run out of inodes. Something probably created more inodes (files) than intended.

We are working on finding out where and getting the usage down to normal levels.

Until this is fixed, we need to keep the batch queues stopped to avoid the risk of jobs failing because they cannot create new files.
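For users who want to check their own usage, here is a minimal Python sketch, not the procedure the administrators use. It reads the inode counters of a filesystem with `os.statvfs` and counts the entries below a directory. The function names are made up for this example, and the count is only approximate, since hard-linked files share an inode.

```python
import os


def inode_usage(path):
    """Return (used, total, percent_used) inodes for the filesystem at path."""
    st = os.statvfs(path)
    total = st.f_files          # total inodes on the filesystem
    used = total - st.f_ffree   # inodes that are not free
    percent = 100.0 * used / total if total else 0.0
    return used, total, percent


def count_entries(root):
    """Count files and directories below root (roughly one inode each)."""
    count = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        count += len(dirnames) + len(filenames)
    return count


if __name__ == "__main__":
    used, total, percent = inode_usage(".")
    print(f"inodes used: {used} of {total} ({percent:.1f}%)")
    print("entries below current directory:", count_entries("."))
```

Running `count_entries` on a very large directory tree generates many metadata requests of its own, so it is best limited to the directories you suspect.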

pfs problems (solved)

  • Posted on: 28 April 2016
  • By: admin

The /pfs/nobackup (Lustre) file system is currently unavailable due to the aftereffects of an electrical maintenance problem (see here).

We apologize for any inconvenience this may cause.

We are currently working on restoring access to pfs, but we do not have an ETA right now.

This news will be updated with more information when we have it.

*UPDATE 2015-08-20 15:45*
It will most likely take at least until Friday the 21st before we can get this resolved.

Updated: 2024-11-01, 13:56