system news

Maintenance affecting all HPC2N systems between 2018-06-04 and 2018-06-08

  • Posted on: 22 May 2018
  • By: ake

We will carry out a major maintenance to upgrade the Lustre parallel file system (/pfs/nobackup).

One of the goals is to introduce new functionality, for instance project-based storage.

We will also do a number of related changes and optimizations to the Lustre file system setup.

 

We are planning a full week of complete downtime on all HPC2N systems, including login and ThinLinc nodes, starting 2018-06-04 07:30 CEST.

Please make sure to copy any files you want to work with during the downtime to some other system well in advance of the maintenance.

Maintenance for Slurm (batch system) upgrade on Abisko and Kebnekaise: 2018-03-16

  • Posted on: 7 March 2018
  • By: ake

The maintenance window has been moved to March 16th.

On March 16 we will have a maintenance window for upgrading Slurm (the batch system) on both Abisko and Kebnekaise.

We have therefore put a reservation in place on both clusters starting 2018-03-16 08:00.

Jobs will not be allowed to start if their requested runtime reaches beyond that point in time.

In the run-up to this maintenance window it is therefore advantageous to submit jobs with shorter runtimes.
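As a rough illustration of the rule above, the following Python sketch computes the longest runtime a job could request and still finish before the reservation begins. It assumes the job would start immediately at the given time (queue waiting time is not modelled), and the function names `max_walltime` and `format_slurm_time` are hypothetical helpers, not part of Slurm. The result is formatted as `days-hours:minutes:seconds`, one of the forms accepted by the `--time` option of `sbatch`.

```python
from datetime import datetime, timedelta

# Start of the maintenance reservation, taken from the announcement.
RESERVATION_START = datetime(2018, 3, 16, 8, 0)


def max_walltime(start, reservation_start=RESERVATION_START):
    """Longest runtime that still ends by the reservation start.

    Assumes the job begins running exactly at `start` (hypothetical
    helper; real queue waiting time is not taken into account).
    Returns None if no time remains before the reservation.
    """
    remaining = reservation_start - start
    if remaining <= timedelta(0):
        return None
    return remaining


def format_slurm_time(delta):
    """Format a timedelta as D-HH:MM:SS, rounding down to whole seconds."""
    total = int(delta.total_seconds())
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    limit = max_walltime(datetime(2018, 3, 14, 20, 0))
    if limit is None:
        print("No time left before the maintenance window.")
    else:
        print("Request at most --time=" + format_slurm_time(limit))
```

In practice it is sensible to request somewhat less than this upper bound, since the job is unlikely to start the moment it is submitted.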

 

We expect to be finished no later than 2018-03-17 17:00.

File system problem on /pfs/nobackup, queues stopped: 2018-01-24 19:22; *SOLVED*

  • Posted on: 24 January 2018
  • By: ake

We are currently experiencing problems with the /pfs/nobackup file system.

One of the storage units is misbehaving, causing problems when accessing files located on it.

Because of this, the batch queues on both Abisko and Kebnekaise have been stopped.

Login may hang for users whose login script (.bashrc) tries to access the /pfs/nobackup file system.

We currently have no estimate for when the problem will be solved.

 

* UPDATE: 2018-01-24 20:00 *

The problem has now been solved.

Kernel upgrades may cause slight disturbances - 2018-01-10

  • Posted on: 10 January 2018
  • By: ake

During the coming days we will be upgrading the kernel on all systems due to the Meltdown/Spectre security bugs.

This may interfere with the normal behaviour of the systems, for example through slow or temporarily missing file system access, stopped queues, and similar.

We will try to minimize the user-visible effects, but this is an upgrade we must do as quickly as possible, and we may sometimes have to do things in a way that somewhat degrades the user experience.

Cooling system maintenance, all clusters down 2017-12-19, *CANCELED*

  • Posted on: 14 December 2017
  • By: ake

On Tuesday 2017-12-19 there will be maintenance on the cooling system for the room housing our cluster storage (/pfs/nobackup).

This means that we have to take down that storage, along with the clusters and login nodes.

There is currently a reservation on all nodes of both clusters starting 2017-12-19 07:00 CET.

Any jobs that have a walltime requirement that would extend into that maintenance window will not be allowed to start.

Updated: 2024-11-01, 13:56