system news

HPC2N is currently suffering from a university-wide power failure, *FINISHED 11:50*

  • Posted on: 27 May 2020
  • By: ake

We lost power around 09:40 today due to a university-wide power failure.

This means that login is not possible at the moment.

We will update this post as soon as we have new information.

*UPDATE 2020-05-27 11:25*

Power is now back and we are powering up the systems.

*UPDATE 2020-05-27 11:50*

Login nodes are now open again. Cluster batch nodes are coming up soon, and jobs will start running before 13:00.

Maintenance on cooling system affecting both clusters 2020-05-11 *FINISHED*

  • Posted on: 4 May 2020
  • By: ake

On Monday 2020-05-11 we will perform maintenance on the cooling system.

This affects both clusters, which will be offline during the maintenance.

The maintenance window is between 06:00 and 17:00 on Monday 2020-05-11.

In the time leading up to the downtime, only jobs short enough to finish before it starts can run, so the allowed runtime shrinks as the downtime approaches.

*UPDATE 2020-05-11 17:35*

An unexpected problem is causing a delay in getting the clusters up and running again.

We're working as fast as we can to fix the problem.

Power maintenance affects Kebnekaise and Abisko, Monday 2020-04-20 and Tuesday 2020-04-21 *Clusters and login nodes now up again*

  • Posted on: 14 April 2020
  • By: bbrydsoe

Due to power maintenance in the MIT building, there will be two outages that affect HPC2N's systems, one on 20 April 2020 and one on 21 April 2020.

For this reason, the batch systems on both Abisko and Kebnekaise will be down and no jobs will run.

Downtimes for the clusters:

Monday, 2020-04-20, 06:30-12:00

Tuesday, 2020-04-21, 11:30-17:00

In the days before the maintenance, only jobs with a runtime short enough to finish before the maintenance starts will run.

Maintenance on cooling system affects Kebnekaise and Abisko, 2019-10-16 - 17

  • Posted on: 27 September 2019
  • By: ake

There will be a two-day maintenance on the cooling system and power feed on 2019-10-16 and 2019-10-17, affecting both Kebnekaise and Abisko.

The maintenance window starts at 2019-10-16 04:00 and ends at 2019-10-17 18:00.

The clusters will be down during that period and no jobs will be running.

During the days leading up to the maintenance, only jobs with a runtime short enough to finish before the maintenance starts will be allowed to run.

So, if you have jobs that can use a shorter runtime, it is advisable to submit them during the week(s) before the maintenance window.
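
As a rough illustration of the rule above, the following sketch computes the longest runtime a job could request and still finish before the maintenance window starts. The function names, the 10-minute safety margin, and the example submission time are assumptions made for illustration. The sketch also assumes the job starts as soon as it is submitted, so any time spent waiting in the queue shortens the usable runtime further. The days-hours:minutes:seconds output format is one that some batch systems accept; check the documentation for the cluster you use.

```python
from datetime import datetime, timedelta


def max_walltime(now, maintenance_start, margin=timedelta(minutes=10)):
    """Longest runtime that still ends before the maintenance starts.

    `margin` is a hypothetical safety buffer, not a site policy.
    Returns a zero timedelta if no time is left.
    """
    remaining = maintenance_start - now - margin
    return max(remaining, timedelta(0))


def format_walltime(delta):
    """Format a timedelta as D-HH:MM:SS."""
    total = int(delta.total_seconds())
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


# Maintenance start taken from the announcement above.
maintenance_start = datetime(2019, 10, 16, 4, 0)
# Hypothetical time at which the job would start running.
job_start = datetime(2019, 10, 14, 12, 0)

print(format_walltime(max_walltime(job_start, maintenance_start)))
```

A job requesting more than the printed limit would not be expected to start until after the maintenance.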

AFS home directories inaccessible on Kebnekaise (resolved)

  • Posted on: 5 August 2019
  • By: zao

The home directories on Kebnekaise have been inaccessible since this weekend. We're aware of the problem and are looking into the cause.
For urgent access, please use the Abisko login node in the meantime. The AFS and PFS file systems are accessible there even if you only have an allocation on Kebnekaise.

[Update 2019-08-05 09:45 CEST]
We've rebooted the login node to restart the affected services and access to AFS home directories is now restored.

Upgrade of Lustre servers to solve the last weeks' problems 2019-07-(01-05) (clusters now UP again)

  • Posted on: 27 June 2019
  • By: torkel

Over the last two weeks we have had serious problems with PFS, the parallel file system. The cause of the problems was identified fairly quickly. However, all attempts to get a temporary fix in place over the summer have failed.

We have therefore, in consultation with the vendor of the storage solution, decided to update the server software starting on the morning of July 1. The update was originally planned to take place in the early autumn and contains a permanent fix for the problems we have seen.

The update is expected to take the whole week.

Continued severe problems with PFS (2019-06-25)

  • Posted on: 25 June 2019
  • By: bbrydsoe

PFS (the parallel file system) is still experiencing severe problems. The bug fix we implemented seemed to stabilize it for about a week, but PFS is now down again.

Both Kebnekaise and Abisko are affected, including access to PFS from the login nodes. We recommend that you avoid using PFS for now, since access is either very slow or not possible at all.

We are currently working intensively to solve the problems. At the moment we have no ETA for when the problems will be resolved.

Updated: 2024-11-01, 13:56