system news

Batch system OK

  • Posted on: 28 April 2016
  • By: admin

We had some RAID problems on our batch server, which meant (among other things) that no jobs could be submitted.

It was necessary to reboot the server. Everything seems to be working correctly again.

Fri, 2014-11-14 12:03 | Birgitte Brydsö

The new center storage system is now available

  • Posted on: 28 April 2016
  • By: admin

The new centre storage is now in production.

The /pfs/nobackup file system is now larger and faster, ... finally.

Almost all users have been synchronized to the new file system.
The few remaining users (those affected will get a separate mail) have been blocked from logging in and their jobs put on hold until the transfer is complete for each user.

Jobs are running again and login has been opened (see exception above).

If you notice anything strange please notify support@hpc2n.umu.se.

New center storage system

  • Posted on: 28 April 2016
  • By: admin
Dear Users,
 
We apologize for the late notification. However, we have some really good news.
 
During the summer and early autumn, HPC2N has procured, tested and deployed a new center storage system, which will replace the old, aging GPFS-based system. The new system is a DDN SFA 12KX and Exascaler solution using Lustre as the underlying file system. It provides 1 PB of storage and will be up to 25 times faster than the old one, depending on I/O pattern.
 

Problems with batch jobs on the Abisko and Akka clusters

  • Posted on: 28 April 2016
  • By: admin

Due to a recent security update of the 'bash' shell, there is a high probability that jobs submitted before the nodes received the update will fail to use the 'module' functionality once they start running.

As the login nodes are now updated, future job submissions should work as intended once you log out of any long-running sessions and log in again, since you need to be running the new version of the shell.
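
For a quick check, something like the following sketch can be run in a login session or placed at the top of a job script. It only assumes that 'module' is provided as a shell function or command; the helper name check_module and its messages are illustrative and not part of any HPC2N tooling.

```shell
# Hypothetical helper: report whether 'module' is usable in this shell.
check_module() {
    # 'type' succeeds if 'module' is a function, alias, builtin or command.
    if type module >/dev/null 2>&1; then
        echo "module is available"
    else
        echo "module is not available; log out and log in again"
    fi
}

check_module
```

If the second message is printed in a fresh login session, the problem is probably something other than a stale shell.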

We are very sorry about this unforeseen problem.

Downtime on Abisko for large scale testing of our new center storage

  • Posted on: 28 April 2016
  • By: admin

On Thursday, Aug 28th, we will perform some large-scale tests of our new center storage.

To do this we have reserved ALL nodes of Abisko starting 09:30 CEST.
The tests are expected to take a couple of hours, after which the system will be back for normal use.

We recommend submitting jobs with shorter runtimes, which will then fit nicely into the slots that become available as the system drains.

UPDATE 2014-08-28 20:30 CEST

Power outage - Abisko and Akka affected

  • Posted on: 28 April 2016
  • By: admin

We just had a power outage this morning. All jobs running on both Akka and Abisko were interrupted. We are currently in the process of restoring the systems.

We hope to have all Akka compute nodes back in service by the end of today. 

Abisko nodes may take longer. It is possible that we can get them back up today, but they may not be available until Thursday morning, due to the scheduled file system update tomorrow.

Tue, 2014-08-19 11:40 | Daniel Petersen

Updated: 2024-11-01, 13:56