8-ID Beamline logbook mirror
Message ID: 855     Entry time: Thu Jul 28 18:20:28 2022
Author: Eric Dufresne 
Type: 8-ID General 
Category: Computers and Network 
Subject: Beamline distributed server and storage outage resolved 7/30/2022 at 11h23 after a ~44-hour outage
On 7/28/2022 around 15h30, our Linux server froze up.
All services on the beamline Linux machines were affected, so only our office PCs
and laptops could access email and teleconferencing software.
For example, on pepper you could not launch any applications installed locally on the computer,
and you could not open any centralized software such as caQtDM.
All centralized software was down. Service was restored on 7/30/2022 at 11h23; see below
for selected emails. More details are in my personal logs: https://logbook.xray.aps.anl.gov/sector7/research/2572

Suresh restored beamline 8-ID-I first, with good flux.
8-ID-E had a small MATLAB problem, so it was restarted; its flux was also good.

Per Suresh, only about 7 of the 57 beamlines were operational during this outage.

ED

UPDATE: Beamline distributed server and storage outage
GeneralAll
on behalf of
APS General Announcements via GeneralAll
Sat 7/30/2022 11:23 AM
To: APS General Announcements
UPDATE to beamline distributed server and storage outage:

The IT team, actively working this issue with HPE engineering support, determined that
the initiating event, which occurred on July 28, changed the zoning configuration on the external Fibre Channel
switches, resulting in all ports being disconnected.

The zoning configuration issue has since been cleared, and connections from the servers to storage have been re-established.

As of this writing, all XSD and CAT virtual servers have been restored to full availability.

The IT team continues to work with the HPE support team that was assembled for this outage to
improve system configuration and prevent recurrence.

Thank you for your patience while this significant issue was addressed, and thanks to the IT team for their tireless
efforts over the last 40+ hours in bringing this to a successful resolution and in planning future steps to prevent recurrence.

John Connolly

Deputy Associate Laboratory Director for Operations


[xray-beamline-notify] Problem with dserv VM storage
xray-notify
on behalf of
APS Beamline Notification via xray-notify
Thu 7/28/2022 3:45 PM

Beamline users,

A hardware failure has caused the dservs to go into a "paused" state. APS IT staff are working to resolve the issue.

This has caused some beamline services to go offline and some filesystems to become inaccessible.

Roger Sersted

Argonne National Laboratory
Advanced Photon Source
AES-IT Group
(630) 252-9929