8-ID Beamline logbook mirror
Message ID: 817     Entry time: Thu Nov 4 11:07:07 2021
Author: Eric Dufresne 
Type: 8-ID-E 
Category: Detectors 
Subject: Detector problem 8-ID-E Lambda250k, 11/3/2021 with Oleg's beamtime 

Useful debugging commands


  • xspadmin@xspserver:~/xenv$ ./bin/run dcinfo
  • As for the gui, I installed locally and was using 'nexpy' to view the .nxs images stored by
    the 'run acquire' xenv script. That should work over NX or ssh equally well, let me know if not.

Re: Lambda 250K
Julian Schmehr <[email protected]>
Thu 11/4/2021 3:21 AM
To: Narayanan, Suresh <[email protected]>; Andreas Beckmann <[email protected]>; Guruswamy, Tejas <[email protected]>;
Dufresne, Eric <[email protected]>; Miceli, Antonino <[email protected]>; Lang, Keenan C. <[email protected]>

Hi Suresh,

So it sounds like you are still getting data which is a good sign. One thing you could check is whether
the high voltage is working normally. This is returned as an output when running dcinfo:

cd /home/xspadmin/xenv
run dcinfo

If the high voltage isn’t on this would cause images with only some noisy pixels. Could you send us the
output of dcinfo please? Also if you can send an example datafile that will help with identifying the problem.

Sincerely,

Julian





Re: Lambda 250K
Narayanan, Suresh <[email protected]>
Thu 11/4/2021 6:07 AM
To: Julian Schmehr <[email protected]>; Andreas Beckmann <[email protected]>; Guruswamy, Tejas <[email protected]>;
Dufresne, Eric <[email protected]>; Miceli, Antonino <[email protected]>; Lang, Keenan C. <[email protected]>

Hi Julian,

Thanks for your response.

Tejas checked the HV and it returns 200.00xx; dc run takes data with no errors.
In the entire frame we see 3 hot pixels: 2 together in one place and 1 in another.
Tejas checked that the calib files are in the right place.
The users were running normally when the IOC crashed, maybe around that time, and it has been like this ever since.
Tejas also installed your xsp-py lib and took images with it; they show the same 3 pixels.

Suresh

Re: Lambda 250K
Narayanan, Suresh <[email protected]>
Thu 11/4/2021 6:09 AM
To: Julian Schmehr <[email protected]>; Andreas Beckmann <[email protected]>;
Guruswamy, Tejas <[email protected]>; Dufresne, Eric <[email protected]>; Miceli, Antonino <[email protected]>; Lang, Keenan C. <[email protected]>

Hi Julian,

xspadmin@xspserver:~/xenv$ ./bin/run dcinfo

run: running make in tasks/basic
make: Nothing to be done for 'all'.
run: v1.1.0
run: logging output to logs/unknown/2021-11-04T06:08:37/run.log 
run: running tasks/basic/dcinfo
                config file: /etc/opt/xsp/system.yml
                output dir : results/unknown/2021-11-04T06:08:37/tasks/basic
                args       :
INFO: test dcinfo started
INFO: loading config from /etc/opt/xsp/system.yml
INFO: found detector system SYS
INFO: found detector lambda: Lambda w/ 1 module(s)

          mod #1:
              FW: 2.2.0 [ctrl=v0, data=v3, feat=0x03]
              Chip IDs:
                  # 3: b32w2-D07,SAL,0x80040247
                  # 4: b32w2-E07,SAL,0x80040257
              HV: 201.046V
              T: 26.5°C 27.3347°C 242.562°C
              Features: HV 1/6-bit
INFO: test dcinfo finished
run: test passed
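
For repeated checks, the HV and temperature lines of a dcinfo report like the one above can be pulled out with a short script. This is only a sketch and not part of xenv: the function names are hypothetical, and the nominal 200 V with a 5 V tolerance is an assumption inferred from the readings in this thread, not a vendor specification. The temperatures are extracted but not judged, since the thread does not say what each of the three readings refers to.

```python
import re

# Excerpt of the dcinfo output quoted above.
SAMPLE = """\
          mod #1:
              FW: 2.2.0 [ctrl=v0, data=v3, feat=0x03]
              HV: 201.046V
              T: 26.5°C 27.3347°C 242.562°C
              Features: HV 1/6-bit
"""


def parse_dcinfo(text):
    """Return the HV reading (volts) and temperature readings (deg C)."""
    # "HV:" with a colon only matches the reading, not "Features: HV 1/6-bit".
    hv_match = re.search(r"HV:\s*([0-9.]+)\s*V", text)
    t_line = re.search(r"^\s*T:\s*(.+)$", text, re.MULTILINE)
    temps = []
    if t_line:
        temps = [float(v) for v in re.findall(r"([0-9.]+)\s*°C", t_line.group(1))]
    hv = float(hv_match.group(1)) if hv_match else None
    return {"hv": hv, "temps": temps}


def hv_looks_on(hv, nominal=200.0, tolerance=5.0):
    """Assumed check: HV is present and within `tolerance` volts of `nominal`."""
    return hv is not None and abs(hv - nominal) <= tolerance


report = parse_dcinfo(SAMPLE)
print("HV:", report["hv"], "V, looks on:", hv_looks_on(report["hv"]))
print("Temperatures:", report["temps"])
```

If the HV line is missing or far from nominal, `hv_looks_on` returns False, which corresponds to the case Julian describes where images show only a few noisy pixels.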


Re: Lambda 250K
Narayanan, Suresh <[email protected]>
Thu 11/4/2021 10:47 AM
To: Julian Schmehr <[email protected]>; Andreas Beckmann <[email protected]>;
Guruswamy, Tejas <[email protected]>; Dufresne, Eric <[email protected]>; Miceli, Antonino <[email protected]>;
Lang, Keenan C. <[email protected]>

Hi Julian,

I have given you access to “lapis” over NX. Once on NX, we will make sure it is logged in as “8ideuser”;
then in a terminal you can run “go_lambda”, which will take you to the computer.

Eric is around and can help, and maybe Tejas too if he is available.

Let us try to get some diagnosis if that is possible. We were quite stunned to see this issue and we had
users flying in from UCSD and no detector. Luckily, the pool was able to get a spare for us in a happy coincidence.

Suresh

Re: Lambda 250K
Guruswamy, Tejas <[email protected]>
Thu 11/4/2021 4:39 PM
To: Narayanan, Suresh <[email protected]>; Julian Schmehr <[email protected]>; Andreas Beckmann <[email protected]>; Dufresne, Eric <[email protected]>; Miceli, Antonino <[email protected]>; Lang, Keenan C. <[email protected]>

Hi Julian,

Yes, the files we measured yesterday are in xenv/results/2021-11-03 .... Please take a look.

We did restart the hardware and software while testing, so I don't think it should be connected to that.

Here is a dcinfo output from while we were having problems, which seemed reasonable:
INFO: found detector lambda: Lambda w/ 1 module(s)
          mod #1:
              FW: 2.2.0 [ctrl=v0, data=v3, feat=0x03]
              Chip IDs:
                  # 3: b32w2-D07,SAL,0x80040247
                  # 4: b32w2-E07,SAL,0x80040257
              HV: 200.097V
              T: 27.3125°C 27.9499°C 304.75°C
              Features: HV 1/6-bit
Tejas

Re: Lambda 250K
Guruswamy, Tejas <[email protected]>
Thu 11/4/2021 5:07 PM
To: Julian Schmehr <[email protected]>; Narayanan, Suresh <[email protected]>; Andreas Beckmann <[email protected]>;
Dufresne, Eric <[email protected]>; Miceli, Antonino <[email protected]>; Lang, Keenan C. <[email protected]>

Hi Julian,

You are right, I must have made a mistake when visualizing in nexpy yesterday. Thank you for testing.

To be clear the syntax for using acquire is

./bin/run acquire -- {number of frames} {exposure in ms} {low threshold in keV},{high threshold in keV} {bit depth}

correct?

I just tested the IOC again however and still no luck.
So the detector seems fine after all, but we still have an issue. Perhaps Keenan could have a look?

Thanks
Tejas

Re: Lambda 250K
Narayanan, Suresh <[email protected]>
Thu 11/4/2021 5:13 PM
To: Guruswamy, Tejas <[email protected]>; Julian Schmehr <[email protected]>;
Andreas Beckmann <[email protected]>; Dufresne, Eric <[email protected]>;
Miceli, Antonino <[email protected]>; Lang, Keenan C. <[email protected]>

Hi Julian, Tejas,

How can the IOC suddenly go wrong while running?

Are you sure, Tejas, that you saw noise yesterday, changed the threshold in the IOC, and there was nothing in ImageJ?

Suresh

Re: Lambda 250K
Guruswamy, Tejas <[email protected]>
Thu 11/4/2021 5:30 PM
To: Narayanan, Suresh <[email protected]>; Julian Schmehr <[email protected]>; Andreas Beckmann <[email protected]>; Dufresne, Eric <[email protected]>; Miceli, Antonino <[email protected]>; Lang, Keenan C. <[email protected]>
Hi all,

I can't explain what went wrong yesterday when I was trying the acquire script. As Julian says,
reviewing the data now it looks ok, so I must have made a mistake somewhere, sorry.

But coming back to the IOC, yes, even now I still see blank images in ImageJ. Changing the threshold in the IOC
seems to make no difference. There are no obvious errors I can see, just no data.
(Also the IOC still crashes over ssh only.)

I am looking through the system logs in case I missed anything. From memory do you remember roughly when
it was last working? 11am Nov 3, or before?

Tejas

Re: Lambda 250K
Narayanan, Suresh <[email protected]>
Thu 11/4/2021 5:34 PM
To: Guruswamy, Tejas <[email protected]>; Julian Schmehr <[email protected]>; Andreas Beckmann <[email protected]>; Dufresne, Eric <[email protected]>; Miceli, Antonino <[email protected]>; Lang, Keenan C. <[email protected]>
Hi Tejas,

It was working around 11 am; the IOC crashed at 11:25 am and then there were no photons on the chip.

Suresh

Re: Lambda 250K
Guruswamy, Tejas <[email protected]>
Thu 11/4/2021 6:43 PM
To: Narayanan, Suresh <[email protected]>; Julian Schmehr <[email protected]>; Andreas Beckmann <[email protected]>; Dufresne, Eric <[email protected]>; Miceli, Antonino <[email protected]>; Lang, Keenan C. <[email protected]>
Hi all,

I found some clues but no solution yet.

1. When the IOC crashes, this appears in the system log (this happened at 11:20AM Nov 3, and I can reproduce by trying to acquire over ssh now):
Nov 4 18:17:00 xspserver kernel: [107950.286099] ADLambda::expor[7153]: segfault at 10 ip 00007ffbafa4b0c0 sp 00007ff68d8e0828 error 4 in libCom.so.3.19.0[7ffbafa2c000+38000]
Nov 4 18:17:00 xspserver kernel: [107950.286110] Code: f7 01 00 be 84 00 00 00 48 8d 3d 99 f7 01 00 e8 e6 23 fe ff e9 fd fe ff ff 90 48 8b 7f 10 e9 47 28 fe ff 0f 1f 80 00 00 00 00 <48> 8b 7f 10 e9 67 12 fe ff 0f 1f 80 00 00 00 00 48 8b 7f 10 e9 87

Since it suggests a segfault in an epics library (libCom) I've tried to clean and rebuild everything -- epics, areaDetector, ADLambda.
But so far it still crashes. This issue is probably separate to all others since we had these crashes even before yesterday.

2. Looking at the output of the NDStatistics plugin, I think real data is being measured. The total across all pixels changes
with the IOC threshold in a reasonable way. However all image format outputs (ImageJ, JPEG, TIFF) look like empty data.

3. On Nov 2 Keenan said the following about a new version of the Lambda IOC: "I've switched the data types to be signed
values rather than unsigned." But everywhere I look in the image/output plugins windows I see UInt (unsigned). Not sure if this is correct.

4. Perhaps a vague theory is that Keenan's code change implemented on Nov 2 was not fully loaded initially, because the IOC
was already running and in memory on another user login. But when the IOC crashed at 11:20AM Nov 3, everything would have had
to reload, and this is when the problems started.
So I think the IOC is actually measuring data correctly but somehow one of the conversion steps (NDPluginStdArrays?) is
silently failing. We're at the boundary of my knowledge of epics internals so any insight from others would be appreciated.

Tejas
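
The kernel segfault line quoted in item 1 of the message above can be decoded a little further. The sketch below (function and variable names are hypothetical) parses the line, computes the offset of the faulting instruction inside libCom by subtracting the mapping base from the instruction pointer, and splits the x86 page-fault error code into its three low bits (bit 0: page present, bit 1: write access, bit 2: user mode). It assumes the standard Linux x86-64 log format shown in the message.

```python
import re

# The segfault line from the system log, as quoted in the message above.
LOG_LINE = (
    "Nov 4 18:17:00 xspserver kernel: [107950.286099] ADLambda::expor[7153]: "
    "segfault at 10 ip 00007ffbafa4b0c0 sp 00007ff68d8e0828 error 4 "
    "in libCom.so.3.19.0[7ffbafa2c000+38000]"
)

SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>\d+) in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Decode a Linux x86-64 kernel segfault log line into its parts."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    err = int(m["err"])
    return {
        "library": m["lib"],
        "fault_address": int(m["addr"], 16),
        # Offset of the faulting instruction relative to the library mapping.
        "offset_in_library": int(m["ip"], 16) - int(m["base"], 16),
        "page_present": bool(err & 1),  # 0: the page was not mapped
        "write_access": bool(err & 2),  # 0: the access was a read
        "user_mode": bool(err & 4),     # 1: fault happened in user space
    }


info = decode_segfault(LOG_LINE)
print(info["library"], hex(info["offset_in_library"]), hex(info["fault_address"]))
```

For this line the offset works out to 0x1f0c0, and error 4 decodes to a user-mode read of an unmapped page at address 0x10. A read at such a low address would be consistent with dereferencing a null pointer plus a small field offset, though the log line alone does not prove it. If the libCom build has debug symbols, the offset could be passed to `addr2line -e` with the path to libCom.so.3.19.0 to look up the source line.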

Re: Lambda 250K
Narayanan, Suresh <[email protected]>
Thu 11/4/2021 9:07 PM
To: Guruswamy, Tejas <[email protected]>; Julian Schmehr <[email protected]>;
Andreas Beckmann <[email protected]>; Dufresne, Eric <[email protected]>;
Miceli, Antonino <[email protected]>; Lang, Keenan C. <[email protected]>

Hi Tejas,

Thanks for digging deeper. Your explanation makes sense. Let us see what Keenan says.

Suresh

Keenan debugging:


[11/6 2:41 PM] Lang, Keenan C.
Had to track down a null-pointer error that was apparently there all along, but not causing any problems
until the current build, but I just ran an acquisition and was able to see static. You should double-check
that the images that are being taken match up with what is expected.

[11/6 3:02 PM] Narayanan, Suresh
Lang, Keenan C. can you explain what caused this, was Tejas' explanation correct?

[11/6 3:09 PM] Lang, Keenan C.
I have no idea what caused this. Theoretically, this is something that should have been causing
an issue all along. I do think that the IOC was still running the old version for a bit and then when
something did go wrong, rebooting the IOC just failed worse, but I even rolled back to the version of
the code from before I made any recent changes and still had crashes, so I am unsure what all went wrong.