Blog from May, 2023

While work was being done on the green side, the red side was apparently disrupted. The Archon appeared to be off at the power strip (0 W being drawn) and no temperatures were being reported. Investigation showed that the Archon switch was on and the power cord was plugged in, but the chassis was cold (consistent with the power strip reading 0 W). The Archon was then power cycled using the switch on the unit, and that seems to have brought it back up. Cables run over the green side and were wiggled during the green work, so a bad cable connection may be the cause.

Eventually power cycling and restarting software brought it back online, but the thermal control was inactive for several hours.

The Green detector ceased communicating at about 5am HST. As in other events, the CCDPOWER keyword was unresponsive, but restarting the service did not help. We power cycled the Archon itself from the power strip.

Will was able to view camerad logs:

poweron 
ERROR Archon controller returned error processing command: POWERON
poweroff
ERROR Archon controller returned error processing command: POWEROFF

Furthermore, the camerad log (/kroot/var/log/ccd_green/camerad_20230519.log) is full of entries like this:

2023-05-19T17:38:43.481123 (Camera::doit) thread 3 received command on fd 7: sensor 1 C AVG 
2023-05-19T17:38:43.481150 (Archon::Interface::sensor) ERROR: module 1 is not a heater board

That is, it doesn't even know there's a heater board at module 1. Also at startup, camerad logged that the "APPLYALL" Archon command failed.
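To get a sense of how much of the log is taken up by a given error, the entries can be tallied. The following is a minimal sketch, assuming every camerad log line has the layout seen in the two excerpt lines above (timestamp, function name in parentheses, then the message). The regular expression and the function name count_errors are illustrative and are not part of camerad.

```python
import re
from collections import Counter

# Assumed line layout, inferred from the two excerpt lines above:
#   <ISO timestamp> (<function>) <message>
LINE_RE = re.compile(r"^(\S+)\s+\(([^)]+)\)\s+(.*)$")

def count_errors(lines):
    """Count ERROR messages per (function, message) pair."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not fit the assumed layout
        _timestamp, function, message = m.groups()
        if message.startswith("ERROR"):
            counts[(function, message)] += 1
    return counts

sample = [
    "2023-05-19T17:38:43.481123 (Camera::doit) thread 3 received command on fd 7: sensor 1 C AVG ",
    "2023-05-19T17:38:43.481150 (Archon::Interface::sensor) ERROR: module 1 is not a heater board",
]
for (function, message), n in count_errors(sample).most_common():
    print(n, function, message)
```

On a real log file, the same function could be fed the open file object instead of the sample list.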

Furthermore, the Archon SYSTEM command returns sensible-looking values for the backplane itself, but for each individual module it reports:

SYSTEM:MOD1_TYPE=16
SYSTEM:MOD1_REV=0
SYSTEM:MOD1_VERSION=0.0.0
SYSTEM:MOD1_ID=0000000000000000

It repeats the above for MOD1 through MOD12.
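A check for this symptom can be scripted. The sketch below assumes the SYSTEM:MODn_FIELD=value layout shown above and treats a module as unidentified when it reports version 0.0.0 together with an all-zero ID. That criterion is an assumption drawn from this incident, not a documented Archon rule, and the function names are hypothetical.

```python
import re

# Assumed layout, taken from the excerpt above: SYSTEM:MOD<n>_<FIELD>=<value>
MOD_RE = re.compile(r"^SYSTEM:MOD(\d+)_([A-Z]+)=(.*)$")

def parse_modules(lines):
    """Group SYSTEM:MODn_* lines into {module_number: {field: value}}."""
    modules = {}
    for line in lines:
        m = MOD_RE.match(line.strip())
        if m:
            number, field, value = m.groups()
            modules.setdefault(int(number), {})[field] = value
    return modules

def unidentified(modules):
    """Modules reporting the placeholder values seen in this incident:
    version 0.0.0 together with an all-zero ID (an assumed criterion)."""
    return sorted(
        n for n, f in modules.items()
        if f.get("VERSION") == "0.0.0" and set(f.get("ID", "")) == {"0"}
    )

sample = """\
SYSTEM:MOD1_TYPE=16
SYSTEM:MOD1_REV=0
SYSTEM:MOD1_VERSION=0.0.0
SYSTEM:MOD1_ID=0000000000000000
""".splitlines()

print(unidentified(parse_modules(sample)))
```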

At about 9:15am, we ran modify -s kpfgreen TEMPSET=-120 and modify -s kpfgreen STAT="Ramp Run" to prevent the CCD from overcooling.

The fix in this case was to make some changes to the Archon boards, which was done the following day (2023-05-20). The detector temperature returned to nominal around 3pm on 2023-05-20.

It looks like the problem lay in one of the optional boards in the Archon itself (this was likely the cause of the slew of recent green side problems). We disabled the +/-100V supplies based on instructions from STA.

During troubleshooting of green, we twice set kpfred.CF_BASE_ENA to on as instructed by EMIR notices, but off is apparently the nominal state. This caused a temperature excursion, but it was small because it was caught early.

Grafana plot: http://vm-dashboards.keck.hawaii.edu/d/VoSaQfx4l/kpf-red-thermal-control?orgId=1&from=1683763200000&to=1683784800000

Similar to other recent events where CCDPOWER was off.

Green side Grafana plot: http://vm-dashboards.keck.hawaii.edu/d/VoSaQfx4k/kpf-green-thermal-control?orgId=1&from=1683763200000&to=1683784800000

Similar to the event of 2023-04-29 at 13:55 HST (KPF Green Detector deltaT=-4.9), we had a temperature excursion when the CCDPOWER keyword got into a bad state; the temperature set points then came back with incorrect values after kpfgreen was restarted.

Keck Grafana link: http://vm-dashboards.keck.hawaii.edu/d/VoSaQfx4k/kpf-green-thermal-control?orgId=1&from=1683716400000&to=1683752400000

Similar to the event of 2023-04-29 at 13:55 HST (KPF Green Detector deltaT=-4.9), we had a temperature excursion when the CCDPOWER keyword got into a bad state; the temperature set points then came back with incorrect values after kpfgreen was restarted. This time, the difference was noticed (and alarmed on by kpfmon) right away, so the excursion was corrected promptly.

Keck Grafana instance link to relevant timestamp: http://vm-dashboards.keck.hawaii.edu/d/VoSaQfx4k/kpf-green-thermal-control?from=1683675600000&to=1683684000000

After the Instec pumps had increasing trouble keeping the system cold (the Instec cooling power would saturate frequently), impacts began to be seen on the Green detector temperature.

The first temperature change on the detector began on 2023-04-18 at 08:27 HST. The system was back under control at about 12:15 HST the same day, but the excursion reached a delta_T of +1.2 degrees.

A second temperature excursion began on 2023-04-19 at 11:20 HST and ended at 17:00 HST the same day with a delta_T of +5.4 degrees.

The system was warmed and purged to resolve the problem. The warm-up began on 2023-04-20 at 00:00 HST, and the system returned to normal operation on 2023-04-21 at 18:00 HST.
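For reference, the durations implied by the timestamps above can be worked out as follows. This is a small sketch using only the times quoted in this entry; the 12:15 end time of the first excursion is approximate, so its duration is too.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# (label, start, end) in HST, copied from the entries above.
events = [
    ("first excursion", "2023-04-18 08:27", "2023-04-18 12:15"),
    ("second excursion", "2023-04-19 11:20", "2023-04-19 17:00"),
    ("warm-up and purge", "2023-04-20 00:00", "2023-04-21 18:00"),
]

def duration_hours(start, end):
    """Elapsed time between two HST timestamps, in hours."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 3600

for label, start, end in events:
    print(f"{label}: {duration_hours(start, end):.1f} h")
```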

Full timespan in Grafana: http://vm-dashboards.keck.hawaii.edu/d/VoSaQfx4k/kpf-green-thermal-control?orgId=1&from=1681822800000&to=1682157600000
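The from and to parameters in these Grafana links are epoch timestamps in milliseconds, so the window a link covers can be decoded into HST. The sketch below uses a fixed UTC-10 offset for HST (Hawaii does not observe daylight saving time); the helper name grafana_window is illustrative.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse, parse_qs

# HST is UTC-10 year-round (Hawaii does not observe daylight saving time).
HST = timezone(timedelta(hours=-10), "HST")

def grafana_window(url):
    """Return the (from, to) query parameters of a Grafana URL as HST datetimes."""
    query = parse_qs(urlparse(url).query)

    def to_hst(key):
        millis = int(query[key][0])  # epoch time in milliseconds
        return datetime.fromtimestamp(millis / 1000, tz=HST)

    return to_hst("from"), to_hst("to")

url = ("http://vm-dashboards.keck.hawaii.edu/d/VoSaQfx4k/"
       "kpf-green-thermal-control?orgId=1&from=1681822800000&to=1682157600000")
start, end = grafana_window(url)
print(start.isoformat(), end.isoformat())
```

For the link above this gives a window from 2023-04-18 03:00 HST to 2023-04-22 00:00 HST, which brackets both excursions and the warm-up.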

The green detector went into an odd state. The first symptom was the inability to take an exposure. A look at the keywords revealed that CCDPOWER was not on:

2023-04-19T09:13:41.0630     CCDPOWER = On
2023-04-29T13:55:12.6123     CCDPOWER = Unknown
2023-04-29T13:55:22.6552     CCDPOWER = Not Configured

This timestamp (2023-04-29T13:55:12.6123) corresponds to when the detector temperature stopped reporting. Here’s a snapshot of relevant keywords:

2023-04-21T14:12:38.6366      TEMPSET = -140.0000 degC
2023-04-28T23:42:30.5979       EXPOSE = Trigger
2023-04-29T11:35:49.3206     EXPSTATE = Ready
2023-04-29T13:55:02.5943    STA_CCD_T = -100.000 degC
2023-04-29T13:55:12.6123     CCDPOWER = Unknown
2023-04-29T13:55:12.6337    STA_CCD_T = -100.001 degC
2023-04-29T13:55:22.6552     CCDPOWER = Not Configured
2023-04-29T13:55:22.6758    STA_CCD_T = -273.150 degC
2023-04-29T15:35:25.8610     EXPSTATE = Ready
2023-04-29T15:35:25.8610     CCDPOWER = Unknown
2023-04-29T15:35:25.8610       EXPOSE = Ready
2023-04-29T15:35:25.9069    STA_CCD_T = 0.000 degC
2023-04-29T15:35:30.2153     CCDPOWER = Not Configured
2023-04-29T15:35:30.2370    STA_CCD_T = -273.150 degC
2023-04-29T15:35:30.9103      TEMPSET = -140.0000 degC
2023-04-29T15:35:40.2146     CCDPOWER = Off
2023-04-29T15:35:40.2252    STA_CCD_T = -104.902 degC
2023-04-29T15:35:50.3138    STA_CCD_T = -104.903 degC
2023-04-29T15:36:00.3748    STA_CCD_T = -104.903 degC
2023-04-29T15:36:10.4220    STA_CCD_T = -104.903 degC
2023-04-29T15:36:20.4941    STA_CCD_T = -104.902 degC
2023-04-29T15:36:26.3987     CCDPOWER = Mixed
2023-04-29T15:36:26.4196    STA_CCD_T = -104.904 degC
2023-04-29T15:36:30.5592     CCDPOWER = On
2023-04-29T15:36:30.5813    STA_CCD_T = -104.904 degC
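A keyword history like the one above can be scanned automatically for the moment the temperature stopped reporting. This is a minimal sketch, assuming each line has the form "timestamp KEYWORD = value [units]" as in the listing, and treating any STA_CCD_T reading at or below -273.15 degC (absolute zero) as an invalid reading. That interpretation and the function names are assumptions, not documented kpfgreen behavior.

```python
import re

# Assumed layout, inferred from the listing above:
#   <ISO timestamp>  <KEYWORD> = <value> [units]
ROW_RE = re.compile(r"^(\S+)\s+(\w+)\s*=\s*(.+?)\s*$")

def parse_history(lines):
    """Yield (timestamp, keyword, value) tuples from keyword-history lines."""
    for line in lines:
        m = ROW_RE.match(line.strip())
        if m:
            yield m.groups()

def invalid_temperatures(rows, keyword="STA_CCD_T", floor=-273.15):
    """Timestamps where the temperature is at or below absolute zero."""
    bad = []
    for timestamp, name, value in rows:
        # The value field is "<number> degC"; keep only the number.
        if name == keyword and float(value.split()[0]) <= floor:
            bad.append(timestamp)
    return bad

sample = [
    "2023-04-29T13:55:12.6123     CCDPOWER = Unknown",
    "2023-04-29T13:55:12.6337    STA_CCD_T = -100.001 degC",
    "2023-04-29T13:55:22.6552     CCDPOWER = Not Configured",
    "2023-04-29T13:55:22.6758    STA_CCD_T = -273.150 degC",
]
rows = list(parse_history(sample))
print(invalid_temperatures(rows))
print([(t, v) for t, k, v in rows if k == "CCDPOWER"])
```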

This is a screen shot of the kpfgreen.STA_CCD_T values on Grafana.

And a link to the green detector page (on the Keck Grafana instance) for that same timestamp: http://vm-dashboards.keck.hawaii.edu/d/VoSaQfx4k/kpf-green-thermal-control?orgId=1&from=1682787133000&to=1682983336000

Here’s a link to the Slack discussion: https://cal-planet-search.slack.com/archives/C041G07ULG0/p1682818477220269

The problem wasn’t noticed until May 1. Will Deich noted:

kpfmon says that many heater components are off or at 0.  Not sure if that's real so I'm commanding them to their nominal settings.

Despite all heater controls apparently now correct, the STA CCD temp is continuing to run cold (-102.8, s/b -100.0, hasn't improved in the past 8 minutes).

The erroneous keyword values began when the kpfgreen service was restarted, at 15:35 on Saturday 29 Apr.

Before camerad restarted, it logged that the Archon controller had returned an error processing the POWERON command. Upon restart, camerad recorded that it sent the correct configuration values to the controller. Perhaps ktlcamerad has a race condition in how it starts communicating with camerad, if they are both restarted at the same time, and perhaps that led to the incorrect reported values. I'll look at that later.

And lastly, the STA detector temp is headed towards -100.0