While work was being done on the green side, the red side was apparently disrupted. The Archon appeared to be off at the power strip (0 W being drawn) and no temperatures were being reported. Investigation showed that the Archon switch was on and the power cord was plugged in, but the chassis was cold (consistent with the power strip reading 0 W). The Archon was then power cycled using its own switch, which seems to have brought it back up. Cables run over the green side and were wiggled during the green work, so a bad cable connection is a possible cause.
Eventually power cycling and restarting software brought it back online, but the thermal control was inactive for several hours.
The Green detector ceased communicating at about 5am HST. As in other events, the CCDPOWER keyword was unresponsive, but this time restarting the service did not help. We power cycled the Archon itself from the power strip.
Will was able to view the camerad logs:
poweron ERROR Archon controller returned error processing command: POWERON
poweroff ERROR Archon controller returned error processing command: POWEROFF
Furthermore, the camerad log (/kroot/var/log/ccd_green/camerad_20230519.log) is full of entries like this:
2023-05-19T17:38:43.481123 (Camera::doit) thread 3 received command on fd 7: sensor 1 C AVG
2023-05-19T17:38:43.481150 (Archon::Interface::sensor) ERROR: module 1 is not a heater board
That is, it doesn't even know there's a heater board at module 1. Also at startup, camerad logged that the "APPLYALL" Archon command failed.
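To get a sense of how often a message like this recurs, the log can be tallied by reporting function and message text. The following is a minimal sketch, assuming every error line follows the "timestamp (function) ERROR: message" shape of the excerpt above; the function name tally_errors is hypothetical and not part of camerad.

```python
import re
from collections import Counter

# Matches lines shaped like the camerad excerpt:
#   <timestamp> (<function>) ERROR: <message>
# Assumption: all error lines in the log follow this layout.
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+\((?P<func>[^)]+)\)\s+ERROR:\s*(?P<msg>.*)$")

def tally_errors(lines):
    """Count ERROR lines per (function, message) pair."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            counts[(m.group("func"), m.group("msg"))] += 1
    return counts

# The two lines quoted in this log entry.
sample = [
    "2023-05-19T17:38:43.481123 (Camera::doit) thread 3 received command on fd 7: sensor 1 C AVG",
    "2023-05-19T17:38:43.481150 (Archon::Interface::sensor) ERROR: module 1 is not a heater board",
]

for (func, msg), n in tally_errors(sample).most_common():
    print(n, func, msg)
```

For a real log, the lines of the file opened at the path given above could be passed in place of sample.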
Furthermore, the Archon system command returns sensible-looking values for the backplane itself, but about each individual 'module' it says:
SYSTEM:MOD1_TYPE=16
SYSTEM:MOD1_REV=0
SYSTEM:MOD1_VERSION=0.0.0
SYSTEM:MOD1_ID=0000000000000000
It repeats the above for MOD1 through MOD12.
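A check for this symptom can be sketched as follows, assuming the response is available as text in the KEY=VALUE form quoted above. The helper names parse_modules and blank_modules are hypothetical, and the MOD2 values in the sample are invented purely for contrast; they do not come from any real Archon.

```python
import re

# Accepts fields with or without the "SYSTEM:" prefix shown in this log.
FIELD_RE = re.compile(r"(?:SYSTEM:)?MOD(\d+)_(TYPE|REV|VERSION|ID)=(\S+)")

def parse_modules(response):
    """Group MODn_* fields by module number."""
    modules = {}
    for num, field, value in FIELD_RE.findall(response):
        modules.setdefault(int(num), {})[field] = value
    return modules

def blank_modules(modules):
    """Module numbers reporting an all-zero ID and version 0.0.0."""
    return sorted(
        n for n, fields in modules.items()
        if set(fields.get("ID", "x")) == {"0"} and fields.get("VERSION") == "0.0.0"
    )

sample = (
    # MOD1 values as quoted in this log entry.
    "SYSTEM:MOD1_TYPE=16 SYSTEM:MOD1_REV=0 "
    "SYSTEM:MOD1_VERSION=0.0.0 SYSTEM:MOD1_ID=0000000000000000 "
    # MOD2 values are made up for illustration only.
    "SYSTEM:MOD2_TYPE=2 SYSTEM:MOD2_REV=1 "
    "SYSTEM:MOD2_VERSION=1.0.1 SYSTEM:MOD2_ID=0123456789ABCDEF"
)

print(blank_modules(parse_modules(sample)))
```

In the event described here, such a check would have flagged all twelve modules.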
At about 9:15am, the following commands were issued to prevent the CCD from overcooling:
modify -s kpfgreen TEMPSET=-120
modify -s kpfgreen STAT="Ramp Run"
The solution in this case was to make some changes on the Archon boards; these were performed the following day (2023-05-20). The detector temperature returned to nominal around 3pm on 2023-05-20.
It looks like the problem lay in one of the optional boards in the Archon itself (this was likely the cause of the slew of recent green side problems). We disabled the +/-100V supplies based on instructions from STA.
Similar to other recent events
During troubleshooting of green, we twice set kpfred.CF_BASE_ENA to on as instructed by EMIR notices, but off is apparently the nominal state. This caused a temperature excursion, but it was small because it was caught early.
Grafana plot: http://vm-dashboards.keck.hawaii.edu/d/VoSaQfx4l/kpf-red-thermal-control?orgId=1&from=1683763200000&to=1683784800000
Similar to other recent events where CCDPOWER was off.
Green side Grafana plot: http://vm-dashboards.keck.hawaii.edu/d/VoSaQfx4k/kpf-green-thermal-control?orgId=1&from=1683763200000&to=1683784800000
Similar to the event "2023-04-29 at 13:55 HST: KPF Green Detector deltaT=-4.9": we had a temperature excursion when the CCDPOWER keyword got into a bad state, then the temperature set points came back with incorrect values after kpfgreen was restarted.
Keck Grafana link: http://vm-dashboards.keck.hawaii.edu/d/VoSaQfx4k/kpf-green-thermal-control?orgId=1&from=1683716400000&to=1683752400000
Similar to the event "2023-04-29 at 13:55 HST: KPF Green Detector deltaT=-4.9": we had a temperature excursion when the CCDPOWER keyword got into a bad state, then the temperature set points came back with incorrect values after kpfgreen was restarted. This time, the difference was noticed (and alarmed on by kpfmon) right away, so the excursion was corrected promptly.
Keck Grafana instance link to relevant timestamp: http://vm-dashboards.keck.hawaii.edu/d/VoSaQfx4k/kpf-green-thermal-control?from=1683675600000&to=1683684000000
After the Instec pumps had increasing trouble keeping the system cold (the Instec cooling power would saturate frequently), impacts began to be seen on the Green detector temperature.
The first temperature change on the detector began on 2023-04-18 at 08:27 HST. The system was back under control at about 12:15 HST the same day, after reaching a delta_T of +1.2 degrees.
A second temperature excursion began on 2023-04-19 at 11:20 HST and ended at 17:00 HST the same day with a delta_T of +5.4 degrees.
The system was warmed and purged to resolve the problem. The warm-up began on 2023-04-20 at 00:00 HST, and the system returned to normal operation on 2023-04-21 at 18:00 HST.
Full timespan in Grafana: http://vm-dashboards.keck.hawaii.edu/d/VoSaQfx4k/kpf-green-thermal-control?orgId=1&from=1681822800000&to=1682157600000
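The from and to parameters in the Grafana links in this log are Unix epoch times in milliseconds. A small sketch for converting between HST wall-clock times and those values is shown below; the helper names are hypothetical, and the fixed UTC-10 offset relies on Hawaii not observing daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Hawaii Standard Time: fixed UTC-10 offset, no daylight saving.
HST = timezone(timedelta(hours=-10), "HST")

def to_epoch_ms(hst_string):
    """Convert 'YYYY-MM-DD HH:MM' in HST to epoch milliseconds."""
    dt = datetime.strptime(hst_string, "%Y-%m-%d %H:%M").replace(tzinfo=HST)
    return int(dt.timestamp() * 1000)

def from_epoch_ms(ms):
    """Convert epoch milliseconds back to 'YYYY-MM-DD HH:MM' in HST."""
    return datetime.fromtimestamp(ms / 1000, tz=HST).strftime("%Y-%m-%d %H:%M")

def grafana_window(base_url, start_hst, end_hst):
    """Build a dashboard link covering the given HST time range."""
    return f"{base_url}?orgId=1&from={to_epoch_ms(start_hst)}&to={to_epoch_ms(end_hst)}"

# Dashboard URL taken from the links in this log.
GREEN = "http://vm-dashboards.keck.hawaii.edu/d/VoSaQfx4k/kpf-green-thermal-control"

print(from_epoch_ms(1681822800000))  # 2023-04-18 03:00
print(from_epoch_ms(1682157600000))  # 2023-04-22 00:00
print(grafana_window(GREEN, "2023-04-18 03:00", "2023-04-22 00:00"))
```

Decoded this way, the link above spans 2023-04-18 03:00 HST to 2023-04-22 00:00 HST, which brackets both excursions and the warm-up.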
The green detector went into an odd state. The first symptom was the inability to take an exposure. A look at keywords revealed that CCDPOWER was not on:
2023-04-19T09:13:41.0630 CCDPOWER = On
2023-04-29T13:55:12.6123 CCDPOWER = Unknown
2023-04-29T13:55:22.6552 CCDPOWER = Not Configured
This timestamp (2023-04-29T13:55:12.6123) corresponds to when the detector temperature stopped reporting. Here’s a snapshot of relevant keywords:
2023-04-21T14:12:38.6366 TEMPSET = -140.0000 degC
2023-04-28T23:42:30.5979 EXPOSE = Trigger
2023-04-29T11:35:49.3206 EXPSTATE = Ready
2023-04-29T13:55:02.5943 STA_CCD_T = -100.000 degC
2023-04-29T13:55:12.6123 CCDPOWER = Unknown
2023-04-29T13:55:12.6337 STA_CCD_T = -100.001 degC
2023-04-29T13:55:22.6552 CCDPOWER = Not Configured
2023-04-29T13:55:22.6758 STA_CCD_T = -273.150 degC
2023-04-29T15:35:25.8610 EXPSTATE = Ready
2023-04-29T15:35:25.8610 CCDPOWER = Unknown
2023-04-29T15:35:25.8610 EXPOSE = Ready
2023-04-29T15:35:25.9069 STA_CCD_T = 0.000 degC
2023-04-29T15:35:30.2153 CCDPOWER = Not Configured
2023-04-29T15:35:30.2370 STA_CCD_T = -273.150 degC
2023-04-29T15:35:30.9103 TEMPSET = -140.0000 degC
2023-04-29T15:35:40.2146 CCDPOWER = Off
2023-04-29T15:35:40.2252 STA_CCD_T = -104.902 degC
2023-04-29T15:35:50.3138 STA_CCD_T = -104.903 degC
2023-04-29T15:36:00.3748 STA_CCD_T = -104.903 degC
2023-04-29T15:36:10.4220 STA_CCD_T = -104.903 degC
2023-04-29T15:36:20.4941 STA_CCD_T = -104.902 degC
2023-04-29T15:36:26.3987 CCDPOWER = Mixed
2023-04-29T15:36:26.4196 STA_CCD_T = -104.904 degC
2023-04-29T15:36:30.5592 CCDPOWER = On
2023-04-29T15:36:30.5813 STA_CCD_T = -104.904 degC
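The deltaT=-4.9 figure quoted for this event can be reproduced from the keyword history. The sketch below assumes one "timestamp KEYWORD = value" entry per line, takes -100.0 degC as the nominal detector temperature (from the "s/b -100.0" note in this log), and treats the -273.150 and 0.000 readings as placeholders reported while CCDPOWER was not in a valid state. That last point is an assumption, as are the helper names.

```python
import re

HISTORY_RE = re.compile(r"^(\S+)\s+(\w+)\s+=\s+(.*)$")

NOMINAL_CCD_T = -100.0         # degC, per the "s/b -100.0" note in this log
PLACEHOLDERS = {-273.15, 0.0}  # assumption: not real temperature readings

def parse_history(lines):
    """Yield (timestamp, keyword, value_text) for each history line."""
    for line in lines:
        m = HISTORY_RE.match(line.strip())
        if m:
            yield m.groups()

def worst_delta(lines, keyword="STA_CCD_T"):
    """Return (timestamp, delta) for the largest deviation from nominal."""
    worst = None
    for ts, key, value in parse_history(lines):
        if key != keyword:
            continue
        temp = float(value.split()[0])
        if temp in PLACEHOLDERS:
            continue
        delta = temp - NOMINAL_CCD_T
        if worst is None or abs(delta) > abs(worst[1]):
            worst = (ts, round(delta, 3))
    return worst

# A subset of the entries quoted above.
sample = [
    "2023-04-29T13:55:02.5943 STA_CCD_T = -100.000 degC",
    "2023-04-29T13:55:22.6552 CCDPOWER = Not Configured",
    "2023-04-29T13:55:22.6758 STA_CCD_T = -273.150 degC",
    "2023-04-29T15:35:25.9069 STA_CCD_T = 0.000 degC",
    "2023-04-29T15:35:40.2252 STA_CCD_T = -104.902 degC",
    "2023-04-29T15:36:26.4196 STA_CCD_T = -104.904 degC",
]

print(worst_delta(sample))
```

With the placeholder readings excluded, the largest deviation in the sample is about -4.9 degC, in line with the deltaT reported for this event.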
This is a screen shot of the kpfgreen.STA_CCD_T values on Grafana.
And a link to the green detector page (on the Keck Grafana instance) for that same time stamp: http://vm-dashboards.keck.hawaii.edu/d/VoSaQfx4k/kpf-green-thermal-control?orgId=1&from=1682787133000&to=1682983336000
Here’s a link to the Slack discussion: https://cal-planet-search.slack.com/archives/C041G07ULG0/p1682818477220269
The problem wasn’t noticed until May 1. Will Deich noted:
kpfmon says that many heater components are off or at 0. Not sure if that's real so I'm commanding them to their nominal settings.
Despite all heater controls apparently now correct, the STA CCD temp is continuing to run cold (-102.8, s/b -100.0, hasn't improved in the past 8 minutes).
The erroneous keyword values began when the kpfgreen service was restarted, at 15:35 on Saturday 29 Apr.
Before camerad restarted, it logged that the Archon controller had returned an error processing the POWERON command. Upon restart, camerad recorded that it sent the correct configuration values to the controller. Perhaps ktlcamerad has a race condition in how it starts communicating with camerad, if they are both restarted at the same time, and perhaps that led to the incorrect reported values. I'll look at that later.
And lastly, the STA detector temp is headed towards -100.0