
Troubleshooting

General Principles

  • DO NOT use control-c to stop a script; this can leave the instrument in an unsafe state. Use kpfconfig.SCRIPTSTOP to request a stop (also available as a button in the KPF OB GUI).

Log File Locations

Relevant log files for KPF are located in the following locations:

  • Services and dispatchers write their logs to the /usr/local/kroot/var/log directory on the server on which they run. Most services run on kpfserver, but some run elsewhere; for example, FIU-related services run on kpffiuserver.

    • A shortcut to that directory is available on the command line: cdlog

  • The KPF Translator writes logs to the data disk: /s/sdata1701/KPFTranslator_logs

    • A shortcut to that directory is available on the command line: cdtlog

    • Within that directory, all KPF translator log lines are written to: KPFTranslator.log

    • Also within that directory are date directories such as 2023jul01, which contain logs written by high-level scripts such as RunCalOB. The lines in these files duplicate what is in the KPFTranslator.log file in the directory above; they are repeated here for easier searching and examination. The date directories are also an easy way to see which high-level scripts were run on a particular night.

  • The command line interface for all of the translators writes to the date directories in /s/sdata1701/KPFTranslator_logs in a cli_logs subdirectory of the date. For example: /s/sdata1701/KPFTranslator_logs/2023jul01/cli_logs/cli_interface.log

  • The KPF OB GUI writes logs to the data disk in /s/sdata1701/KPFTranslator_logs/OB_GUI.log
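As a convenience, the date-directory layout described above can be sketched in Python. This is only an illustration built from the example path given in this section; the function names are hypothetical, and whether the directory is named after the UT date or the local date is not stated here, so check that before relying on it.

```python
from datetime import date
from pathlib import PurePosixPath

# Root of the translator logs, as given in this section.
LOG_ROOT = PurePosixPath("/s/sdata1701/KPFTranslator_logs")

# Hard-coded month abbreviations so the result does not depend on locale.
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def date_dir_name(d: date) -> str:
    """Return a date directory name in the style of '2023jul01'."""
    return f"{d.year}{MONTHS[d.month - 1]}{d.day:02d}"


def cli_log_path(d: date) -> PurePosixPath:
    """Return the command line interface log path for the given date."""
    return LOG_ROOT / date_dir_name(d) / "cli_logs" / "cli_interface.log"


print(cli_log_path(date(2023, 7, 1)))
```

For 2023-07-01 this reproduces the example path shown above.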

Scripts

Existing script is running

Symptom

A script fails on the command line with a message similar to:

kpf.FailedPreCondition: Failed PreCondition: Existing script RunCalOB.py (3303940) is running. If the offending script is not running (PID not listed in ps) then the script keywords can be cleared by running: reset_script_keywords or invoking it from the FVWM background menu: KPF Trouble Recovery --> Reset script keywords

This may sometimes be seen at night if a scheduled calibration happens to be running when the science observer is trying to control the instrument (e.g. running the start of night script or running an observation). Observing takes precedence over calibrations, so if this is the case, the calibration can be stopped as described below.

Problem

The keywords used to track whether a script is running indicate that another script is currently using the system.

Solution 1: Another script is running and needs to be stopped

If another script is running and you need to terminate it in order to start something else, you should use the kpfconfig.SCRIPTSTOP keyword. You can do this by setting the SCRIPTSTOP keyword to “Yes” either on the command line:

modify -s kpfconfig SCRIPTSTOP=Yes

or via the KPF OB GUI’s STOP button which performs the same action:

The KPF OB GUI’s script control section.

Setting SCRIPTSTOP to “Yes” will request that the running script terminate in an orderly fashion. This may take several minutes depending on when the running script reaches a sensible breakpoint. It is important to use SCRIPTSTOP to halt a script (instead of hitting ctrl-c) because cleanup actions will be performed after a SCRIPTSTOP (e.g. turning off lamps to preserve their lifetime).

One of the things which can delay the running script terminating is a long exposure in progress. You can halt an exposure in progress by requesting kpfexpose to end the current exposure:

modify -s kpfexpose EXPOSE=End

Once the exposure is complete and has read out, most scripts will then check SCRIPTSTOP and begin the termination and cleanup steps.
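The stop sequence described above could be wrapped in a small helper. The sketch below is not part of the KPF software; the function names and the dry-run default are assumptions for illustration. It only issues the two modify commands quoted in this section, and by default it prints them instead of running them.

```python
import subprocess


def stop_commands(end_exposure: bool = False) -> list[list[str]]:
    """Build the commands that request an orderly script stop."""
    cmds = [["modify", "-s", "kpfconfig", "SCRIPTSTOP=Yes"]]
    if end_exposure:
        # Optionally end a long exposure so the script reaches its stop check sooner.
        cmds.append(["modify", "-s", "kpfexpose", "EXPOSE=End"])
    return cmds


def request_stop(end_exposure: bool = False, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them on an instrument host."""
    for cmd in stop_commands(end_exposure):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


request_stop(end_exposure=True)  # dry run: prints the two commands only
```

Keeping the dry run as the default means the helper cannot interrupt an observation by accident.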

Solution 2: No other scripts are running

Sometimes the script keywords will be set as if a script is running, but the script has crashed and is not actually running. In this case, other scripts will be blocked until the script keywords are cleared. This can be done from the FVWM background menu using the KPF Trouble Recovery --> Reset script keywords entry or from the command line by invoking reset_script_keywords.
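To decide between the two solutions, it helps to know whether the PID named in the error message is still alive. The following is a minimal sketch, assuming a Linux host and that it is run on the same machine as the blocking script; the function names and the regular expression are illustrative and are based only on the example message shown in the Symptom above.

```python
import os
import re

# Matches e.g. "Existing script RunCalOB.py (3303940) is running".
PATTERN = re.compile(r"Existing script (\S+) \((\d+)\) is running")


def parse_blocking_script(message: str):
    """Return (script_name, pid) from the error text, or None if absent."""
    match = PATTERN.search(message)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def pid_is_running(pid: int) -> bool:
    """Check whether a process with this PID exists on the local host."""
    try:
        os.kill(pid, 0)  # signal 0 performs an existence check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


msg = ("kpf.FailedPreCondition: Failed PreCondition: "
       "Existing script RunCalOB.py (3303940) is running.")
print(parse_blocking_script(msg))
```

If the PID is alive, Solution 1 applies; if it is not, the keywords are stale and Solution 2 applies.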

Agitator Use is Disabled

Symptom

When executing the start of night script, the log shows a warning:

2023-05-06 00:22:27,449 WARNING: Agitator use is disabled for tonight

Problem

This means that the kpfconfig.USEAGITATOR keyword is set to “No”. This keyword is meant to indicate the mechanism’s health. A WMKO staff member will set this keyword to No if the agitator mechanism is not functional for some reason.

Solution

The agitator can be reenabled by simply setting the keyword to “Yes”. This should only be done by WMKO staff and should only be done if the agitator is fully functional. A broken or misbehaving agitator mechanism presents a significant danger to the science fibers.

Calibrations

Calibration Source is Not Working

Symptom

A calibration set is run, but does not take data for a lamp.

Problem

The lamp may be disabled in kpfconfig. The log may contain a line similar to:

2023-05-06 00:22:27,449 WARNING: Cal source EtalonFiber is disabled

Solution

The cal source has likely been disabled for a reason. Reenabling it should only be done by WMKO staff with knowledge of why it was disabled in the first place.

SlewCal or Simultaneous Calibration Source is Wrong

Symptom

The simultaneous calibration source (which is printed to the start of night log) or the slew cal source (visible in the OB GUI) is not the desired value.

Problem

These two sources are set from the same place and are always kept to the same value using the kpfconfig.SIMULCALSOURCE keyword.

Solution

This value should only be changed by WMKO staff. The choice of calibration source is influenced by our desire to maintain the lifetime of the various lamps and calibration sources and by the need for warmup times on certain sources. This is not something the user should adjust; contact a WMKO Staff Astronomer if you feel it is set incorrectly.

Etalon Light Source is Off

Symptom

There is no light visible from the etalon in spectra.

This can be confirmed visually by looking at the fibers coming from the SuperK light source which feeds the etalon. Under normal conditions, glow can be seen at some fiber interfaces.

Problem

The SuperK light source is off or otherwise incapacitated.

Solution

Note: The SuperK puts out considerable power, including in the infrared. The optical output is connected to the etalon, so this is not a problem in normal operation, but do not disconnect that fiber and look into the beam. It is exceptionally bright in the optical, so this is probably obvious, but it is worth stating.

Check that the SuperK is powered on at the PDU (port M7 available via the Power GUI, or gshow -s kpfpower OUTLET_M7).

If the system is powered on, check that the interlock lights are green as seen in the photo below. The interlock is just two very thin wires that I twisted together.

Check that the key is in the On position (see photo below)

The NKT SuperK light source for the etalon. Note the on/off key, and the interlock lights are in their operating position in this photo.

If these indicate that the status is ok, then the SuperK may need to be powered on from software. There is currently no keyword based client for this and this must be done from a Windows machine running the NKT Control software. There is a small Windows machine (tc-su-kpfetalon or 10.104.10.157) connected via USB which is accessible via VNC. VNC in as user kpfeng.

Once connected via VNC, the Control software should be available on the desktop. Start that and connect to the device. Once connected, the system should look like the screenshot below when in an operational state. To power on the light, use the “Emission ON/OFF” button.

Guider

Target is not Visible in Guider

Symptom

The observer’s target is not visible on the guider after slewing there. This could be due to one of several problems. Often this is related to the small field of view (FoV) of the guider which is approximately 30x40 arc seconds.

Problem 1: Telescope Pointing

The telescope pointing is off (Ca or Ce is wrong) or the star is a high proper motion star and is not at the expected coordinates.

Solution 1

The OA should check a nearby, bright pointing star. If it centers up nicely, this suggests that the coordinates of the target are wrong (or simply not corrected for proper motion). The star may be just off the edge of the field, so the OA can check by making a few small telescope moves to probe the area just off the edge of the field.
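A quick estimate shows how easily uncorrected proper motion can push a star out of the small guider field. The sketch below is illustrative only: the function names are hypothetical, it assumes the proper motion in RA already includes the cos(dec) factor, and it conservatively compares the offset against half of the smaller field dimension (15 arc seconds, from the approximate 30x40 arc second FoV quoted above).

```python
import math

# Half of the smaller guider field dimension, in arc seconds (assumed conservative limit).
HALF_FOV_ARCSEC = 15.0


def pm_offset_arcsec(pm_ra_mas_yr: float, pm_dec_mas_yr: float, years: float) -> float:
    """Total sky offset accumulated from proper motion, in arc seconds."""
    total_mas_per_year = math.hypot(pm_ra_mas_yr, pm_dec_mas_yr)
    return total_mas_per_year * years / 1000.0


def may_be_off_field(pm_ra_mas_yr: float, pm_dec_mas_yr: float, years: float) -> bool:
    """True if the accumulated offset exceeds the half-field limit."""
    return pm_offset_arcsec(pm_ra_mas_yr, pm_dec_mas_yr, years) > HALF_FOV_ARCSEC


# Hypothetical star: 3000 and 4000 mas/yr, catalog epoch 4 years before the observation.
print(pm_offset_arcsec(3000.0, 4000.0, 4.0), may_be_off_field(3000.0, 4000.0, 4.0))
```

In this made-up case the star has moved 20 arc seconds from its catalog position, enough to be outside the field if the telescope points at the uncorrected coordinates.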

Problem 2: Guider Sensitivity

The star is too faint for the current guider camera settings or the extinction from clouds has made the star too faint to detect.

Solution 2

If it is not already, increase the guider camera’s gain setting to high and see if the star appears. This can be accomplished either in the tip tilt GUI or via command line: kpfSetGuiderGain high.

If the target star is faint in J band (e.g. Jmag > 12; 2MASS is a good source for the expected J magnitude), then the guider may need to have the frames per second (FPS) value lowered to increase sensitivity. If conditions are good (i.e. little or no extinction from clouds), there is a tool which can set reasonable guesses for the guider gain and FPS based on the J magnitude of the star. For example, kpfPredictGuiderParameters 13.2 (for Jmag=13.2) will set both guider gain and FPS.

If there is substantial extinction from clouds, the user may need to manually configure the guider parameters. This is best done by editing the observing block. In the KPF OB GUI, this can be accomplished by setting the GuiderMode to manual, and setting the gain and FPS below (see screenshot).

Alternatively the observer may set these manually using the command line tools kpfSetGuiderGain and kpfSetGuiderFPS, but if an OB is executed it will overwrite the previous values, so to use the new values for science, they must be copied to the OB prior to execution.

Problem 3: Obstruction

The light is not reaching the guide camera due to an obstruction.

Solution 3

Look at the FIU GUI to see if there is an obstruction. Possible culprits include:

  • The AO hatch is closed: check keywords or FIU GUI. The hatch can be opened using the command line kpfControlAOHatch open or from the FVWM menu: KPF Utilities → Open AO Hatch

  • The FIU hatch is closed: check keywords or FIU GUI. The hatch can be opened using the command line kpfControlHatch open or from the FVWM menu: KPF Utilities → Open FIU Hatch

  • The FIU is in the wrong mode: check keywords or FIU GUI. The FIU may be in calibration mode or in an unknown state, either of which can result in the calibration fold mirror blocking the light. The mode can be set using the command line kpfConfigureFIU Observing or from the FVWM menu: KPF Utilities → Set FIU Mode to Observing

FIU

FIU Mode Change Fails

Symptom

The FIU failed to reach destination mode (kpffiu.MODE) after repeated attempts.

One possible error code which you might see is:

decode_write_response_event(): MODE (on behalf of MODE): ERR_WRITE_SW_ERROR (-5401) There was an error in the device-specific write routine for this keyword: check the log files

Problem

One possible reason for this is that the fold mirror is in a limit switch. To establish if this is the case:

gshow -s kpffiu FOLDLIM
FOLDLIM = Positive

The normal state is FOLDLIM=“None”.

Solution

Reset and home the fold mirror stage

Then either set FOLDNAM to the destination or set MODE to the desired destination.

Tip Tilt Performance is Poor

Symptom

Tip tilt performance is poor. The star is moving a lot on the guider, perhaps primarily in either the X or Y pixel direction.

Problem

One possibility is that the XY stage is failing. The symptom is that the stage has lost range and cannot reach the commanded position. To determine if this is the case, stop observing, stop the tip tilt loops, and:

  • Watch the kpffiu TTXVAX and TTYVAX keywords (e.g. via an xshow)

  • Command one of the axes to someplace near the nominal limit and watch to see if the keyword value reaches the location it was commanded to go. The X stage should be able to go from -15 to +15 and the Y stage should be able to go from -20 to +20 (they actually can access a bit more, but this is a reasonable amount to test). For example:

    • modify -s kpffiu TTXVAX=-15 (note where the keyword actually goes)

    • modify -s kpffiu TTXVAX=15 (note where the keyword actually goes)

    • modify -s kpffiu TTYVAX=-20 (note where the keyword actually goes)

    • modify -s kpffiu TTYVAX=20 (note where the keyword actually goes)

  • If one or more of the above tests falls far short of the destination, then we need to tell the software not to use that part of the range.

It is worth repeating the test above to see if you get consistent results. If the results are not consistent, then the procedure below will be of limited utility.
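The bookkeeping for the repeated test can be sketched as follows. The thresholds for "falls far short" and for "consistent" are not defined in this document, so the 1.0 and 0.5 arc second values below are assumptions, as are the function names and the sample readings.

```python
def falls_short(commanded: float, achieved: float, tolerance: float = 1.0) -> bool:
    """True if the stage stopped more than `tolerance` arcsec from the target."""
    return abs(commanded - achieved) > tolerance


def is_consistent(achieved_values: list[float], max_spread: float = 0.5) -> bool:
    """True if repeated attempts ended within `max_spread` arcsec of each other."""
    return max(achieved_values) - min(achieved_values) <= max_spread


# Hypothetical kpffiu.TTXVAX readings after three attempts to command +15.
readings = [9.8, 10.1, 9.9]
short = all(falls_short(15.0, r) for r in readings)
print(f"falls short: {short}, consistent: {is_consistent(readings)}")
```

A result that falls short and is consistent is the case the procedure below can address; an inconsistent result suggests the fix will be of limited use.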

Solution

There are several components to this fix and it is important to understand what each does, so that you can perform the fix properly. The section below explains how the keywords are used, then we will discuss how to set them.

Understanding Tip Tilt Keywords

  • The kpffiu service handles commands to the tip tilt stage.

    • TTXVAX and TTYVAX report the position of the stage in arc seconds of tilt. This is tilt of the stage, not movement of the star on the focal plane.

    • TTXRAW and TTYRAW are the position of the stage in raw encoder counts.

    • TTXRON and TTYRON define the home position of the stage for the stage controller in raw encoder counts.

  • The kpfguide service handles the control loop used to decide where to command the stage to go.

    • TIPTILT_HOME defines the (x, y) coordinates of the home position in arc seconds of motion of the stage. This is where the stage will be positioned if no tip tilt moves are needed – it is the center of the stage motion.

    • TIPTILT_XRANGE and TIPTILT_YRANGE define the maximum range of motion which the guide system should use during observing (in units of arc seconds of stage motion). This is the half width of the range, so the stage can go from the home position - range to home position + range.

The combination of kpfguide.TIPTILT_HOME and kpffiu.TT{x,y}RON defines the center of motion of the stage, and we want those to be in agreement. Unfortunately, these are in different units, so we need to keep that in mind and ensure they are consistent when we set new values.

Once the center position is defined, the system uses kpfguide.TIPTILT_{x,y}RANGE to decide how far to push the tip tilt mirror before offloading to the telescope.

Setting Up the Tip Tilt Center Position and Limits

Above we discovered the limits (in TT{x,y}VAX units) where the stage seems to be able to go. Use those values to determine the home position (center point between the limits) and the range (distance from home to one limit). Use those calculated values to set kpfguide.TIPTILT_HOME and kpfguide.TIPTILT_{x,y}RANGE.

Now move the stage to the home position by setting kpffiu.TT{x,y}VAX and read off the kpffiu.TT{x,y}RAW value. Update the kpffiu.TT{x,y}RON strings with that value, maintaining the proper formatting: |{raw value}|0|Home.

There is an experimental script to do the above steps, but you must manually verify that the results look reasonable. Run kpfMeasureTipTiltMirrorRange and it will print out a set of modify commands for you to run (it does not execute them itself). Check the logic and math; the script is experimental! Note that if the script finds a limit which is unexpected (i.e. less range than nominal), it will set up the above keywords to a range 1 arc second smaller than where the limit is. It applies this safety margin to the kpfguide.TIPTILT_{x,y}RANGE keywords, so if you calculate something slightly different than the script, this may be why.
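The arithmetic for one axis can be written out as a worked example. This is a sketch of the calculation described in this section, not the logic of kpfMeasureTipTiltMirrorRange: the function names are hypothetical, and applying the 1 arc second margin to the half-width while keeping the home position at the midpoint is an assumption that may differ from what the script does.

```python
def home_and_range(low: float, high: float,
                   nominal_low: float, nominal_high: float,
                   margin: float = 1.0) -> tuple[float, float]:
    """Return (home, half_range) in arc seconds of stage tilt for one axis.

    low/high are the positions the stage actually reached; nominal_low/high
    are the limits that were commanded in the test.
    """
    home = (low + high) / 2.0
    half_range = (high - low) / 2.0
    if low > nominal_low or high < nominal_high:
        # Less range than nominal: back off from the measured limits.
        half_range -= margin
    return home, half_range


def ron_string(raw_value) -> str:
    """Format a TT{x,y}RON value as |{raw value}|0|Home."""
    return f"|{raw_value}|0|Home"


# Hypothetical X axis result: reached -15 as commanded, but only +10 instead of +15.
print(home_and_range(-15.0, 10.0, -15.0, 15.0))
# Hypothetical raw encoder reading taken with the stage at the new home position.
print(ron_string(12345))
```

In this made-up case the home position moves to -2.5 and the half-width becomes 11.5 after the margin. The raw value must be read from kpffiu.TT{x,y}RAW with the stage actually sitting at the new home position.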

Detectors

kpfexpose.EXPOSE is not Ready

Symptom

The kpfexpose.EXPOSE keyword is reporting an unexpected state, such as "InProgress" even though the exposure should have finished, or "Error".

If kpfexpose.EXPOSE is not ready, it will not allow users to take new exposures which is how this will likely manifest.

Problem

One of the detectors may not be in a normal state.

Use the EXPLAINNR keyword to determine which detector is blocking kpfexpose.EXPOSE from becoming "Ready". For example, if kpfexpose.EXPLAINNR is reporting "hk:ACQPHASE=wait", that indicates that the CaHK detector is not in the Ready phase. The detailed solution will depend on which camera and what is causing the blockage.

Solution 1: Reset the Red or Green detector

If either the Red or Green camera is the problem as reported by kpfexpose.EXPLAINNR or by the relevant camera's EXPSTATE keyword, then setting the particular camera's EXPOSE keyword to "Reset" may resolve things (e.g. modify -s kpfgreen EXPOSE=Reset).

Solution 2: Abort Ca HK Exposure

If the Ca HK detector appears to be the problem (e.g. kpfexpose.EXPLAINNR = "hk:ACQPHASE=wait"), then aborting the exposure in progress may free up the Ca HK detector. To do so run: modify -s kpf_hk EXPOSE=abort

Solution 3: Restart the relevant keyword service or dispatchers

If the above has not helped, restarting the service may free things up. Possible services to restart include kpfexpose, kpfgreen, kpfred, kpf_hk. It is suggested that you only restart the service if there is some indication that it is the source of the problem (either from kpfexpose.EXPLAINNR or from the service's EXPSTATE for example).
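When triaging, it can help to split the EXPLAINNR value into its parts. The sketch below assumes the value has the single-entry form prefix:KEYWORD=value seen in the "hk:ACQPHASE=wait" example; how multiple blockers are reported, and what prefixes the other detectors use, is not documented here, so only the hk case is mapped to a suggestion. The function and dictionary names are hypothetical.

```python
def parse_explainnr(value: str) -> tuple[str, str, str]:
    """Split 'hk:ACQPHASE=wait' into ('hk', 'ACQPHASE', 'wait')."""
    prefix, _, rest = value.partition(":")
    keyword, _, state = rest.partition("=")
    return prefix.strip(), keyword.strip(), state.strip()


# Only the prefix documented in this section is mapped; others are unknown here.
FIRST_STEP = {
    "hk": "modify -s kpf_hk EXPOSE=abort",
}


def suggest(value: str) -> str:
    """Return the first recovery command to try, if one is documented."""
    prefix, _, _ = parse_explainnr(value)
    return FIRST_STEP.get(prefix, "unknown prefix: see the solutions in this section")


print(parse_explainnr("hk:ACQPHASE=wait"))
print(suggest("hk:ACQPHASE=wait"))
```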

A Detector Refuses to be Triggered

Symptom

When running an OB, or invoking command line scripts, a particular detector is not set to be triggered even though it is requested.

Problem

The detector may be disabled in kpfconfig. The log may contain a line similar to:

2023-05-06 00:22:27,449 WARNING: Green detector is not enabled

Solution

The detector has likely been disabled for a reason. Reenabling it should only be done by WMKO staff with knowledge of why it was disabled in the first place.

Detector Fails to Trigger Without Error

Symptom

The detector is not exposing and not producing output files, but kpfexpose is unaware of the problem and takes exposures as normal triggering all selected cameras.

Problem

Something is stuck in the dispatcher code for driving the Archon (this only applies to red and green detectors).

Solution

Restart dispatcher 0 on the affected detector: kpf restart kpfgreen0

Green or Red CCDPOWER is “Not Configured”

Symptom

The CCDPOWER keyword in the kpfgreen or kpfred service is “Not Configured”. This may manifest as the system not being able to take exposures.

Problem

The core problem is not yet clear, but this appears to be something in the Archon controller itself. Investigation is ongoing as of this writing. “It is as if the Archon glitched and lost its ACF.”

Solution

Restart the relevant kpfgreen or kpfred service (we will use kpfgreen in this example):

kpf restart kpfgreen

After the restart, the heater configuration must immediately be reset to the nominal values to restore temperature control. This will also turn on CCDPOWER:

kpfmon-heater-copy green copytolive

Ca HK Detector Stuck

Symptom

The HK detector is stuck in the exposing state.

or

The kpfexpose EXPOSE status is stuck and kpfexpose.EXPLAINNR is "hk:ACQPHASE=wait".

Problem

Some combination of hardware and software is stuck.

Solution

  1. Abort the existing HK exposure:

modify -s kpf_hk expose=abort

Alternatively, the same action is available via the FVWM menu under KPF Troubleshooting Menu → KPF Trouble Recovery → Reset Ca HK Detector

  2. Then restart the kpf_hk service:

kpf restart kpf_hk

Take test exposures to see if the system has recovered.


If the system is still sticking after test exposures:

  3. Power cycle the Andor camera and the HK Galil (power strip J, ports 1, 2, and 5), then restart kpf_hk again. This can be accomplished in one step using:

kpfPowerCycleCaHK

Alternatively, the same tool is available via the FVWM menu under KPF Troubleshooting Menu → KPF Trouble Recovery → Power Cycle Ca HK Detector

Exposure Meter in Error State

Symptom

The exposure meter is in an error state: kpf_expmeter.EXPSTATE=Error

Problem

Something in the SBIG CCD control software is unhappy.

Solution 1

First, try resetting the exposure meter detector:

modify -s kpf_expmeter EXPOSE=Reset

Check the status: gshow -s kpf_expmeter EXPSTATE should become “Ready”.

Solution 2

If the above solution fails to recover the system, power cycle the camera on power port L3.

Then do kpf restart kpf_expmeter

This is equivalent to running kpf restart kpf_expmeter1 and kpf restart kpf_expmeter2, where kpf_expmeter1 is the exposure meter camera and kpf_expmeter2 is the exposure meter DRP.

Sometimes kpf_expmeter2 is stuck in a busy state after a kpf_expmeter1 restart. Both kpf_expmeter1 and kpf_expmeter2 need to be ready before kpfexpose can trigger new exposures.

You may need to set the TOP, LEFT, WIDTH, and HEIGHT keywords after restarting the software. The values are:

The correct values are checked in testall.py, so you can always run that to verify values are ok.

 

L0 Generation

Deprecated L0 files

L0 files are generated by gathering the “sub-FITS” files from the Green, Red, CaHK, EM, and guide cameras, processing some of them (EM and guide camera), adding telemetry drawn from KTL keywords and the sub-FITS headers to the new L0 FITS header, and writing the L0 file. This process happens automatically by using keywords to notify a dispatcher that a recent exposure is ready for L0 file creation. If this dispatcher is stopped or dies, it can be restarted with the command kpf restart assemble. If L0 files need to be generated manually (e.g., because the dispatcher stopped or L0 files need to be regenerated), it can be done with the command

where path is the directory where the L0 files should be written (e.g., /sdata1701/kpfeng/DATE/L0), NNNNN is the first FRAMENO (a header keyword for Green and Red FITS images and an index which is incremented with each exposure; e.g. 33015), and MMMMM is the last FRAMENO in a range of files (e.g., 33020). The final argument is optional if only one L0 file needs to be generated.
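The FRAMENO arguments describe an inclusive range with an optional end. A small sketch of that expansion, useful for checking which exposures a regeneration would cover, is shown below; the function name is hypothetical and this is not part of the L0 assembly tool.

```python
from typing import Optional


def frame_numbers(first: int, last: Optional[int] = None) -> list[int]:
    """Expand a first/last FRAMENO pair into the list of frames covered."""
    if last is None:
        # The final argument is optional when only one L0 file is needed.
        return [first]
    if last < first:
        raise ValueError("last FRAMENO must not be smaller than the first")
    return list(range(first, last + 1))


print(frame_numbers(33015, 33020))
print(frame_numbers(33015))
```

With the example values from the text, the range covers six exposures.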

L0 files may be deprecated when switching to fast-readout mode. Observers may find broken L1/L2 files and L0 files that are missing green/red/CaHK components. If L0 files need to be regenerated, it is better to write to a new directory (e.g., /sdata1701/kpfeng/DATE/new_L0) and inform Jeff Mader so he can ingest the new L0 files into the Keck Observatory Archive.

Agitator

Agitator Sounds Wrong or Speed is Wrong

Symptom

The agitator is not working as expected. This is usually seen in the agitator speed value (see screenshots) or by listening to the sounds.

 

You can listen to the agitator mechanism via the “KPF crypt M5075” camera on the facility camera list (note this is an internal web page at Keck). In normal operation the agitator makes a regular (roughly 1-2 Hz) mechanical oscillation sound. When the bad behavior above occurred it was either silent (note there is background fan noise on that camera) or would make a single mechanical “cachunk” sound, then stop.

Problem

The motor is not initialized properly.

Solution

First, clear the initialization state using:

modify -s kpfmot agitmod=pos

modify -s kpfmot agitini=no

This leaves the system in Halt mode. Initialize using:

modify -s kpfmot agitmod=pos
modify -s kpfmot agitini=yes

Vacuum

Vacuum Chamber vacuum levels rising

Symptom

Vacuum levels in the chamber are rising, but the vacuum levels at the pump are falling. These are kpfvac.VCH_HIVAC and kpfvac.VCART_HIVAC keywords respectively.

This might manifest as a “vac chamber trouble” alert from kpfmon.
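The signature described above (chamber pressure rising while pump pressure falls) can be expressed as a simple comparison of two time series. This is a minimal sketch with made-up sample values and hypothetical function names; real monitoring would need smoothing and sensible thresholds, and the pressure units are not specified here.

```python
def trend(samples: list[float]) -> float:
    """Net change from the first to the last sample."""
    return samples[-1] - samples[0]


def gate_valve_signature(chamber: list[float], pump: list[float]) -> bool:
    """True if the chamber reading is rising while the pump reading is falling."""
    return trend(chamber) > 0 and trend(pump) < 0


# Hypothetical histories of kpfvac.VCH_HIVAC and kpfvac.VCART_HIVAC.
vch_hivac = [1.0e-6, 2.0e-6, 5.0e-6]
vcart_hivac = [1.0e-6, 8.0e-7, 5.0e-7]
print(gate_valve_signature(vch_hivac, vcart_hivac))
```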

Problem

The gate valve between the vacuum chamber and the pump has closed.

The gate valve is currently (late 2023) not instrumented and is controlled by compressed air from the facility. As long as facility compressed air is working, the valve is open and the pump should be keeping the vacuum chamber at good vacuum levels. If the gate valve closes, it is presumably because compressed air has failed.

Solution

Restore compressed air.

Note that compressed air depends on HELCO power. If we’re on generator, it is not active. It should come back on its own once power is restored.

 

SoCal

Enclosure Won’t Open or Close

Symptom

Enclosure will not move. It may begin moving, then stop and reverse.

Problem

Enclosure motor is hitting an overcurrent limit.

To verify this is the problem, log in to the dome controller from kpfeng@kpfserver :

ssh socal

This connects to the Raspberry Pi controller in the dome enclosure. The username and IP address are configured in the ~/.ssh/config file on kpfserver (you can ssh manually using pi@192.168.23.244) and the SSH key for kpfserver has been installed on the Pi, so it should not ask for a password. If it does, the password is in the usual showpasswords location.

View the dome log file at ~/dome.log and look for errors which indicate the nature of the problem.

The ~/grep_for_dome_error script will exclude many of the noisy, not useful lines in the dome.log file and help with examining the log.

Solution

If the log file indicates that overcurrent on the motors is the issue, ensure the mechanisms are clear of obstruction and reasonably well balanced (it doesn’t need to be perfect).

 

Enclosure Won’t Open

Symptom

Enclosure will not move. It may begin moving, then stop and reverse.

Problem

Enclosure motor is not getting current.

To verify this is the problem, log in to the dome controller from kpfeng@kpfserver :

ssh socal

This connects to the Raspberry Pi controller in the dome enclosure. The username and IP address are configured in the ~/.ssh/config file on kpfserver (you can ssh manually using pi@192.168.23.244) and the SSH key for kpfserver has been installed on the Pi, so it should not ask for a password. If it does, the password is in the usual showpasswords location.

View the dome log file at ~/dome.log and look for errors similar to:

The ~/grep_for_dome_error script will exclude many of the noisy, not useful lines in the dome.log file and help with examining the log.

Solution

Reboot the controller (Raspberry Pi) with sudo reboot, then restart the kpfsocal3 dispatcher: kpf restart kpfsocal3.