General Principles
DO NOT use control-c to stop a script; this can leave the instrument in an unsafe state. Use kpfconfig.SCRIPTSTOP to request a stop (also available as a button in the KPF OB GUI).
Log File Locations
Relevant log files for KPF are located in the following locations:
Services and dispatchers write their logs to the /usr/local/kroot/var/log directory on the server on which they run. Most services run on kpfserver, but FIU-related services, for example, run on kpffiuserver. A shortcut to that directory is available on the command line:
cdlog
The KPF Translator writes logs to the data disk:
/s/sdata1701/KPFTranslator_logs
A shortcut to that directory is available on the command line:
cdtlog
Within that directory, all KPF translator log lines are written to:
KPFTranslator.log
Also within that directory are date directories such as 2023jul01, which contain logs written by high-level scripts such as RunCalOB. The log lines in these files duplicate what is in the KPFTranslator.log file in the directory above; they are repeated here for easier searching and examination. The date directories are also an easy way to see which high-level scripts were run on a particular night.
The command line interface for all of the translators writes to a cli_logs subdirectory of the date directory in /s/sdata1701/KPFTranslator_logs. For example:
/s/sdata1701/KPFTranslator_logs/2023jul01/cli_logs/cli_interface.log
The KPF OB GUI writes logs to the data disk in
/s/sdata1701/KPFTranslator_logs/OB_GUI.log
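As an illustration of the directory layout described above, here is a minimal Python sketch that builds the path to the command line interface log for a given date. The helper names (date_dir_name, cli_log_path) are hypothetical and are not part of the KPF software. Only the 2023jul01 example appears in this document, so the other month abbreviations, and which date convention (local or UT) labels a night, are assumptions to verify against the actual directories.

```python
from datetime import date
from pathlib import PurePosixPath

# Root of the translator logs, as given in the text above.
LOG_ROOT = PurePosixPath("/s/sdata1701/KPFTranslator_logs")

# Lowercase three-letter month names. Only "jul" appears in the text;
# the other abbreviations are an assumption.
MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")


def date_dir_name(day: date) -> str:
    """Format a date like the example directory name 2023jul01."""
    return f"{day.year:04d}{MONTHS[day.month - 1]}{day.day:02d}"


def cli_log_path(day: date) -> PurePosixPath:
    """Path of the command line interface log inside a date directory."""
    return LOG_ROOT / date_dir_name(day) / "cli_logs" / "cli_interface.log"


print(cli_log_path(date(2023, 7, 1)))
# /s/sdata1701/KPFTranslator_logs/2023jul01/cli_logs/cli_interface.log
```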
Scripts
Existing script is running
Symptom
A script fails on the command line with a message similar to:
kpf.FailedPreCondition: Failed PreCondition: Existing script RunCalOB.py (3303940) is running. If the offending script is not running (PID not listed in ps) then the script keywords can be cleared by running: reset_script_keywords or invoking it from the FVWM background menu: KPF Trouble Recovery --> Reset script keywords
This may sometimes be seen at night if a scheduled calibration happens to be running when the science observer is trying to control the instrument (e.g. running the start of night script or running an observation). Observing takes precedence over calibrations, so if this is the case, the calibration can be stopped as described below.
Problem
The keywords used to track whether a script is running indicate that another script is currently using the system.
Solution 1: Another script is running and needs to be stopped
If another script is running and you need to terminate it in order to start something else, you should use the kpfconfig.SCRIPTSTOP
keyword. You can do this by setting the SCRIPTSTOP keyword to “Yes” either on the command line:
modify -s kpfconfig SCRIPTSTOP=Yes
or via the KPF OB GUI’s STOP button, which performs the same action.
Setting SCRIPTSTOP to “Yes” will request that the running script terminate in an orderly fashion. This may take several minutes depending on when the running script reaches a sensible breakpoint. It is important to use SCRIPTSTOP to halt a script (instead of hitting ctrl-c) because cleanup actions will be performed after a SCRIPTSTOP (e.g. turning off lamps to preserve their lifetime).
One of the things which can delay the running script terminating is a long exposure in progress. You can halt an exposure in progress by requesting kpfexpose to end the current exposure:
modify -s kpfexpose EXPOSE=End
Once the exposure is complete and has read out, most scripts will then check SCRIPTSTOP and begin the termination and cleanup steps.
Solution 2: No other scripts are running
Sometimes the script keywords will be set as if a script is running, but the script has crashed and is not actually running. In this case, other scripts will be blocked until the script keywords are cleared. This can be done from the FVWM background menu using the KPF Trouble Recovery --> Reset script keywords entry or from the command line by invoking reset_script_keywords.
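To help decide between the two solutions, the script name and PID can be pulled out of the error message and the PID checked against the running processes. The following is a rough Python sketch, not part of the KPF software: the function names are hypothetical, the pattern is based only on the single example message shown in the Symptom, and the check must be run on the same host where the script would be running.

```python
import os
import re

# Pattern based on the one example message in this document.
_EXISTING = re.compile(r"Existing script (?P<name>\S+) \((?P<pid>\d+)\) is running")


def parse_existing_script(message):
    """Return (script_name, pid) from the error message, or None if absent."""
    match = _EXISTING.search(message)
    if match is None:
        return None
    return match.group("name"), int(match.group("pid"))


def pid_is_running(pid):
    """Check for a process on this host without disturbing it (POSIX only).

    Signal 0 performs error checking only; no signal is delivered.
    Do not use this on Windows, where os.kill behaves differently.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process: the keywords are probably stale
    except PermissionError:
        return True   # process exists but belongs to another user
    return True


message = ("kpf.FailedPreCondition: Failed PreCondition: "
           "Existing script RunCalOB.py (3303940) is running.")
print(parse_existing_script(message))
# ('RunCalOB.py', 3303940)
```

If the PID is not running, Solution 2 applies; if it is, request a stop as in Solution 1.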
Agitator Use is Disabled
Symptom
When executing the start of night script, the log shows a warning:
2023-05-06 00:22:27,449 WARNING: Agitator use is disabled for tonight
Problem
This means that the kpfconfig.USEAGITATOR
keyword is set to “No”. This keyword is meant to indicate the mechanism’s health. A WMKO staff member will set this keyword to No if the agitator mechanism is not functional for some reason.
Solution
The agitator can be reenabled by simply setting the keyword to “Yes”. This should only be done by WMKO staff and should only be done if the agitator is fully functional. A broken or misbehaving agitator mechanism presents a significant danger to the science fibers.
Calibrations
Calibration Source is Not Working
Symptom
A calibration set is run, but does not take data for a lamp.
Problem
The lamp may be disabled in kpfconfig
. The log may contain a line similar to:
2023-05-06 00:22:27,449 WARNING: Cal source EtalonFiber is disabled
Solution
The cal source has likely been disabled for a reason. Reenabling it should only be done by WMKO staff with knowledge of why it was disabled in the first place.
SlewCal or Simultaneous Calibration Source is Wrong
Symptom
The simultaneous calibration source (which is printed to the start of night log) or the slew cal source (visible in the OB GUI) is not the desired value.
Problem
These two sources are set from the same place and are always kept to the same value using the kpfconfig.SIMULCALSOURCE
keyword.
Solution
This value should only be changed by WMKO staff. The choice of calibration source is influenced by our desire to maintain the lifetime of the various lamps and calibration sources and by the need for warmup times on certain sources. This is not something the user should adjust; contact a WMKO Staff Astronomer if you feel it is set incorrectly.
Etalon Light Source is Off
Symptom
There is no light visible from the etalon in spectra.
This can be confirmed visually looking at the fibers coming from the SuperK light source which feeds the etalon. Under normal conditions, glow can be seen at some fiber interfaces.
Problem
The SuperK light source is off or otherwise incapacitated.
Solution
Note: A safety issue is that the SuperK puts out considerable power, including in the infrared. The optical output is connected to the etalon, so this won’t be a problem in normal operation, but one shouldn’t disconnect that fiber and look into the beam. It is exceptionally bright in the optical so this is probably obvious, but I mention it just in case.
Check that the SuperK is powered on at the PDU (port M7 available via the Power GUI, or gshow -s kpfpower OUTLET_M7
).
If the system is powered on, check that the interlock lights are green as seen in the photo below. The interlock is just two very thin wires that I twisted together.
Check that the key is in the On position (see photo below)
If these indicate that the status is ok, then the SuperK may need to be powered on from software. There is currently no keyword based client for this and this must be done from a Windows machine running the NKT Control software. There is a small Windows machine (tc-su-kpfetalon
or 10.104.10.157
) connected via USB which is accessible via VNC. VNC in as user kpfeng
.
Once connected via VNC, the Control software should be available on the desktop. Start that and connect to the device. Once connected, the system should look like the screenshot below when in an operational state. To power on the light, use the “Emission ON/OFF” button.
Guider
Target is not Visible in Guider
Symptom
The observer’s target is not visible on the guider after slewing there. This could be due to one of several problems. Often this is related to the small field of view (FoV) of the guider which is approximately 30x40 arc seconds.
Problem 1: Telescope Pointing
The telescope pointing is off (Ca or Ce is wrong) or the star is a high proper motion star and is not at the expected coordinates.
Solution 1
The OA should check a nearby, bright pointing star. If it centers up nicely, this suggests that the coordinates of the target are wrong (or simply not corrected for proper motion). The star may be just off the edge of the field, so the OA can check by making a few small telescope moves to probe the area just off the edge of the field.
Problem 2: Guider Sensitivity
The star is too faint for the current guider camera settings or the extinction from clouds has made the star too faint to detect.
Solution 2
If it is not already, increase the guider camera’s gain setting to high and see if the star appears. This can be accomplished either in the tip tilt GUI or via command line: kpfSetGuiderGain high
.
If the target star is faint in J band magnitude (e.g. Jmag > 12) (Note that 2MASS is a good source to determine expected J magnitude), then the guider may need to have the frames per second (FPS) value lowered to increase sensitivity. If conditions are good (i.e. little or no extinction from clouds), then there is a tool which can set reasonable guesses for the guider gain and FPS based on the J magnitude of the star. For example: kpfPredictGuiderParameters 13.2
(for Jmag=13.2) will set both guider gain and FPS.
If there is substantial extinction from clouds, the user may need to manually configure the guider parameters. This is best done by editing the observing block. In the KPF OB GUI, this can be accomplished by setting the GuiderMode to manual, and setting the gain and FPS below (see screenshot).
Alternatively the observer may set these manually using the command line tools kpfSetGuiderGain
and kpfSetGuiderFPS
, but if an OB is executed it will overwrite the previous values, so to use the new values for science, they must be copied to the OB prior to execution.
Problem 3
The light is not reaching the guide camera due to an obstruction.
Solution 3
Look at the FIU GUI to see if there is an obstruction. Possible culprits include:
The AO hatch is closed: check keywords or FIU GUI. The hatch can be opened using the command line
kpfControlAOHatch open
or from the FVWM menu: KPF Utilities → Open AO Hatch
The FIU hatch is closed: check keywords or FIU GUI. The hatch can be opened using the command line
kpfControlHatch open
or from the FVWM menu: KPF Utilities → Open FIU Hatch
The FIU is in the wrong mode: check keywords or FIU GUI. The FIU may be in calibration mode or in an unknown state, either of which can result in the calibration fold mirror blocking the light. The mode can be set using the command line
kpfConfigureFIU Observing
or from the FVWM menu: KPF Utilities → Set FIU Mode to Observing
FIU
FIU Mode Change Fails
Symptom
The FIU failed to reach destination mode (kpffiu.MODE
) after repeated attempts.
One possible error code which you might see is:
decode_write_response_event(): MODE (on behalf of MODE): ERR_WRITE_SW_ERROR (-5401) There was an error in the device-specific write routine for this keyword: check the log files
Problem
One possible reason for this is that the fold mirror is in a limit switch. To establish if this is the case:
gshow -s kpffiu FOLDLIM
FOLDLIM = Positive
The normal state is FOLDLIM=“None”.
Solution
Reset and home the fold mirror stage:
modify -s kpffiu FOLDCAL=Reset
modify -s kpffiu FOLDCAL=Homed
Then either set FOLDNAM to the destination or set MODE to the desired destination.
Tip Tilt Performance is Poor
Symptom
Tip tilt performance is poor. The star is moving a lot on the guider, perhaps primarily in either the X or Y pixel direction.
Problem
One possibility is that the XY stage is failing. The symptom is that the stage has lost range and can not reach the commanded position. To determine if this is the case, stop observing and stop the tip tilt loops and:
Watch the kpffiu TTXVAX and TTYVAX keywords (e.g. via an xshow).
Command one of the axes to someplace near the nominal limit and watch to see if the keyword value reaches the location it was commanded to go. The X stage should be able to go from -15 to +15 and the Y stage should be able to go from -20 to +20 (they actually can access a bit more, but this is a reasonable amount to test). For example:
modify -s kpffiu TTXVAX=-15 (note where the keyword actually goes)
modify -s kpffiu TTXVAX=15 (note where the keyword actually goes)
modify -s kpffiu TTYVAX=-20 (note where the keyword actually goes)
modify -s kpffiu TTYVAX=20 (note where the keyword actually goes)
If one or more of the above tests falls far short of the destination, then we need to tell the software not to use that part of the range.
It is worth repeating the test above to see if you get consistent results. If the results are not consistent, then the procedure below will be of limited utility.
Solution
There are several components to this fix and it is important to understand what each does, so that you can perform the fix properly. The section below explains how the keywords are used, then we will discuss how to set them.
Understanding Tip Tilt Keywords
The kpffiu service handles commands to the tip tilt stage.
TTXVAX and TTYVAX report the position of the stage in arc seconds of tilt. This is tilt of the stage, not movement of the star on the focal plane.
TTXRAW and TTYRAW are the position of the stage in raw encoder counts.
TTXRON and TTYRON define the home position of the stage for the stage controller in raw encoder counts.
The kpfguide service handles the control loop used to decide where to command the stage to go.
TIPTILT_HOME defines the (x, y) coordinates of the home position in arc seconds of motion of the stage. This is where the stage will be positioned if no tip tilt moves are needed; it is the center of the stage motion.
TIPTILT_XRANGE and TIPTILT_YRANGE define the maximum range of motion which the guide system should use during observing (in units of arc seconds of stage motion). This is the half width of the range, so the stage can go from the home position - range to home position + range.
The combination of kpfguide.TIPTILT_HOME and kpffiu.TT{x,y}RON defines the center of motion of the stage, and we want those to be in agreement. Unfortunately, these are in different units, so we need to keep that in mind and ensure they are consistent when we set new values.
Once the center position is defined, the system uses kpfguide.TIPTILT_{x,y}RANGE
to decide how far to push the tip tilt mirror before offloading to the telescope.
Setting Up the Tip Tilt Center Position and Limits
Above we discovered the limits (in TT{x,y}VAX
units) where the stage seems to be able to go. Use those values to determine the home position (center point between the limits) and the range (distance from home to one limit). Use those calculated values to set kpfguide.TIPTILT_HOME
and kpfguide.TIPTILT_{x,y}RANGE
.
Now move the stage to the home position by setting kpffiu.TT{x,y}VAX
and read off the kpffiu.TT{x,y}RAW
value. Update the kpffiu.TT{x,y}RON
strings with that value, maintaining the proper formatting: |{raw value}|0|Home
.
There is an experimental script to do the above steps, but you must manually verify that the results look reasonable. Run kpfMeasureTipTiltMirrorRange and it will print out a set of modify commands for you to run (it does not execute them itself). Check the logic and math; the script is experimental! Note that if the script finds a limit which is unexpected (i.e. less range than nominal), it will set up the above keywords to a range 1 arc second smaller than where the limit is. It applies this safety margin to the kpfguide.TIPTILT_{x,y}RANGE keywords, so if you calculate something slightly different than the script, this may be why.
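The arithmetic in this section can be sketched in a few lines of Python for checking your own numbers or the output of the experimental script. This is not the code of kpfMeasureTipTiltMirrorRange: the function names are hypothetical, and whether the script applies its 1 arc second margin in exactly this way is an assumption.

```python
def home_and_range(low_limit, high_limit, margin=0.0):
    """Compute home (center) and half-width range for one axis.

    low_limit and high_limit are the TT{x,y}VAX values the stage actually
    reached. margin is an optional safety margin subtracted from the half
    width (assumption: similar in spirit to the script's 1 arc second).
    """
    if high_limit <= low_limit:
        raise ValueError("high_limit must be greater than low_limit")
    home = (low_limit + high_limit) / 2
    half_range = (high_limit - low_limit) / 2 - margin
    if half_range <= 0:
        raise ValueError("margin leaves no usable range")
    return home, half_range


def ron_string(raw_value):
    """Format a TT{x,y}RON value as |{raw value}|0|Home."""
    return f"|{raw_value}|0|Home"


# Hypothetical measurement: X reached -15 as commanded, but only +11.
print(home_and_range(-15.0, 11.0))              # (-2.0, 13.0)
print(home_and_range(-15.0, 11.0, margin=1.0))  # (-2.0, 12.0)
# Hypothetical raw encoder reading taken at the home position.
print(ron_string(12345))                        # |12345|0|Home
```

In this made-up example the X home moves to -2 arc seconds, and the usable half range shrinks from the nominal 15 to 13 (or 12 with a 1 arc second margin).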
Detectors
kpfexpose.EXPOSE is not Ready
Symptom
The kpfexpose.EXPOSE
keyword is reporting an unexpected state such as reporting "InProgress" even though the exposure should have finished or is reporting "Error".
If kpfexpose.EXPOSE
is not ready, it will not allow users to take new exposures which is how this will likely manifest.
Problem
One of the detectors may not be in a normal state.
Use the EXPLAINNR
keyword to determine which detector is blocking kpfexpose.EXPOSE
from becoming "Ready". For example, if kpfexpose.EXPLAINNR
is reporting "hk:ACQPHASE=wait"
, that indicates that the CaHK detector is not in the Ready phase. The detailed solution will depend on which camera and what is causing the blockage.
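For scripted checks, the EXPLAINNR value can be split into its parts. The Python sketch below is based only on the single example value "hk:ACQPHASE=wait" given above; the assumption that every value has the form source:KEYWORD=value, the names used here, and the behaviour for any other format are not confirmed by this document. The only source-to-service mapping included is the one stated in the text (hk is the CaHK detector, handled by the kpf_hk service).

```python
import re
from typing import NamedTuple, Optional


class NotReadyReason(NamedTuple):
    source: str   # e.g. "hk"
    keyword: str  # e.g. "ACQPHASE"
    value: str    # e.g. "wait"


# Assumed form: source:KEYWORD=value (from the single example in the text).
_REASON = re.compile(r"^(?P<source>[^:=\s]+):(?P<keyword>[^=\s]+)=(?P<value>.+)$")

# Only the hk entry comes from this document; add others once verified.
SERVICE_FOR_SOURCE = {"hk": "kpf_hk"}


def parse_explainnr(text: str) -> Optional[NotReadyReason]:
    """Split an EXPLAINNR value into parts, or return None if it does not fit."""
    match = _REASON.match(text.strip())
    if match is None:
        return None
    return NotReadyReason(match.group("source"),
                          match.group("keyword"),
                          match.group("value"))


reason = parse_explainnr("hk:ACQPHASE=wait")
print(reason)
print(SERVICE_FOR_SOURCE.get(reason.source, "unknown service"))
```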
Solution 1: Reset the Red or Green detector
If either the Red or Green camera is the problem as reported by kpfexpose.EXPLAINNR or by the relevant camera's EXPSTATE keyword, then setting the particular camera's EXPOSE keyword to "Reset" may resolve things (e.g. modify -s kpfgreen EXPOSE=Reset).
Solution 2: Abort Ca HK Exposure
If the Ca HK detector appears to be the problem (e.g. kpfexpose.EXPLAINNR = "hk:ACQPHASE=wait"), then aborting the exposure in progress may free up the Ca HK detector. To do so run: modify -s kpf_hk EXPOSE=abort
Solution 3: Restart the relevant keyword service or dispatchers
If the above has not helped, restarting the service may free things up. Possible services to restart include kpfexpose, kpfgreen, kpfred, kpf_hk. It is suggested that you only restart the service if there is some indication that it is the source of the problem (either from kpfexpose.EXPLAINNR or from the service's EXPSTATE for example).
A Detector Refuses to be Triggered
Symptom
When running an OB, or invoking command line scripts, a particular detector is not set to be triggered even though it is requested.
Problem
The detector may be disabled in kpfconfig
. The log may contain a line similar to:
2023-05-06 00:22:27,449 WARNING: Green detector is not enabled
Solution
The detector has likely been disabled for a reason. Reenabling it should only be done by WMKO staff with knowledge of why it was disabled in the first place.
Detector Fails to Trigger Without Error
Symptom
The detector is not exposing and not producing output files, but kpfexpose
is unaware of the problem and takes exposures as normal triggering all selected cameras.
Problem
Something is stuck in the dispatcher code for driving the Archon (this only applies to red and green detectors).
Solution
Restart dispatcher 0 on the affected detector, e.g. for the green detector: kpf restart kpfgreen0
Green or Red CCDPOWER is “Not Configured”
Symptom
The CCDPOWER keyword in the kpfgreen or kpfred service is “Not Configured”. This may manifest as the system not being able to take exposures.
Problem
The core problem is not yet clear, but this appears to be something in the Archon controller itself. Investigation is ongoing as of this writing. “It is as if the Archon glitched and lost its ACF.”
Solution
Restart the relevant kpfgreen or kpfred service (we will use kpfgreen in this example):
kpf restart kpfgreen
After the restart, the heater configuration must immediately be reset to the nominal values to restore temperature control; this will also turn on CCDPOWER:
kpfmon-heater-copy green copytolive
Ca HK Detector Stuck
Symptom
The HK detector is stuck in the exposing state.
or
The kpfexpose EXPOSE status is stuck and kpfexpose.EXPLAINNR
is "hk:ACQPHASE=wait".
Problem
Some combination of hardware and software is stuck.
Solution
Abort the existing HK exposure:
modify -s kpf_hk expose=abort
Alternatively, the same action is available via the FVWM menu under KPF Troubleshooting Menu → KPF Trouble Recovery → Reset Ca HK Detector
Then restart the kpf_hk service:
kpf restart kpf_hk
Take test exposures to see if the system has recovered.
If the system is still sticking after test exposures:
Power cycle the Andor Camera, and the HK Galil on power strip J, ports 1, 2, and 5, then restart kpf_hk again. This can be accomplished in one step using:
kpfPowerCycleCaHK
Alternatively, the same tool is available via the FVWM menu under KPF Troubleshooting Menu → KPF Trouble Recovery → Power Cycle Ca HK Detector
Exposure Meter in Error State
Symptom
The exposure meter is in an error state: kpf_expmeter.EXPSTATE=Error
Problem
Something in the SBIG CCD control software is unhappy.
Solution 1
First, try resetting the exposure meter detector:
modify -s kpf_expmeter EXPOSE=Reset
Check the status: gshow -s kpf_expmeter EXPSTATE
should become “Ready”.
Solution 2
If the above solution fails to recover the system, power cycle the camera on power port L3.
Then do kpf restart kpf_expmeter
This is equivalent to
kpf restart kpf_expmeter1
and kpf restart kpf_expmeter2
kpf_expmeter1 is the exposure meter camera.
kpf_expmeter2 is the exposure meter DRP.
Sometimes kpf_expmeter2 is stuck in a busy state after a kpf_expmeter1 restart. Both kpf_expmeter1 and kpf_expmeter2 need to be ready before kpfexpose can trigger new exposures.
You may need to set the TOP, LEFT, WIDTH, and HEIGHT keywords after restarting the software. The values are:
[kpfeng@kpfserver] /kroot/var/log > gshow -s kpf_expmeter LEFT TOP WIDTH HEIGHT
LEFT = 1
TOP = 0
WIDTH = 651
HEIGHT = 300
The correct values are checked in testall.py
, so you can always run that to verify values are ok.
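If you want a quick check without running the full test suite, the gshow output can be compared against the values listed above. This Python sketch is not how testall.py does it; the function names are hypothetical, and it assumes gshow prints one NAME = value pair per line with plain integer values.

```python
# Expected window values, as listed in the text above.
EXPECTED = {"LEFT": 1, "TOP": 0, "WIDTH": 651, "HEIGHT": 300}


def parse_gshow(text):
    """Collect NAME = value pairs from pasted gshow output (assumed format)."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        # Skip lines such as the shell prompt that are not simple pairs.
        if sep and name.strip().isidentifier():
            values[name.strip()] = value.strip()
    return values


def window_mismatches(text, expected=EXPECTED):
    """Return {keyword: (found, wanted)} for missing or wrong values."""
    found = parse_gshow(text)
    bad = {}
    for name, want in expected.items():
        got = found.get(name)
        try:
            ok = got is not None and int(got) == want
        except ValueError:
            ok = False  # value present but not an integer
        if not ok:
            bad[name] = (got, want)
    return bad


sample = "LEFT = 1\nTOP = 0\nWIDTH = 651\nHEIGHT = 300"
print(window_mismatches(sample))  # {}
```

An empty result means all four keywords match; any entry shows the value found next to the value wanted.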
L0 Generation
Deprecated L0 files
L0 files are generated by gathering the “sub-FITS” files from the Green, Red, CaHK, EM, and guide cameras, processing some of them (EM and guide camera), adding telemetry drawn from KTL keywords and the sub-FITS headers to the new L0 FITS header, and writing the L0 file. This process happens automatically by using keywords to notify a dispatcher that a recent exposure is ready for L0 file creation. If this dispatcher is stopped or dies, it can be restarted with the command kpf restart assemble
. If L0 files need to be generated manually (e.g., because the dispatcher stopped or L0 files need to be regenerated), it can be done with the command
l0_assemble --outdir path NNNNN MMMMM
where path
is the directory where the L0 files should be written (e.g., /sdata1701/kpfeng/DATE/L0
), NNNNN is the first FRAMENO (a header keyword for Green and Red FITS images and an index which is incremented with each exposure; e.g. 33015
), and MMMMM is the last FRAMENO in a range of files (e.g., 33020
). The final argument is optional if only one L0 file needs to be generated.
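When regenerating many ranges, it can help to build the command from its parts. The Python sketch below only assembles the argument list in the form shown above (l0_assemble --outdir path NNNNN MMMMM) and prints it; it does not run anything. The helper name is hypothetical, and the output directory shown keeps the DATE placeholder from the text rather than a real date.

```python
import shlex


def l0_assemble_command(outdir, first_frameno, last_frameno=None):
    """Build the l0_assemble argument list for one FRAMENO or a range."""
    if last_frameno is not None and last_frameno < first_frameno:
        raise ValueError("last_frameno must not be smaller than first_frameno")
    command = ["l0_assemble", "--outdir", str(outdir), str(first_frameno)]
    if last_frameno is not None:
        # The final argument is optional when only one file is needed.
        command.append(str(last_frameno))
    return command


# DATE is a placeholder, exactly as in the text above.
command = l0_assemble_command("/sdata1701/kpfeng/DATE/L0", 33015, 33020)
print(shlex.join(command))
# l0_assemble --outdir /sdata1701/kpfeng/DATE/L0 33015 33020
```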
L0 files may be deprecated when switching to fast-readout mode. The observers may find broken L1/L2 files, and the L0 files are missing green/red/CaHK components. If L0 files need to be regenerated, it is better to write to a new directory (e.g., /sdata1701/kpfeng/DATE/new_L0), and inform Jeff Mader so he can ingest the new L0 files into the Keck Observatory Archive.
Vacuum
Vacuum Chamber vacuum levels rising
Symptom
Vacuum levels in the chamber are rising, but the vacuum levels at the pump are falling. These are the kpfvac.VCH_HIVAC and kpfvac.VCART_HIVAC keywords, respectively.
This might manifest as a “vac chamber trouble” alert from kpfmon.
Problem
The gate valve between the vacuum chamber and the pump has closed.
The gate valve is currently (late 2023) not instrumented and is controlled by compressed air from the facility. As long as facility compressed air is working, the valve is open and the pump should be keeping the vacuum chamber at good vacuum levels. If the gate valve closes, it is presumably because compressed air has failed.
Solution
Restore compressed air.
Note that compressed air depends on HELCO power. If we’re on generator, it is not active. It should come back on its own once power is restored.
SoCal
Enclosure Won’t Open or Close
Symptom
Enclosure will not move. It may begin moving, then stop and reverse.
Problem
Enclosure motor is hitting an overcurrent limit.
To verify this is the problem, log in to the dome controller from kpfeng@kpfserver
:
ssh socal
This connects to the Raspberry Pi controller in the dome enclosure. The username and IP address are configured in the ~/.ssh/config file on kpfserver (you can ssh manually using pi@192.168.23.244) and the SSH key for kpfserver has been installed on the Pi, so it should not ask for a password, but if it does, the password is in the usual showpasswords location.
View the dome log file, ~/dome.log, and look for errors which indicate the nature of the problem.
Solution
If the log file indicates that overcurrent on the motors is the issue, ensure the mechanisms are clear of obstruction and reasonably well balanced (they don’t need to be perfect).
Enclosure Won’t Open
Symptom
Enclosure will not move. It may begin moving, then stop and reverse.
Problem
Enclosure motor is not getting current.
To verify this is the problem, log in to the dome controller from kpfeng@kpfserver
:
ssh socal
This connects to the Raspberry Pi controller in the dome enclosure. The username and IP address are configured in the ~/.ssh/config file on kpfserver (you can ssh manually using pi@192.168.23.244) and the SSH key for kpfserver has been installed on the Pi, so it should not ask for a password, but if it does, the password is in the usual showpasswords location.
View the dome log file, ~/dome.log, and look for errors similar to:
2024-07-09 19:09:37,752 WARNING Operation timed out (35.0 secs), max measured motor current: 0.0 A.
Solution
Reboot the controller (Raspberry Pi): sudo reboot
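To scan a copy of dome.log for this signature, something like the following Python sketch could be used. It is not part of the SoCal software: the function names are hypothetical, and the pattern is derived only from the single example log line above, so other timeout or overcurrent messages may be worded differently.

```python
import re
from typing import Optional, Tuple

# Pattern derived from the one example line shown in this document.
_TIMEOUT = re.compile(
    r"Operation timed out \((?P<timeout>\d+(?:\.\d+)?) secs\), "
    r"max measured motor current: (?P<current>\d+(?:\.\d+)?) A"
)


def timeout_report(line: str) -> Optional[Tuple[float, float]]:
    """Return (timeout_seconds, max_current_amps) or None if no match."""
    match = _TIMEOUT.search(line)
    if match is None:
        return None
    return float(match.group("timeout")), float(match.group("current"))


def zero_current_timeouts(lines):
    """Yield timeout lines where the motor drew no measurable current."""
    for line in lines:
        report = timeout_report(line)
        if report is not None and report[1] == 0.0:
            yield line


sample = ("2024-07-09 19:09:37,752 WARNING Operation timed out (35.0 secs), "
          "max measured motor current: 0.0 A.")
print(timeout_report(sample))  # (35.0, 0.0)
```

A timeout with 0.0 A matches the "motor is not getting current" case described in this section; a timeout with a nonzero current points elsewhere.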