Every Keck instrument has a testAll shell script for quickly checking instrument and server health. The day crew and SAs use it as a diagnostic and troubleshooting tool. The script reads telemetry and reports warnings and errors. We will use the HIRES testAll as a model for building the KPF testAll.

Please note that the HIRES testAll script was written in Perl, which is no longer standard practice. Modern implementations use Python, and we should do the same. Download the HIRES testAll script and its output file here. Scroll down to see the script and the output text.

Here is a mockup of the KPF testAll in Python, based on the list of alarms we are developing. It will produce output similar to that of the HIRES testAll (see below) to report KPF system health.
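As a rough illustration of the report layout, the sketch below builds one dot-padded status line in the style of the HIRES output shown further down. The helper name `format_check_line` and the sample labels are hypothetical and are not taken from either script.

```python
def format_check_line(label, status, note="", width=24):
    """Return one report line: the label padded with dots, then the status.

    width is the column at which the status starts; 24 is an assumed value.
    """
    line = f"  Checking {label.ljust(width, '.')}{status}"
    if note:
        line += f" {note}"
    return line


# Sample calls with made-up labels and notes.
print(format_check_line("kpfserver", "OK", "(Instrument target host)"))
print(format_check_line("KPF Status", "WARNING!", "got 0 instances"))
```

Keeping the formatting in one helper would let every check print with the same column alignment.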

Attached files: testAll.output, testAll

testAll_KPFmockup
#! /kroot/rel/default/bin/kpython3
description = """
testAll -- check KPF systems health

purpose:
      Test all of the KPF systems to determine whether they are
      working properly. The script checks:
      - Check dispatchers health
      - Check aux rack temperature
      - Check vacuum

      Systems to check:
      - green instect dispatcher
      - vacuum pump dispatcher
      - green instect coldhead temperature
      - green ln2 dewar weight
      - vacuum pump bearing temperature
      - vacuum pump frequency
      - vacuum chamber pressure

      Keywords to check: 
      - kpfgreen.d1
      - kpfvac.d2
      - kpfgreen.currtemp
      - kpflab.greenweight
      - kpfvac.pump_temp
      - kpfvac.pump_freq
      - kpfvac.vch_hivac

      GUIs to check:
      - FIU status GUI
      - Tip Tilt system GUI
      - KPF Status GUI


      With no input argument, all systems are checked.

Exit values:
       0 = normal completion
      <0 = warnings but no errors
      >0 = error

example:
      1) To check instrument status:
      	testAll

"""
epilog="""
history:

Jan-27-2022     SY     Adapted from DEIMOS testAll for KPF

"""

###########################
## Import Python modules ##
###########################

import argparse as arps
from argparse import RawTextHelpFormatter
import sys, os
import ktl
import subprocess
import shlex
import re

######################
## Define constants ##
######################

parser = arps.ArgumentParser(description=description, epilog=epilog, \
                             formatter_class=RawTextHelpFormatter, add_help=True)
parser.add_argument('-s', '--systems', help='Systems which should be tested.', \
                    nargs='+', default='all')

args = parser.parse_args()

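SYSTEMS = args.systems

# The default is the string 'all'; "-s all" on the command line yields ['all'].
if args.systems == 'all' or 'all' in args.systems:
    systems = ['computers', 'keywords', 'guis', 'health']
else:
    systems = args.systems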

STATUS = {'ok': 'OK', 'warning': 'WARNING!', 'error': 'ERROR!'}

COMPUTERS = {'kpfserver': 'Instrument target host', \
             'kpf': 'Linux virtual host'}

GUIS = {'FIU Status': ('FIU Status Display', 'kpfserver'), \
        'Tip Tilt Status': ('Tip Tilt Status Display', 'kpfserver'), \
        'KPF Status': ('KPF Status Display', 'kpfserver'), \
        'tklogger': ('Events logger', 'kpfserver'), \
        'pig_main': ('Program Identification GUI', 'kpf')}

KEYWORDS = {'green instect dispatcher': ('kpfgreen', ['D1']), \
            'vacuum pump dispatcher': ('kpfvac', ['D2'])}

# Each entry: label -> (service, keyword, limits, severity).
# Two limits give an inclusive [low, high] range; one limit is an exact match.
HEALTH = {'green instect coldhead temperature': ('kpfgreen', 'CURRTEMP', ['-145'], 'warning'), \
          'green ln2 dewar weight': ('kpflab', 'GREENWEIGHT', ['52.0', '54.0'], 'warning'), \
          'vacuum pump bearing temperature': ('kpfvac', 'PUMP_TEMP', ['55.0', '60.0'], 'warning'), \
          'vacuum pump frequency': ('kpfvac', 'PUMP_FREQ', ['990', '1100'], 'warning'), \
          'vacuum chamber pressure': ('kpfvac', 'VCH_HIVAC', ['3.0e-5'], 'warning')}

#telescope wrap
ROTCCWLM = int(-188)
ROTCWLM = int(188)          

######################
## Define functions ##
######################

def main(config):

    exit_code = 0

    # Verify host

    if os.environ['HOST'] != 'KPF':
        print('This command must be run on vm-KPF.')
        sys.exit(exit_code)

    # Check systems

    errors_total = 0
    warnings_total = 0 

    if 'computers' in systems:
        ecomp, wcomp = check_computers(COMPUTERS, STATUS)
        errors_total = errors_total + ecomp
        warnings_total = warnings_total + wcomp
        
    if 'guis' in systems:
        eapps, wapps = check_apps(GUIS, STATUS)
        errors_total = errors_total + eapps
        warnings_total = warnings_total + wapps

    if 'keywords' in systems:
        ekeyw, wkeyw = check_keywords(KEYWORDS, STATUS)
        errors_total = errors_total + ekeyw
        warnings_total = warnings_total + wkeyw

    # HEALTH and check_settings() were defined but never used; run them here.
    if 'health' in systems:
        ehealth, whealth = check_settings(HEALTH, STATUS)
        errors_total = errors_total + ehealth
        warnings_total = warnings_total + whealth

    no_errors_or_warnings = 'All KPF systems and computers are healthy'
    n = len(no_errors_or_warnings)
    errors_and_warnings = f'{errors_total:2d} errors and {warnings_total:2d} warnings were issued'
    m = len(errors_and_warnings)

    if ( errors_total == 0 and warnings_total == 0):
        print('-'*n)
        print(no_errors_or_warnings)
        print('-'*n)
    else:
        print('-'*m)
        print(errors_and_warnings)
        print('-'*m)

    if errors_total > 0:
        exit_code = errors_total
    elif warnings_total > 0:
        exit_code = -warnings_total
    else:
        exit_code = 0

    return exit_code

#-----------------#
# Check computers #
#-----------------#

def check_computers(computer_dict, status):
    """
    Check communications with KPF computers.
    """
    
    n_errors = 0
    n_warnings = 0

    print('Checking KPF computers:')
    for computer in sorted(computer_dict.keys()):
        computer_status = status['error']
        cmd = shlex.split('ping -c 1 '+computer)
        process = subprocess.Popen(cmd, stdout=subprocess.PIPE, \
                                   stderr=subprocess.PIPE)
        stdout, stderr = process.communicate()
        return_code = process.poll()
        if return_code == 0:
            computer_status = status['ok']
        else:
            n_errors = n_errors + 1
        print (f'  Checking {computer.ljust(24,"."):24}{computer_status:8} ({computer_dict[computer]})')

    return n_errors, n_warnings


#------------#
# Check GUIs #
#------------#

def check_apps(app_list, status):
    """
    Check user applications.
    """
    
    n_errors = 0
    n_warnings = 0

    print('Checking KPF GUIs:')

    # List all processes on kpfserver
    
    cmd = shlex.split('kpf status')
    
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE, \
                               stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
    KPF_status_output = stdout.decode('utf-8').split('\n')
    
    
    all_apps_check = []
    for line in KPF_status_output:
        if len(line) > 0:
            if (line.split()[0][0:4] == 'nirc'):
                all_apps_check.append(line)

    # List all processes on vm-kpf

    cmd = shlex.split('ps -ef')
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE, \
                               stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
    all_ps_check = stdout.decode('utf-8').split('\n')

    for app in sorted(app_list):

        computer = app_list[app][1]

        app_status = status['warning']

        if computer == 'kpfserver':

            for process in all_apps_check:
                app_in_process = re.search(app, process)
                if app_in_process != None:
                    app_status = status['ok']
            
        else:

            for process in all_ps_check:
                app_in_process = re.search(app, process)
                if app_in_process != None:
                    app_status = status['ok']

        if app_status == status['warning']:
            app_status = status['warning'] + ' (ignore if observers not currently using KPF)'
            n_warnings = n_warnings + 1

        print (f'  Checking {app.ljust(24,"."):24}{app_status:8}')
        
    return n_errors, n_warnings

#----------------#
# Check keywords #
#----------------#

def check_keywords(keyword_dict, status):
    """
    Check keyword health
    """
    
    n_errors = 0
    n_warnings = 0

    print('Checking KPF keyword libraries:')
    for key in sorted(keyword_dict.keys()):
        if keyword_dict[key][1] == 'acs' or keyword_dict[key][1] == 'dcs':
            keyword_status = status['warning']
        else:
            keyword_status = status['error']
        ktl_error = False
        for keyword_check in keyword_dict[key][1]:
            try:
                keyword = ktl.cache(keyword_dict[key][0], keyword_check)
                keyword.monitor()
                keyword_status = status['ok']
            except:
                ktl_error = True
                keyword_status = status['error']

        if keyword_status == status['warning']:
                n_warnings = n_warnings + 1
        if keyword_status == status['error']:
                n_errors = n_errors + 1
        print (f'  Checking {key.ljust(24,"."):24}{keyword_status:8}')

    return n_errors, n_warnings

#----------------#
# Check health   #
#----------------#

def check_settings(settings_dict, status):
    """
    Check KPF health
    """
    
    n_errors = 0
    n_warnings = 0

    print('Checking KPF health:')
    for key in sorted(settings_dict.keys()):
        setting_status = status[settings_dict[key][3]]
        ktl_error = False
        out_of_range = True
        try:
            keyword = ktl.cache(settings_dict[key][0], settings_dict[key][1])
            keyword.monitor()
            if len(settings_dict[key][2]) == 2 and \
               (float(keyword.ascii) >= float(settings_dict[key][2][0])) and \
               (float(keyword.ascii) <= float(settings_dict[key][2][1])):
                out_of_range = False
                setting_status = status['ok']
            if (len(settings_dict[key][2]) == 1) and (keyword.ascii == settings_dict[key][2][0]):
                out_of_range = False
                setting_status = status['ok']
        except:
            ktl_error = True
            setting_status = status[settings_dict[key][3]]

        if setting_status == status['warning']:
                n_warnings = n_warnings + 1
        if setting_status == status['error']:
                n_errors = n_errors + 1

        if ktl_error:
            print (f'  Checking {key.ljust(55,"."):55}{setting_status:8} not reachable')
        else: 
            if out_of_range:
                if len(settings_dict[key][2]) == 1:
                    print (f'  Checking {key.ljust(55,"."):55}{setting_status:8} Current value {keyword.ascii} should be {settings_dict[key][2][0]}.')
                if len(settings_dict[key][2]) == 2:
                    # Limits are stored as strings; convert before using the .1f format.
                    low = float(settings_dict[key][2][0])
                    high = float(settings_dict[key][2][1])
                    current = float(keyword.ascii)
                    if setting_status == status['warning']:
                        if (settings_dict[key][1] == 'ROTCCWLM') or (settings_dict[key][1] == 'ROTCWLM'):
                            print (f'  Checking {key.ljust(55,"."):55}{setting_status:8} (ignore if KPF is not selected in TCS) Current value {current:.1f} outside good range [{low:.1f}, {high:.1f}].')
                        else:
                            print (f'  Checking {key.ljust(55,"."):55}{setting_status:8} Current value {current:.1f} outside good range [{low:.1f}, {high:.1f}].')
                    else:
                        print (f'  Checking {key.ljust(55,"."):55}{setting_status:8} Current value {current:.1f} outside good range [{low:.1f}, {high:.1f}].')
            else:
                print (f'  Checking {key.ljust(55,"."):55}{setting_status:8}')

    return n_errors, n_warnings



##################
## Main program ##
##################


if __name__ == '__main__':

    # Pass the exit code computed by main() to the shell instead of always exiting 0.
    sys.exit(main(systems))
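The limit test at the heart of the health check can be tried without a KTL connection. The sketch below is a simplified stand-in, not the script's own function: it assumes limits are stored as a list of separate strings (two for an inclusive range, one for an exact match), and the helper name `check_value` and the sample readings are made up for illustration.

```python
def check_value(value, limits):
    """Return True if value satisfies limits.

    limits is a list of strings (an assumed layout):
    two entries mean an inclusive [low, high] range,
    one entry means the value must match that string exactly.
    """
    if len(limits) == 2:
        low, high = float(limits[0]), float(limits[1])
        return low <= float(value) <= high
    if len(limits) == 1:
        return str(value) == limits[0]
    raise ValueError(f"expected 1 or 2 limits, got {len(limits)}")


# Made-up readings, for illustration only.
samples = {
    'green ln2 dewar weight': ('53.1', ['52.0', '54.0']),
    'vacuum pump frequency': ('950', ['990', '1100']),
}
for name, (reading, limits) in sorted(samples.items()):
    state = 'OK' if check_value(reading, limits) else 'WARNING!'
    print(f"  Checking {name.ljust(40, '.')}{state}")
```

An exact string match is fragile for floating-point telemetry such as a temperature or pressure, so single-value limits may be better expressed as a range or a threshold once the alarm list is final.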


The HIRES testAll is a Perl script. Scroll down to see its output text.


HIRES testAll output

Checking HIRES computers:

Checking hccdvmep................OK (HIRES CCD crate)

Checking hmotcrate...............OK (HIRES motor crate)

Checking tserver1................OK (Lantronix)

Checking HIRES daemons:

Checking dispatcher.hinfo........OK

Checking dispatcher2.............OK

Checking infoman.................OK

Checking lickserv2...............OK

Checking traffic.................OK

Checking watch_ccd...............OK

Checking watch_expo..............OK

Checking watch_hirot_monitor.....OK

Checking HIRES applications:

Checking xhires..................ERROR! got 0 instances

Checking ds9.....................ERROR! got 0 instances

Checking ds9relay................ERROR! got 0 instances

Checking hires_dashboard.........ERROR! got 0 instances

Checking hexpo_dashboard.........WARNING! got 0 instances

Checking write_image.............ERROR! got 0 instances

Checking HIRES keyword libraries:

Checking CCD+infopatcher.........OK

Checking DCS.....................OK

Checking dispatcher 2............OK

Checking exposure meter..........OK

Checking HIRES settings:

Checking TV guider power.........WARNING! Current value 'off' should be 'on'

Checking Exp.meter power.........WARNING! Current value 'off' should be 'on'

Checking Enclosure lights........OK

Checking All 3 Doors.............OK

Checking ESTOP status............OK

Checking CCD temp setpoint.......OK

Checking system disable status...OK

Checking CCD temperature.........OK

Checking cryo lever position.....OK

Checking relative humidity.......OK

Checking autofill enable.........OK

Checking air pressure............OK

Checking current instrument......WARNING! Current value 'LRISADC' should be 'HIRES'

Checking XD brake status.........OK

Checking HIRES switches:

Checking camera focus cntrl......OK

Checking camera cover cntrl......OK

Checking collimator focus cntrl..OK

Checking collimator cntrl........OK

Checking decker cntrl............OK

Checking rotator cntrl...........OK

Checking echelle cntrl...........OK

Checking lamp filter cntrl.......OK

Checking filter 1 cntrl..........OK

Checking lamp select cntrl.......OK

Checking filter 2 cntrl..........OK

Checking slit cntrl..............OK

Checking hatch cntrl.............OK

Checking tv aperture cntrl.......OK

Checking iodine cell cntrl.......OK

Checking tv focus cntrl..........OK

Checking tv filt 1 cntrl.........OK

Checking tv filt 2 cntrl.........OK

Checking x-disp cntrl............OK

---------------------------------------------------------------

5 errors and 4 warnings were issued.

---------------------------------------------------------------
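The summary counts can be cross-checked by tallying the status markers in a saved report. The following is a minimal sketch that assumes the line layout shown above; the function name `tally` is hypothetical, and the sample text is a short excerpt of the HIRES output.

```python
def tally(report):
    """Count ERROR! and WARNING! markers on lines that start with 'Checking'."""
    errors = warnings = 0
    for line in report.splitlines():
        if not line.strip().startswith('Checking'):
            continue  # skip blank lines, separators and the summary line
        if 'ERROR!' in line:
            errors += 1
        elif 'WARNING!' in line:
            warnings += 1
    return errors, warnings


# Short excerpt of the HIRES report above.
sample = """\
Checking HIRES applications:
Checking xhires..................ERROR! got 0 instances
Checking hexpo_dashboard.........WARNING! got 0 instances
Checking HIRES keyword libraries:
Checking DCS.....................OK
"""
print(tally(sample))
```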
