There are operations improvements which are not getting done. They are not urgent enough to be classified as emergencies, and not critical enough to get project-level attention. Let’s collect examples of the current situation and argue for resources dedicated to operations, so that these issues can be addressed over time.

The goal is to streamline operations and free up support personnel for other unexpected tasks (such as the sudden need to support pajama mode).

Template

  • Idea/Title

    • Why important? What is benefit?

    • WAG on effort required

    • What sort of effort is required.

Title

Description

Impact

Effort

Critical aspects

Instrument keyword layer

Bring the instrument keyword layers to a common level. Because of their different ages and implementations, some instruments are lagging behind in critical functionality.

  • Need to maintain different systems, some close to being impossible to update

  • Scripts are unreliable, and require frequent troubleshooting interventions, both day and night

  • Observations are impacted (lost time)

100 hours per instrument, focusing on the 2 or 3 worst cases

DSI cannot function without updated keyword layers

UNO cannot be fully deployed

Keyword history cannot be deployed

Alarm handlers cannot be deployed

LRIS dependence on DEIMOS

Critical LRIS infrastructure (starlist generation) depends on DEIMOS software; this is a leftover from the old implementation.

  • Depending on the status of DEIMOS, we need to instruct observers to use a different set of scripts

  • DEIMOS upgrade cannot be finalized

50 hours, software + SA

It is holding back the DEIMOS upgrade and creating an unnecessary but hard dependency between two instruments

Pajama mode upgrades

Redesign the remote observing request form so that it handles pajama mode observing appropriately. We need to automatically ingest SSH keys from observers and distribute secure information (passwords, firewall info).

  • Increased support load on SA and (possibly) software group

  • Delays in other critical projects that SAs cannot afford to work on

300 hours, mostly software

Current support load can lead to burnout, decreased productivity, projects lagging behind, morale issues

Zoom meetings management for remote observing

Eliminate Zoom dropouts due to hitting the 24 hour meeting limit. These have been rare lately, but will begin happening more frequently as the nights get longer. They interrupt afternoon setup or early observing time depending on when they are triggered.

  • Interruptions in the observing procedures, either at night (with time lost) or in the afternoon

100 hours, software + SA

SAT upgrades

Replace SAT with a modern interface that does not depend on IDL

  • Cost for IDL licences

  • Overhead due to inefficiencies

  • Support load due to spaghettification of the code

  • Lack of features (alignment on LRIS red, for example)

  • Incompatible with DSI (no API)

  • Does not meet the standard to transfer support to scientific software

300 hours, software and SA

Can affect observing (blue side problems?)

Incompatible with DSI

Instrument alarms

Replace cron jobs with a centralized, standard system

  • cron job failures can lead to missed alarms

  • cron jobs rely on local mail servers, which is a significant security issue

  • increased support load due to different implementations for different instruments

  • reaction to alarms is not defined, and depends on people reading emails and knowing what to do

Incompatible with UNO

Risk of missing critical alarms

Keeping mail servers alive contradicts the security goals of the transition to Office 365

VPN access

Remove the need to SSH and VNC into our internal servers

  • Unnecessary steps in performing frequent operations

  • Increased reaction time in critical troubleshooting at night

  • Security issues with running a firewall and tunneling connections

Probably a contract rather than internal work

Decreased productivity

H2RG detector efficiency

Improve H2RG detector readout software to fix current poor efficiency when using co-added frames with post-upgrade NIRSPEC observations (both SPEC and SCAM)

  • Coadding is not efficient, which reduces performance on sky

Adding Coadds to MAGIQ

Allow MAGIQ guider software to command multiple co-adds (in the case of SCAM guiding) to enable on-slit guiding of fainter targets and ANY target observed at thermal wavelengths.

Ownership of QACITS

Take ownership and redo QACITS to improve efficiency of operations with the vortex. This would include supporting alad keywords on vm-nirc2.

  • Significant support load for the few SAs who know how to run this

  • Poor performance on sky, lost time

  • Misleading information about an observation mode that is offered but not facilitized

  • Decreased customer satisfaction

KCWI binning

The detector DSP code was never finished. This was accepted as a lien because we planned to fix it for the red side, but that is no longer happening because we have abandoned Leach controllers.

  • Significant residual charge after switching binning, lasts about an hour, increases noise and decreases performance.

  • Observers don’t use the correct binning to avoid this effect

Contract to Caltech or to UCO

Critical feature that is missing because of lack of time/funds

Dsimulator

The project to replace dsimulator with a web version was not finished

  • Critical software to design masks for DEIMOS is on life support. Most observers cannot install it. We keep an old Solaris machine alive just for this, and we need to allow observers to use it via a VNC connection

160 hours of software

100 hours of SA

If we cannot run the old IRAF installation, we can lose the capability of offering this service to our observers who, in most cases, don’t know how to run the software

DEIMOS upgrade to Linux

Include GUI control and operation scripts

Twilight observing installation on Linux-based OSIRIS hosts

Instrument documentation

Example Projects Which Require Outside Effort

  1. Bring all instrument keyword layers up to a common level of quality.

    1. This comes up for every project that touches instruments, but is always left off the task list because it is large and each project declares it to be out of scope. As a result, we are left with unreliable software which cannot be modernized. Instrument scripts are brittle and break easily. SA and observer time is wasted on minor troubleshooting and retries during the night. This costs personnel effort and accumulates lost observing time without ever triggering a night log ticket, because it is just the way the instrument works.

    2. WAG: 100 hours per instrument. This varies enormously between instruments, so some will be a factor of several more and some a factor of several less.

    3. Needs software effort at keyword service level.

  2. Remove LRIS dependence on DEIMOS

    1. Many of our instruments have their suite of scripts built from a wholesale copy of the DEIMOS scripts which are then modified. LRIS is particularly bad in this regard, as a number of scripts directly call resources on DEIMOS computers, so we cannot work on DEIMOS without affecting LRIS.

    2. WAG: 50 hours

    3. Effort from SAs with some small software support.

  3. Automate/streamline pajama mode observing setup.

    1. Redesign the remote observing request form so that it handles pajama mode observing appropriately. We need to automatically ingest SSH keys from observers and distribute secure information (passwords, firewall info). This will significantly reduce the observing support burden the SAs are currently carrying and will free them up to help with other projects.

    2. WAG: 300 hours (this touches a lot of subsystems)

    3. Effort from software.

  4. Manage nightly Zoom meetings to avoid 24 hour limitation which causes Zoom dropouts, especially in the Fall.

    1. Eliminate Zoom dropouts due to hitting the 24 hour meeting limit. These have been rare lately, but will begin happening more frequently as the nights get longer. They interrupt afternoon setup or early observing time depending on when they are triggered.

    2. WAG: 100 hours

    3. Effort from software and SAs.

  5. Replace the SAT with a well-designed, modern tool which automates much of the mask alignment process.

    1. This would be a modernization of the SAT interface to emphasize speed and adaptability to conditions and make it more sensitive to faint stars. It lays important groundwork for DSI.

    2. WAG: 300 hours

    3. Effort from software and SAs.

  6. Replace the cron-based instrument alarms system with one which provides accurate information which is easily accessible.

  7. Create VPN access through the Keck firewall so that we can work from home as if we were in the office.

  8. Improve H2RG detector readout software to fix current poor efficiency when using co-added frames with post-upgrade NIRSPEC observations (both SPEC and SCAM)

  9. Allow MAGIQ guider software to command multiple co-adds (in the case of SCAM guiding) to enable on-slit guiding of fainter targets and ANY target observed at thermal wavelengths.

  10. Take ownership and redo QACITS to improve efficiency of operations with the vortex. This would include supporting alad keywords on vm-nirc2.

  11. KCWI binning switches are poorly implemented.

  12. Replace dsimulator with the web version.

  13. Finish the DEIMOS upgrade to Linux, including the control GUI and operation scripts.

  14. Install the Solar System twilight observing scripts on the upgraded Linux-based OSIRIS hosts.
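
As an illustration of the "automatically ingest SSH keys" part of item 3, the request form could reject malformed keys before they ever reach an SA. The following is only a minimal sketch in Python, assuming observers paste a one-line OpenSSH public key into the form. The function name `parse_public_key` and the list of accepted key types are hypothetical and do not describe any existing tool.

```python
import base64
import binascii
import struct
from typing import Tuple

# Key types accepted by this sketch; an assumption, not observatory policy.
ALLOWED_KEY_TYPES = {"ssh-ed25519", "ssh-rsa", "ecdsa-sha2-nistp256"}


def parse_public_key(line: str) -> Tuple[str, str]:
    """Return (key_type, comment) for a one-line OpenSSH public key.

    Raises ValueError if the line is malformed.
    """
    fields = line.strip().split(None, 2)
    if len(fields) < 2:
        raise ValueError("expected '<type> <base64> [comment]'")
    key_type, blob_b64 = fields[0], fields[1]
    comment = fields[2] if len(fields) == 3 else ""

    if key_type not in ALLOWED_KEY_TYPES:
        raise ValueError(f"unsupported key type: {key_type}")

    try:
        blob = base64.b64decode(blob_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError("key data is not valid base64") from exc

    # The decoded blob starts with a 4-byte big-endian length followed by
    # the algorithm name, which must match the first text field.
    if len(blob) < 4:
        raise ValueError("key data too short")
    (name_len,) = struct.unpack(">I", blob[:4])
    embedded = blob[4:4 + name_len]
    if embedded != key_type.encode("ascii"):
        raise ValueError("key type does not match key data")
    return key_type, comment
```

A check like this only catches copy-and-paste errors (truncated keys, private keys pasted by mistake, stray line breaks). Distributing the accepted keys and the secure information would still need its own design.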

Projects Which We Can Do Inside the Group

  • Complete the instrument documentation to make observation design easier for observers. Keep it up to date to make SA training easier.
