Project Rumsfeld: Addressing the known unknowns with the army you have

Some operations improvements are not getting done. They are not urgent enough to be classified as emergencies, nor critical enough to get project-level attention. Let’s collect examples of the current situation and argue for resources dedicated to operations, so that these issues can be addressed over time.

The goal is to streamline operations and free up support personnel for other unexpected tasks (such as the sudden need to support pajama mode).

Each entry below lists: Title, Description, Impact, Effort, Critical aspects.

Instrument keyword layer

Bring the instrument keyword layers to a common level. Because of their different ages and implementations, some instruments lag behind in critical functionality.

Impact:

  • We need to maintain different systems, some of which are nearly impossible to update

  • Scripts are unreliable and require frequent troubleshooting interventions, both day and night

  • Observations are impacted (lost time)

Effort: 100 hours per instrument, focusing on 2 or 3 of the worst cases

Critical aspects: without updated keyword layers, DSI cannot function, UNO cannot be fully deployed, and keyword history and alarm handlers cannot be deployed.

LRIS dependence on DEIMOS

Critical LRIS infrastructure (starlist generation) depends on DEIMOS software; this is a leftover from an old implementation.

Impact:

  • Depending on the status of DEIMOS, we need to instruct observers to use a different set of scripts

  • The DEIMOS upgrade cannot be finalized

Effort: 50 hours, software + SA

Critical aspects: it is holding back the DEIMOS upgrade and creating an unnecessary but hard dependency between two instruments.

Pajama mode upgrades

Redesign the remote observing request form so that it handles pajama mode observing appropriately. We need to automatically ingest SSH keys from observers and distribute secure information (passwords, firewall info).

Impact:

  • Increased support load on SAs and (possibly) the software group

  • Delays in other critical projects, which SAs cannot afford to work on

Effort: 300 hours, mostly software

Critical aspects: the current support load can lead to burnout, decreased productivity, projects lagging behind, and morale issues.

Estimated additional workload per night: 30 extra minutes (to be confirmed).
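The tentative 30-minute figure can be turned into a rough break-even estimate for the 300-hour upgrade. This is a sketch, not a measurement: it assumes the extra load occurs on every night of the year (365 nights, an assumption) and that the upgrade removes all of it. Both helper functions are hypothetical.

```python
# Rough break-even estimate for the pajama mode upgrade.
# Inputs taken from the table: the 300-hour effort and the tentative
# 30 extra minutes per night. Everything else is an assumption.


def extra_hours_per_year(minutes_per_night: float, nights_per_year: int) -> float:
    """Extra support hours per year for a fixed per-night overhead."""
    return minutes_per_night * nights_per_year / 60


def break_even_nights(effort_hours: float, minutes_per_night: float) -> float:
    """Nights after which the saved support time equals the project effort."""
    return effort_hours / (minutes_per_night / 60)


if __name__ == "__main__":
    EFFORT_H = 300   # effort estimate for the pajama mode upgrade
    EXTRA_MIN = 30   # tentative per-night overhead (to be confirmed)
    NIGHTS = 365     # assumption: pajama mode support on every night
    print(f"{extra_hours_per_year(EXTRA_MIN, NIGHTS):.1f} extra support hours per year")
    nights = break_even_nights(EFFORT_H, EXTRA_MIN)
    print(f"break-even after {nights:.0f} nights (~{nights / NIGHTS:.1f} years)")
```

Under those assumptions the overhead amounts to 182.5 hours per year, and the upgrade pays for itself after 600 nights, about 1.6 years. If pajama mode support is needed on fewer nights, the payback period grows in proportion.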

Zoom meetings management for remote observing

Eliminate Zoom dropouts caused by hitting the 24-hour meeting limit. These have been rare lately, but they will become more frequent as the nights get longer. Depending on when they are triggered, they interrupt either the afternoon setup or early observing time.

Impact:

  • Interruptions in the observing procedures, either at night (with time lost) or in the afternoon.

  • Security: we currently have no practical way to impose password protection on night observing Zoom meetings, and we are using the same meeting ID every night.

Effort: 100 hours, software + SA

 

SAT upgrades

Replace the SAT with a modern interface that does not depend on IDL.

Impact:

  • Cost of IDL licences

  • Overhead due to inefficiencies

  • Support load due to spaghettification of the code

  • Lack of features (for example, alignment on the LRIS red side)

  • Incompatible with DSI (no API)

  • Does not meet the standard required to transfer support to scientific software

Effort: 300 hours, software and SA

Critical aspects: can affect observing (blue side problems?); incompatible with DSI.

Instrument alarms

Replace cron jobs with a centralized, standard alarm system.

Impact:

  • Cron job failures can lead to missed alarms

  • Cron jobs rely on local mail servers, which is a major security issue

  • Increased support load due to different implementations for different instruments

  • The reaction to alarms is not defined; it depends on people reading emails and knowing what to do

  • Many false alarms are sent, leading staff to ignore alarms

Effort: (not estimated)

Critical aspects: incompatible with UNO; risk of missing critical alarms; keeping mail servers alive effectively contradicts the security goals of the transition to Office 365.
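A centralized alarm system could address two of the failure modes listed for this item: a cron job that silently stops running, and false alarms that train staff to ignore email. The following is a minimal illustration of one common pattern, not a proposed design. Each check reports to a central service; a check that stops reporting is flagged as STALE (a heartbeat, or dead-man's switch); an alarm fires only after several consecutive failures; and every check carries a documented reaction. The class name, its fields, the thresholds, and the example check are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MonitoredCheck:
    """One instrument check reporting to a hypothetical central alarm service."""
    name: str
    period_s: float              # how often the check is expected to report
    action: str                  # documented reaction when the alarm fires
    fail_threshold: int = 3      # consecutive failures required before alarming
    last_report: float = 0.0     # timestamp (s) of the most recent report
    consecutive_failures: int = 0

    def report(self, ok: bool, now: float) -> None:
        """Record the result of one run of the check."""
        self.last_report = now
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1

    def status(self, now: float) -> str:
        """Return OK, ALARM, or STALE (the check itself stopped reporting)."""
        if now - self.last_report > 2 * self.period_s:
            return "STALE"
        if self.consecutive_failures >= self.fail_threshold:
            return "ALARM"
        return "OK"


if __name__ == "__main__":
    # Hypothetical check: a temperature probe expected to report every 60 s.
    check = MonitoredCheck("dewar_temp", period_s=60, action="Page the on-call SA")
    for t, ok in [(0, True), (60, False), (120, False), (180, False)]:
        check.report(ok, now=t)
        print(t, check.status(now=t))
    print(400, check.status(now=400))  # no reports since t=180, so STALE
```

The consecutive-failure threshold trades a short delay for fewer false alarms; critical checks could use a threshold of 1. The STALE state is what a bare cron job cannot provide, because a job that never runs never sends an email.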

VPN access

Remove the need to SSH and VNC into our internal servers.

Impact:

  • Unnecessary steps in performing frequent operations

  • Increased reaction time in critical troubleshooting at night

  • Security issues with running a firewall and tunneling connections

Effort: probably a contract rather than internal work

Critical aspects: decreased productivity

H2RG detector efficiency

Improve the H2RG detector readout software to fix the current poor efficiency when using co-added frames in post-upgrade NIRSPEC observations (both SPEC and SCAM).

Impact:

  • Co-adding is not efficient, reducing performance on sky

Effort: 160 hours of software

Critical aspects: decreased efficiency for L- and M-band observations (since before the upgrade).

Adding Coadds to MAGIQ

Allow the MAGIQ guider software to command multiple co-adds when guiding with SCAM, to enable on-slit guiding on fainter targets and on ANY target observed at thermal wavelengths.

Impact:

  • Because SCAM cannot be exposed with multiple co-adds via MAGIQ, objects are too faint to provide enough signal to guide on: the high sky background forces short exposure times to stay below the saturation level.

Effort: 80 hours of software

Critical aspects: we cannot guide with SCAM at L- or M-band wavelengths.
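The co-add argument can be made concrete with a simplified signal-to-noise model. Each frame can only be exposed until sky plus target reach the saturation level, so on a bright thermal background a single frame collects little target signal. Summing N frames multiplies the signal by N while the noise grows only as sqrt(N), so the SNR improves by sqrt(N). This is a sketch under stated assumptions: pure Poisson noise plus read noise once per frame, no dark current or sky-subtraction errors, and hypothetical numbers that are not measured NIRSPEC/SCAM values. The function names are illustrative.

```python
import math


def max_frame_time(sky_rate, target_rate, saturation):
    """Longest single exposure (s) before the target pixel saturates.

    Rates are in counts/s per pixel; saturation is in counts.
    """
    return saturation / (sky_rate + target_rate)


def coadd_snr(n_coadds, frame_time, sky_rate, target_rate, read_noise):
    """Peak-pixel SNR of the target after summing n_coadds frames.

    Simplified model: shot noise from target + sky, plus read noise
    added once per frame. Dark current and sky-subtraction errors
    are ignored.
    """
    signal = n_coadds * target_rate * frame_time
    variance = n_coadds * ((sky_rate + target_rate) * frame_time + read_noise ** 2)
    return signal / math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical numbers for a faint target on a bright thermal background.
    SKY, TARGET, SATURATION, READ_NOISE = 20000.0, 200.0, 40000.0, 15.0
    t = max_frame_time(SKY, TARGET, SATURATION)
    for n in (1, 4, 16):
        snr = coadd_snr(n, t, SKY, TARGET, READ_NOISE)
        print(f"{n:2d} co-adds x {t:.2f} s: SNR = {snr:.1f}")
```

Because the per-frame exposure time is capped by saturation, increasing N is the only way in this model to build up guide signal without saturating, which is why the guider needs to command co-adds rather than longer exposures.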

Ownership of QACITS

Take ownership of QACITS and redo it to improve the efficiency of operations with the vortex coronagraph. This would include supporting alad keywords on vm-nirc2.

Impact:

  • Significant support load for the few SAs who know how to run it

  • Poor performance on sky, lost time, low efficiency

  • Misleading information about an observing mode that is offered but not adequately supported

  • Decreased customer satisfaction

Effort: 160 hours of software, 120 hours of SA

Critical aspects: vortex coronagraphy is becoming a high-demand mode and we are not ready to support it.

KCWI binning

The detector DSP code was never finished. This was accepted as a lien because we were going to fix it for the red side, but that is no longer happening because we have abandoned Leach controllers.

Impact:

  • Significant residual charge after switching binning; it lasts about an hour, increases noise, and decreases performance.

  • To avoid this effect, observers don’t use the correct binning

Effort: contract to Caltech or to UCO

Critical aspects: a critical feature that is missing because of lack of time and funds.

Dsimulator

The project to replace dsimulator with a web version was not finished.

Impact:

  • Critical software to design masks for DEIMOS is on life support. Most observers cannot install it. We keep an old Solaris machine alive just for this, and we need to allow observers to use it via a VNC connection.

Effort: 160 hours of software, 100 hours of SA

Critical aspects: if we can no longer run the old IRAF installation, we will lose the ability to offer this service to our observers, who in most cases don’t know how to run the software.

DEIMOS upgrade to Linux

Finish the new DEIMOS control GUI, display tool, and operation scripts.

Impact:

  • Migrate to a modern user interface on a modern Linux system.

  • Upgrade to eliminate the dependency on obsolete software/hardware.

Effort: 40 hours of software, 120 hours of SA

Critical aspects: this new interface is the only option we have to operate DEIMOS with the upgraded rotator controller and computer. There is an extreme risk that completion of this work slips beyond the end of FY20 because resources are being reallocated elsewhere.

Twilight observing installation on Linux-based OSIRIS hosts

We lost the ability to run this observing program after the OSIRIS migration to Linux.

Impact:

  • Increase the science output of the observatory by taking advantage of the twilights.

Effort: 8 hours of software, 24 hours of SA

Critical aspects: we are unable to support programs that have already been awarded time.

Instrument documentation

Help observers prepare their observing runs beforehand, which would minimize the time SAs spend on pre-observing tasks.

Complete documentation on how to operate the instruments would also cut SA support hours during the night.

Impact:

  • Increase the efficiency of nighttime observations

  • Save SA time spent helping observers prepare their observations

  • Increase observers’ satisfaction with the instrument documentation

Effort: 250 hours of SA

Critical aspects: significantly cut the time that SAs invest in support.

Update MOSFIRE MAGMA

Make MAGMA compatible with the modern computers that observers are using.

Impact:

  • Eliminate known errors that observers run into and that we have to tell them to ignore.

  • Eliminate workarounds for saving files.

  • Add design parameters to the mask file, so that the MOSFIRE control software can incorporate parameters such as dither space instead of depending on observers to manually transcribe that information from MAGMA to the control system.

  • Make the tool that takes MOSFIRE mask calibrations more resilient to errors. Currently, we have to tell each observer to watch all of their calibrations being taken to make sure the script does not hang (this can take several hours in some cases).

Effort: 200 hours of software to rewrite MAGMA, or much less if we just update the existing Java code.

Critical aspects: make MOSFIRE mask design DSI-ready; eliminate friction for observers during observing nights.

SAT migration to Linux

Migrate the current version of the SAT to Linux on MOSFIRE and LRIS.

Impact:

  • The SAT runs faster on modern Linux machines than on the old Solaris machine (hanakowai)

Effort: 24 hours of SA, 4 hours of software

Critical aspects: eliminate the dependency of critical night operations software on an obsolete Solaris machine located at HQ.

ESI VNCs on Linux

Migrate ESI to a Linux VM as the VNC server, like the other instruments.

Impact:

  • Modernize ESI GUIs

  • Bring ESI partially into the Linux world to match other instruments

  • Reduce dependency on old Solaris machines

 

 
