Troubleshooting Tips

More detailed, step-by-step troubleshooting instructions can be found on the wiki page for WFCAM troubleshooting.

MECHANICAL
WFCAM interlock failure
Hexapod stuck/star looks like a doughnut
Light leak in dark frames
Sequence console pops up error after WFS
COMPUTING/DATA ACQUISITION
Countdown freezes at zero: “nonfatal”
Countdown freezes at zero – “fatal – requires reboot”
Acquisition fails to write data file
Seq. Console: Failed to set configuration (phase 1)
Disk not mounted/ompobslog can’t see data
ARRAYS/ELECTRONICS
Arrays fail to enable/blank first frame
Read noise is high
Dark current is high (especially camera 4)
Channel edge problem
Arrays fail to disable
Rebooting SDSU controllers remotely
Running WFCAM with fewer than four cameras
Temperature limits for safely operating WFCAM

DATA REDUCTION, APPEARANCE
DR exits on first frame
DR continually exits for camera 4
Read noise is high
Dark current is high (especially camera 4)
Light leak in dark frames
Channel edge problem
No suitable dark
DR can’t see the raw data
Effect of bright stars in the science frames

Mechanical

WFCAM interlock failure:
See, e.g., fault: 20061107.002. On a stuck countdown fault or if you have difficulties switching filters, quickly check the main WFCAM DM screen. If there is an interlock fault, the interlock status will be red on this screen. You must follow the procedures in the first response to 20061107.002 to get going again. Interlock faults occur when the WFCAM CCS loses the filter paddle position and forces you to park the filters, then redatum to refind the positions and prevent the paddles from colliding. 


Hexapod stuck/star looks like a doughnut:
A typical symptom of a stuck hexapod is a guide star that looks like a doughnut because it is out of focus. First, toggle the telescope focus between 0 and 1 to see if the hexapod responds. If it does, try refocusing; if it doesn’t, slew to the zenith, reset the hexapod and retry. The canonical fault for this is 20051214.010; please respond to that fault whenever this occurs, even if no time is lost.

Light leak in dark frames: 
Since the filter paddles do not completely mask the arrays in the blank (dark) position, there is a light leak that affects dark frames when the ambient light is bright. The DR should report a light leak. To avoid this, the beginning-of-night dark sequence should be started no earlier than five minutes before sunset. (If the darks are done early, the dome should be closed and the dome lamps turned off.) This fault was originally misunderstood (20050619.001); the current fault, with more details and an image, is 20070417.003.

Sometimes this can be caused by the array running at a temperature lower (if the heater fails) or higher than the set value. See 20171006.003.
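The five-minute rule for the start-of-night darks is simple time arithmetic, and a minimal Python sketch may make it concrete. The function names and the `dome_dark` flag are hypothetical and are not part of any WFCAM software; the sunset time is assumed to come from whatever almanac the observer already uses.

```python
from datetime import datetime, timedelta

# From the text: darks should start no earlier than five minutes before sunset.
DARKS_MARGIN = timedelta(minutes=5)


def earliest_darks_start(sunset: datetime) -> datetime:
    """Return the earliest time the start-of-night dark sequence should begin."""
    return sunset - DARKS_MARGIN


def ok_to_start_darks(now: datetime, sunset: datetime, dome_dark: bool = False) -> bool:
    """Return True if the dark sequence may be started at time `now`.

    `dome_dark` is a hypothetical flag meaning the dome is closed and the
    dome lamps are off, which the text allows as the condition for early darks.
    """
    return dome_dark or now >= earliest_darks_start(sunset)
```

For example, with sunset at 18:00 the sketch gives 17:55 as the earliest start, and an earlier start passes only when `dome_dark` is set.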

Sequence console pops up error after WFS:
See 20080215.001. The sequence console pops up the error “Failed to set configuration (phase 1)”. This is due to the focus mechanism hitting the limit switch because the WFS focus offsets were left in, and WFCAM is now datumming the filters. While the filters are finding datum, open the wfcamMain.dl DM screen and put 0.0 into the focus offset field and wait for the datum process to finish. 

Arrays/Electronics

Arrays fail to enable/blank first frame:
The controller for camera 2 had a fault that prevented camera 2 from enabling when the TSS sent the power-on command; this is now believed to be fixed (see 20050403.001). If you do see any of the arrays blank after being enabled (powered on), disable the arrays, remove inst (or ocs_down), run down the low-level software, power cycle the controller for the problem array, and restart everything. Take a quick frame using the read noise or flush sequence to check that the camera is now enabled.

Read noise is high:
The DR will report high read noise. If the read noise is very high, it is clear that there is an electronics problem. The first thing to try is a flush sequence, followed by another read noise sequence. If the noise is lower but not nominal, repeat. If the noise is still very high, run down the observing software, power cycle the controllers (or the controller for the problem array), and restart. Please see the wiki page on the WFCAM remote power switch for details about power cycling the camera controller. If the read noise is still high after two power cycles, please call the night support astronomer.
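The escalation described above can be read as a small decision rule. The sketch below is only one interpretation of those steps: the level labels ("nominal", "elevated", "very_high") are hypothetical, since the text gives no numeric thresholds, and the function is not part of the DR or any WFCAM software.

```python
LEVELS = ("nominal", "elevated", "very_high")  # hypothetical labels, no thresholds given


def next_read_noise_step(level: str, flushed: bool, power_cycles: int) -> str:
    """Suggest the next step for a high read noise report.

    level        -- one of LEVELS, judged by the observer from the DR report
    flushed      -- True if a flush sequence has already been tried
    power_cycles -- number of controller power cycles already done
    """
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    if level == "nominal":
        return "continue observing"
    if power_cycles >= 2:
        # Still high after two power cycles: escalate.
        return "call the night support astronomer"
    if not flushed or level == "elevated":
        # First thing to try, and repeated while the noise is lower but not nominal.
        return "run a flush sequence, then another read noise sequence"
    # Still very high after flushing.
    return "run down the software, power cycle the controller, and restart"
```

The rule deliberately leaves the judgement of "nominal" versus "very high" to the observer, as the text does.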

Dark current is high (especially camera 4):
The DR reports a dark current, although it will not exit if the current is high. Note that camera 4’s DR always reports large dark currents for long exposures (typically in the hundreds). These numbers can be ignored. It is due to a read-out issue that puts a bias level into the array. This is steady and is removed in dark subtraction. See 20061116.004.

Channel edge problem:
The channel edge problem used to appear in two flavors:
  (1) “Hard” channel edge problem: the edges of the channels show one bright row or column next to one dark one, with counts higher and lower, respectively, than the rest of the rows or columns in the channel by a few hundred or a few thousand ADUs. The frame is totally useless when this happens. This was seen mostly in camera 3 and sometimes in camera 4.
  (2) “Weak” channel edge problem: a few rows or columns at the end of a channel taken with NDR waveforms are brighter than the rest by a few, or a few tens of, ADUs. The number of bright rows or columns increases with NDR exposure time. This was most pronounced in camera 2 and present, though weak, in the other cameras.
With the improvements implemented in late 2009 and early 2010, both problems were removed. If you see either of them, please report it. The canonical fault is 20070118.002.

Arrays fail to disable:
See 20070429.004 (original), 20070809.001 (canonical fault), and 20100812.001 (the latest). Occasionally the arrays do not disable at the end of the night or at the end of daytime checks. The obvious symptoms are that (1) the arrays remain powered on in the sequence console and (2) the TSS does not get the powered-off response in the wfcamControl log (please do watch for this message when disabling the arrays). If this problem occurs, remove inst, run down the low-level software, restart the software, add inst, and enable and then disable the arrays. Sometimes this is not enough, in which case a full ocs_down is needed.

Rebooting or turning on/off the SDSU controllers remotely:
This procedure turns off or power cycles a WFCAM SDSU controller remotely and should be done only by a person trained in operating WFCAM, such as a Support Astronomer, TSS, or engineering personnel. See WFCAM remote power switch on the wiki troubleshooting page for details.

While operating WFCAM remotely, if you find that the array is in a bad state and you cannot revive it using the usual methods (such as running WFCAM, including the low-level software, down and up, or rebooting drama/rtai if these have stopped), it can be revived, or at least disabled, using the remote power switch.

Rebooting an SDSU controller remotely should be done only as a last resort, that is, when:
  (a) the usual ways of recovery do not work, and
  (b) there is nobody at the summit who can do it and nobody will be going up very soon, or the weather is deteriorating and it is not good to leave the array(s) powered.

If you end up power cycling the SDSU controller remotely, make sure to file a fault report (or reply to an existing fault report about it). 

Running WFCAM with fewer than four cameras:
See the link: https://about.ifa.hawaii.edu/ukirt/running-with-broken-cameras/

Temperature limits for safely operating WFCAM:
See the link: WFCAM safe operations 

Computing/Data Acquisition

Countdown freezes at zero: “nonfatal”:
The canonical fault is 20100713.001. For essentially all zero countdown faults you must remove inst and run down the low-level software (it is unlikely you will be able to disable the array). Once the software is run down, check the camera status. Do a nuke if necessary (there is no harm in doing one anyway while the software is run down). If everything is OK, restart the low-level software and check the status again. If everything is running, add inst, enable the arrays, and restart observations. For easier reference, follow the flowchart.

Countdown freezes at zero due to machine failure: message involving “integer”:
In this case the stuck countdown is a symptom of a wfacq machine having died. You will need to reboot whichever one is involved. 

To reboot a wfacq machine, first try a soft reboot via the kvm switch, as follows:

First type kvm, and double-click on irtkvm2 in the left menu bar. You will then need to log in; refer to the troubleshooting wiki page for the login details. Connect to the machine in question by double-clicking on it (see the note regarding machine names below).

A blank screen will be launched. Press the Enter key to get a login prompt. Log in as root and type ‘/sbin/shutdown -r now’.

Note: Once you log in to the kvm, the wfacq machines are listed by their ‘wfunit_hex address’, similar to ‘wfunit_??’ (four of them are on irtkvm, and wfunit_fc and wfunit_b8 are on irtkvm2). It is the ‘wfunit_hex address’ that you need to select in the kvm for rebooting.

These addresses correspond to the 6 wfacq machines (4 in use and 2 standby). To find out which ‘wfunit_hex address’ corresponds to which wfacq machine, run the command “wfacqmachinetable” from a terminal on ohi or wfdisp2. You will get a table with 6 lines, one for each wfacq machine. Each line contains a string similar to “??:??:??:??:??:??:??”, where the final “??” is the ‘hex address’, and the corresponding “wfacq” number is listed on the same line. The entry for that machine in the kvm will therefore be “wfunit_??”.
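The mapping from the address string to the kvm entry name can be sketched in a few lines of Python. This is only an illustration: the helper name is hypothetical, the exact output format of “wfacqmachinetable” is not reproduced here, and the lower-case form of the name is an assumption based on the examples wfunit_fc and wfunit_b8 in the note above.

```python
def kvm_entry_name(address: str) -> str:
    """Build the kvm entry name from a colon-separated address string.

    The last colon-separated group is taken as the hex address,
    as described in the text.
    """
    hex_part = address.strip().split(":")[-1].lower()
    if not hex_part or any(c not in "0123456789abcdef" for c in hex_part):
        raise ValueError(f"unexpected address format: {address!r}")
    # Assumption: kvm names use lower-case hex, as in wfunit_fc and wfunit_b8.
    return f"wfunit_{hex_part}"
```

Given a made-up address string ending in “:fc”, the helper returns “wfunit_fc”, which is the entry to select in the kvm.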

If a soft reboot doesn’t bring it back (if you don’t get a response on the kvm switch), try a hard reboot:

Type “connect wfunit_hexaddress power” (e.g., connect wfunit_?? power), give the admin’s password when prompted, and reboot using the command “/boot wfunit_hexaddress”.

After the reboot you must remount the raw data disk using “/local/bin/remount wfcamN”. Do this from your own account using your own password and make sure the DR for the relevant camera has been stopped. If the remount sticks, hit ctrl-c and it should continue. To make sure, remount the disk a couple of times. This will remount the raw data disks on all Linux computers.


Acquisition fails to write data file:
Occasionally WFCAM fails to write a raw data frame. This usually goes unnoticed since we are using the -skip option in the DR, but the DR might complain about the mosaic. Unfortunately there’s little that can realistically be done about this at the telescope. See 20060429.002.

Disk not mounted/ompobslog can’t see data:
If the DR and/or ompobslog does not see the raw data, then the relevant raw data disk needs remounting (if ompobslog doesn’t see the data, then the problem disk is on WFACQ1). Stop the DR, and from your own account on any summit machine run “/local/bin/remount wfcamN”, where N is the machine number. Use your own password when prompted. If the remount sticks, hit ctrl-c to continue. Redo the remount if necessary and then restart the DR and/or ompobslog.
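As a rough illustration of the remount step, the Python sketch below builds the command quoted above and runs it more than once. The helper names are hypothetical, and the default of two attempts is an assumption based on the advice to redo the remount if necessary. The password prompt and any ctrl-c remain interactive and are not handled here, and the DR should already be stopped before running it.

```python
import subprocess

REMOUNT_SCRIPT = "/local/bin/remount"  # script path quoted in the text


def remount_command(machine_number):
    """Return the argument list for remounting the raw data disk wfcamN."""
    n = int(machine_number)
    if n < 1:
        raise ValueError("machine number must be a positive integer")
    return [REMOUNT_SCRIPT, f"wfcam{n}"]


def remount(machine_number, attempts=2, runner=subprocess.run):
    """Run the remount command `attempts` times and return the runner results.

    `runner` is injectable so the logic can be exercised without the
    real script; by default the command is run in the current terminal.
    """
    cmd = remount_command(machine_number)
    return [runner(cmd) for _ in range(attempts)]
```

Calling `remount(1)` on a summit machine would run “/local/bin/remount wfcam1” twice in the current terminal.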

Failed to set configuration (phase 1):
See Sequence console pops up error after WFS and fault 20080215.001.

Data Reduction

DR exits on first frame:
This is typically due to one or all of the arrays being disabled. If the arrays were enabled earlier but one is blank, the relevant controller has to be power cycled (see Arrays fail to enable/blank first frame). If the arrays were not enabled, stop the sequence, enable the arrays, and restart both the sequence and the DR.

DR continually exits for camera 4:

If the DR continually exits on camera 4, it is most likely due to short exposures in the Z and Y bands that do not provide enough sky counts to create a good sky calibration file. The data should still be fine. Because camera 4 has a different gain, this issue is seen only there.

You may continue observations as normal and simply restart the DR at the latest group number. Alternatively, if the restarts become disruptive or time-consuming, you may power cycle camera 4 to address it. Below is an image of the error message received from the wfcamDR error log.



Read noise is high:
In general, flush the array and redo the read noise sequence. If this doesn’t help, the controllers may need to be power cycled; see the instructions in the previous section on Read noise is high. If the read noise is unusually high, the DR will exit to tell you that there is a problem that needs fixing. In that case, please call for help if power cycling the controller does not resolve the situation.

Dark current is high (especially camera 4):
See Dark current is high (especially camera 4). In just about all cases this can be ignored, especially if it is in camera 4, as the DR provides an incorrect dark current value in the hundreds.

Light leak in dark frames:
The DR will complain if it detects light leak during initial darks (see Light leak in dark frames). If this occurs, wait until 5 minutes before sunset before starting the darks sequence.

Channel edge problem:
The DR will report channel edge problems but in general will not exit. There is little that can be done at the moment (a software fix is being looked at) apart from marking the bad frames in the night log. Restart the DR from the start of the next group. 

No suitable dark:
The DR will exit if it cannot find a suitable dark for the current observation. This is to warn you that the data you are taking cannot be calibrated. You must make sure that a correct dark is taken; in fact, it is recommended to take at least 5 of them. They must have the same exposure time and number of coadds as the science frames. The DR will exit for the same reason if the darks sequence was not taken at the start of the night or in daytime checks. (Note: At present, the DR at UKIRT will not crash if the darks do not have the same number of coadds as the science frames. Special care must therefore be taken to obtain darks with the same number of coadds as the science frames. CASU will use these darks when they reduce the data.)
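The matching rule for darks (same exposure time, same number of coadds, at least 5 recommended) can be sketched as a small check. This is not how the DR or CASU select darks; the `Frame` class, function names and status strings are hypothetical and only illustrate the rule stated above.

```python
import math
from dataclasses import dataclass

MIN_DARKS = 5  # the text recommends taking at least 5 darks


@dataclass(frozen=True)
class Frame:
    exp_time: float  # exposure time in seconds
    coadds: int      # number of coadds


def matching_darks(science, darks):
    """Return the darks with the same exposure time and coadds as `science`."""
    return [
        d for d in darks
        if math.isclose(d.exp_time, science.exp_time) and d.coadds == science.coadds
    ]


def dark_status(science, darks):
    """Classify the available darks for one science frame."""
    n = len(matching_darks(science, darks))
    if n == 0:
        return "no suitable dark"
    if n < MIN_DARKS:
        return "fewer darks than recommended"
    return "ok"
```

Checking the coadds explicitly matters because, as noted above, the DR at UKIRT will not stop on a coadd mismatch.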

DR can’t see the raw data:
See Disk not mounted/ompobslog can’t see data. If the DR just sits there producing dots and not reducing the raw data that you know have been taken, then it is likely the raw data disk is not mounted. Stop the DR, remount the disk using “/local/bin/remount wfcamN” and your own password. If the remounting sticks, use ctrl-c to make it continue. Run the remount script twice if necessary and then restart the DR.

Effect of bright stars in the science frames