COMPUTER SYSTEM INITIALIZATION

Guide for starting station computers after a power outage

 

 



INTRODUCTION

The station's computers are powered by a UPS that instantly takes over in the event of a power outage. However, the capacity of the UPS batteries is limited: when the UPS reaches a critical threshold (batteries almost empty), a command is sent to shut the station's computers down cleanly (shutdown).

The station's computers can therefore be shut down, automatically or manually, by the script T_shutdown_euler run on glslogin1.

Anyone may need to run this command when a planned power outage will last several hours, beyond what the UPS can cover.

The power supplied by the UPS can be switched on or off remotely via the Internet, individually for each computer, through an IP-Power.

There are 3 IP-Power units, located in:

  • the Server cabinet (10.10.132.91; L105)
  • the LCU cabinet (10.10.132.92; L105)
  • the REM cabinet (10.10.132.94; L201, under the telescope).

 

ups-ippower.png

Once a machine has been shut down by T_shutdown_euler, the IP-Power port that powers it is normally turned off, but not always!

Be aware that the last machines to be shut down cannot cut their own IP-Power port. Also, if the shutdown script ran into a problem, the Euler shutdown may be incomplete.

If the UPS exhausts its batteries, then all ports of all IP-Power are Off and stay off even if the power comes back on.

 

Method to start a computer through an IP-Power

 

  • If the IP-Power port has been Off for more than 30[s], simply turn the port On.
  • If the port is On, you have to execute a Reset sequence on the IP-Power port: OFF..wait 30[s]..ON. The OFF phase is needed because the computer's BIOS is programmed to boot when the power comes back on. Before switching OFF, it is absolutely necessary to make sure that the computer attached to the port has received its shutdown command: if the machine is still running at the time of the OFF, the effect is identical to a sudden power cut and there is a risk of losing the machine's hard disk.
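
The two cases above can be written down as a small decision routine. This is only an illustrative sketch: the function name plan_port_actions and its arguments are hypothetical and are not part of the station's T_* tools. It encodes the rule stated above and does not talk to an IP-Power.

```python
RESET_WAIT_S = 30  # minimum OFF time stated in this guide


def plan_port_actions(port_on, host_shut_down, seconds_off=0):
    """Return the list of manual actions for one IP-Power port.

    port_on        -- True if the port currently shows On
    host_shut_down -- True if the computer is known to have completed its shutdown
    seconds_off    -- how long the port has already been Off (ignored if port_on)
    """
    if port_on:
        if not host_shut_down:
            # Cutting power to a running machine is equivalent to a sudden
            # power cut and risks the hard disk: refuse to plan an OFF.
            raise ValueError("host may still be running: shut it down first")
        return ["OFF", f"WAIT {RESET_WAIT_S}s", "ON"]
    # Port already Off: only wait for whatever is left of the 30 s.
    remaining = max(0, RESET_WAIT_S - int(seconds_off))
    if remaining:
        return [f"WAIT {remaining}s", "ON"]
    return ["ON"]
```

For example, plan_port_actions(True, True) gives the full Reset sequence, while a port that has been Off for two minutes only needs "ON".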

 

 



BASIC TESTS TO DETERMINE THE STATE OF THE STATION

 

If you are faced with a station whose computers are turned off, you must analyse and understand the situation in order to act appropriately.

Caution: it is imperative to follow the startup order (see below).

The most suitable way to restart the station's computers is to be connected to Euler's network, either locally with your laptop (over Wi-Fi) or remotely from a Linux machine over VPN. If you are in the station, you also have physical access to the IP-Powers, which lets you view the status of the ports and perform unsafe operations directly on the ports.

 


Quick Check

ssh glslogin1.ls.eso.org (or glslogin2) 

If the station responds correctly and your data are present, then the router is working, its IP-Power is on, the disk server is working and, of course, the station is working.

An additional test on this machine shows the status of the other machines on the network. Just type:

T_show_date

This command shows which machines are switched on and which are off. If all of them respond, we can assume that the computer system is working. Otherwise, continue with this documentation.

 


Normal Check

The easiest way is to check everything from the beginning:

ping 134.171.80.170 (if no answer: Router down, No electricity in the station -> STOP -> get La Silla help)
ping 10.10.132.91   (if no answer: VPN or the IP-Power01 is down)
ping 10.10.132.100  (if no answer: VPN or the Disk Server SynologyCluster is down) 
ping 10.10.132.81   (if no answer: VPN or glslogin1 is down; test .82 for glslogin2)

If the router does not answer, there is neither electricity nor Internet in the station. You can check the dates of the latest data from the meteo monitor or from the Danish All Sky to find out whether there is electricity on the site. If there is no electricity, there is nothing to do. If only the Euler station seems to be without power, check the main breaker. See this document. From Geneva, contact ESO support; there is nothing more to be done.

If the router answers but the three internal addresses (10.10.132.xxx) do not, then from Geneva you have to restart the local router.

  • Manually: switch off, wait 10 seconds, switch on, then wait for a ready status (2-3 minutes).
  • By software: go to https://10.10.133.1 (ElectroLab), log in as cisco (passwd=N....) -> System_Management -> Restart. Wait until the URL answers again (be sure to write https:// because http:// will not work).
  • Outside of the observatory: VPN Observatoire -> x2go session on 129.194.67.159 (gvanuc01) -> follow the previous steps.

If the router answers and only the IP-Power answers, the Euler network is working but the computers are not (or only partially).

Once the situation is corrected (Euler router working and, if needed, VPN working), check the connection to the IP-Power again.

The IP-Power must answer for remote control to be possible, so repeat:

ping 10.10.132.91

If the IP-Power does not answer, you need local help (see contact) because nothing more can be done remotely. Stop here.
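
The reasoning of the Normal Check can be sketched as follows. This is a minimal illustration, not an existing station tool: the names ping_ok and diagnose and the wording of the returned messages are hypothetical, and ping_ok assumes the Linux ping options -c and -W. The addresses are those of the list above.

```python
import subprocess

# Addresses taken from the Normal Check list above.
ROUTER = "134.171.80.170"
IPPOWER01 = "10.10.132.91"
DISK_SERVER = "10.10.132.100"
GLSLOGIN1 = "10.10.132.81"


def ping_ok(ip, timeout_s=2):
    """Return True if one ICMP echo is answered (Linux ping options assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def diagnose(answers):
    """Map {ip: bool} ping results to the conclusion given in this guide."""
    if not answers[ROUTER]:
        return "router down or no electricity: stop and get La Silla help"
    internal = [answers[IPPOWER01], answers[DISK_SERVER], answers[GLSLOGIN1]]
    if not any(internal):
        return "internal network unreachable: check VPN, then restart the local router"
    if not answers[IPPOWER01]:
        return "IP-Power unreachable: local help needed"
    if all(internal):
        return "basic services answer: run T_show_date on glslogin1"
    return "IP-Power answers but some computers are down: check the IP-Power ports"
```

A possible use is diagnose({ip: ping_ok(ip) for ip in (ROUTER, IPPOWER01, DISK_SERVER, GLSLOGIN1)}).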

 



CHECK OF THE IP-POWER

 

You need to know the status of each port of the 3 IP-Power units in order to determine how to restart the station (note that glsippower03 is decommissioned).

There are two ways to read the status, using the command line (simpler) or with a browser:

 


1) You are on the Unix station with an active VPN to Euler

In a terminal, type:

T_ippower_stat

You get for example:

gvanuc01:~> T_ippower_stat
Acces to IPpower on glsippower01 (10.10.132.91)
Acces to IPpower on glsippower02 (10.10.132.92)
Acces to IPpower on glsippower04 (10.10.132.94)

Free ports status, normally Off:

free_91_7          on glsippower01 port=7  Off
free_91_8          on glsippower01 port=8  Off
free_94_4          on glsippower04 port=4  Off
free_94_5          on glsippower04 port=5  Off
free_94_6          on glsippower04 port=6  Off
free_94_7          on glsippower04 port=7  Off
free_94_8          on glsippower04 port=8  Off

Named ports status, Host must be On:

Lumiere-Dome       on glsippower04 port=3   Off
glsaux-drs-moni    on glsippower02 port=6        On
glscora            on glsippower02 port=1   Off
glsecam            on glsippower02 port=4   Off
glslogin1          on glsippower01 port=5        On
glslogin2          on glsippower01 port=6        On
glspc20-imagcor    on glsippower02 port=2   Off
glsserv            on glsippower02 port=3        On
glstopt            on glsippower04 port=2   Off
glstreg            on glsippower04 port=1   Off
pctwincat2         on glsippower02 port=7   Off
pctwincat3         on glsippower02 port=8        On
synology03-1       on glsippower01 port=1        On
synology03-2       on glsippower01 port=2        On
synology04-1       on glsippower01 port=3        On
synology04-2       on glsippower01 port=4        On
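
If the listing has to be examined by a script rather than by eye, it can be parsed line by line. The sketch below assumes that the output always has the layout of the example above (name, "on", IP-Power name, "port=N", then On or Off); the function names are hypothetical and nothing here is part of T_ippower_stat itself.

```python
import re

# One status line: <name> on <ippower> port=<n> <On|Off>
PORT_LINE = re.compile(r"^\s*(\S+)\s+on\s+(\S+)\s+port=(\d+)\s+(On|Off)\s*$")


def parse_ippower_stat(text):
    """Return a list of (name, ippower, port, is_on) tuples found in the text."""
    ports = []
    for line in text.splitlines():
        match = PORT_LINE.match(line)
        if match:
            name, unit, port, state = match.groups()
            ports.append((name, unit, int(port), state == "On"))
    return ports


def named_ports_off(text):
    """Names of non-free ports that are Off (hosts that still need power)."""
    return [
        name
        for name, _unit, _port, is_on in parse_ippower_stat(text)
        if not is_on and not name.startswith("free_")
    ]
```

Header lines such as "Acces to IPpower on ..." do not match the pattern and are simply skipped.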

 


2) You have a browser with access to the Euler network (local laptop or VPN)

 

Go to 10.10.132.91, 10.10.132.92 and 10.10.132.94 in turn and check the information shown:

Ip_power_login.png

In the following image you can see that the disk server, glslogin1 and glslogin2 have not been stopped.

ip_power_example.png

 



STARTUP OPERATIONS

This list shows which computers have power. Depending on the status of the ports, there are two different ways of proceeding:

  1. If the power cut was long and the UPS has discharged its batteries, all ports will be Off. This is the simplest situation, because start-up follows a simple procedure: switch on the computers in the right sequence. There is no need to read the flowchart; just execute the instructions in the following chapters.
  2. If any port is still On, follow this flowchart:

     

    Note for the flowchart: the T_ippower* commands can only be launched in a terminal from a remote station. On site you have to use a browser from a laptop or use the IP-Power buttons directly. Once a workstation (glslogin1 or glslogin2) is accessible, the T_ippower* commands are available.

     


    Flowchart

    Check_After_Power_cut.png

     



    STEP 1: START THE DISK SERVER

    It's the first operation. Computers cannot operate if the disk server is not operational.

    The next 2 sections show:

    1. how to start the disk server
    2. how to test if the disk server is operational

     

     


    Try WakeOnLan (Booting the Disk Server) (according to the above flow chart)

     

    On 10.10.132.91, click the 4 ON buttons of the Synology (disk server) ports 03-1, 03-2, 04-1 and 04-2.

    This action powers them but does not start them up. If you are at La Silla, you have to press the ON button of one of the two Synology units; the second one will then start after a while.

    If you are not at La Silla but have VPN access in Geneva (gvanuc01:@labo, gvanuc02:@astrodome), try the WakeOnLan method (with little chance of success). The arguments are the router's IP at La Silla and the Synology server's MAC address; try only one.


    for synology_105 (10.10.132.105)
    perl wakeonlan -i 10.10.132.127 00:11:32:62:97:FC

    for synology_106 (10.10.132.106)
    perl wakeonlan -i 10.10.132.127 00:11:32:5B:E1:58

    for synology_107 (10.10.132.107) (spare, switched off)
    perl wakeonlan -i 10.10.132.127 00:11:32:5B:62:D4

    At this point you should test whether the disk server is booting, using the method described in the next section. If the Synology does not start with WakeOnLan, contact ESO support to have the SynologyCluster started manually.

    Note: locally, with your own computer and perl installed, you can use this wakeonlan.pl script. WakeOnLan tools also exist for Windows.
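
For readers without perl, the standard Wake-on-LAN "magic packet" is easy to build by hand: 6 bytes of 0xFF followed by the target MAC address repeated 16 times, sent over UDP. The sketch below shows that general format only. It is not a port of the wakeonlan script used above, and UDP port 9 is an assumption (a common default), not something taken from this guide.

```python
import socket


def magic_packet(mac):
    """Build a Wake-on-LAN magic packet: 6 x 0xFF then the MAC repeated 16 times."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("MAC address must have 6 bytes")
    return b"\xff" * 6 + raw * 16


def send_wol(mac, address, port=9):
    """Send the magic packet by UDP to `address` (port 9 is an assumed default)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Allow broadcast destinations; harmless for unicast ones.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (address, port))
```

With the values given above, a call would look like send_wol("00:11:32:62:97:FC", "10.10.132.127").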

     


    Navigator on:10.10.132.101:5000 (Check Disk Server) (according to the above flow chart)

     

    The disk server must be active before continuing.
    In a browser, test the URL 10.10.132.101:5000. Until the following page appears, it is useless to go any further.
    Note that the page only has to appear; there is no need to log in.
    Synology_login.png
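
Waiting for the page can also be done from a terminal. The following is a sketch under assumptions: the helper names page_answers and wait_for are hypothetical, and the guide gives only 10.10.132.101:5000 without a scheme, so the http:// prefix in the usage note is a guess. Any HTTP response counts as "the page appears", since no login is needed.

```python
import time
import urllib.error
import urllib.request


def page_answers(url, timeout_s=5):
    """Return True if the URL returns any HTTP response at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status.
        return True
    except OSError:
        # Connection refused, timeout, unreachable network, ...
        return False


def wait_for(check, attempts=60, delay_s=10, sleep=time.sleep):
    """Call check() until it returns True; give up after `attempts` tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

A possible use is wait_for(lambda: page_answers("http://10.10.132.101:5000")), which polls every 10 seconds for up to about 10 minutes.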
     


    STEP 2: START THE MAIN WORKSTATION (with the disk server active)

     

    The main workstation is the DHCP server. Because the DHCP server assigns the fixed IPs and names to the machines, it must be started before all the other computers.

    On 10.10.132.91, click the ON button of glslogin1.

    Note: if glslogin1 doesn't work, glslogin2 can take the role of main workstation because it runs the secondary DHCP server. In that case, read glslogin2 and its IP 10.10.132.82 wherever the rest of this documentation says glslogin1.

     


    Check the Boot Completion of the Main Workstation glslogin1

     

    At La Silla, wait for the login window; in Geneva, simply repeat a ping on 10.10.132.81 until it answers, then try ssh. Example:

    ping 10.10.132.81   ... <CTRL>-C
    ssh 10.10.132.81    ... <CTRL>-C
    

    In any case, log in to make sure your data are present. This test indicates that the disk server is working and that we can continue. From another machine (your computer), try to log in using the name "glslogin1"; success indicates that the DHCP server is running on glslogin1.
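
The ping-then-ssh check can be approximated without an interactive session by testing whether the ssh port accepts connections. This is only a sketch: the function names are hypothetical, TCP port 22 is assumed to be the ssh port, and an open port does not replace the real login that confirms your data are present.

```python
import socket


def tcp_port_open(host, port=22, timeout_s=3):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def boot_stage(answers_ping, ssh_port_open):
    """Classify a host from the two checks used in this guide."""
    if ssh_port_open:
        return "up: try an ssh login and check that your data are present"
    if answers_ping:
        return "booting: network is up but ssh is not ready yet"
    return "down: no answer to ping"
```

For glslogin1 the second argument would come from tcp_port_open("10.10.132.81"), the first from an ordinary ping.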

     



    STEP 3: START THE COMPUTERS (with the main workstation and the disk server active)

     

    The computer system is based on the disk server and the main workstation (DHCP server). These 2 elements must be operational before the rest of the start-up is carried out.
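
The imposed order of the three steps can be stated as a tiny helper. This is an illustrative sketch with a hypothetical function name; the two booleans stand for the checks described in STEP 1 and STEP 2.

```python
def next_startup_step(disk_server_ok, main_workstation_ok):
    """Return the step to perform next, respecting the order of this guide."""
    if not disk_server_ok:
        # Nothing else can operate without the disk server.
        return "STEP 1: start the disk server"
    if not main_workstation_ok:
        # The DHCP server must be up before any other computer starts.
        return "STEP 2: start the main workstation"
    return "STEP 3: start the computers"
```

Note that a working main workstation is ignored as long as the disk server is down, which matches the order imposed above.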

    As described above, a computer powered by an IP-Power is started through a Reset sequence: OFF..wait 30[s]..ON. This sequence should only be performed if the computer has been shut down properly by a shutdown.

    If the port is OFF, just turn it ON with the browser. The computer will then boot.

    If the port is ON, make sure the computer is turned off. The T_ippower_reset <host> command performs the reset only if the computer does not respond to ping. Another solution is to ping the computer yourself (see the annexes at the end of this document) and, if there is no response, do a Reset in the browser.

    If the computer responds to ping, do not perform the reset procedure. In this case, since the disk server and the main workstation have been rebooted, it is preferable (and faster) to reboot the computer concerned with T_reboot_servers <workstation> or T_reboot_lcu <lcu>.
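
The choice between the three cases can be summarised in code. The sketch below only returns the text of the action to take and never runs anything; the function name and the "workstation"/"lcu" labels are hypothetical, while the command names and their single argument are those quoted above.

```python
def choose_start_action(host, kind, port_on, answers_ping):
    """Return the action for one computer; kind is 'workstation' or 'lcu'."""
    if kind not in ("workstation", "lcu"):
        raise ValueError("kind must be 'workstation' or 'lcu'")
    if not port_on:
        # Port Off: powering it is enough, the BIOS boots the machine.
        return "turn the IP-Power port ON in the browser"
    if not answers_ping:
        # Port On but host silent: reset (OFF..wait 30 s..ON).
        return f"T_ippower_reset {host}"
    # Host is alive: a software reboot is preferable to cutting power.
    if kind == "lcu":
        return f"T_reboot_lcu {host}"
    return f"T_reboot_servers {host}"
```

The resulting string is meant to be read by the operator, who then types the command on glslogin1 or acts in the browser.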

     


    Start workstations

     

    glslogin2 is on 10.10.132.91.

    glsaux, glsdrs and glsmonitor (glsaux-drs-moni) are all on the same port on 10.10.132.92.

    After this, glsmonitor starts (its screen is behind the glass); the monitoring application window should be brought to full size with the mouse. The other workstations should display the login window.

     


    Start Local Control Units (LCUs)

     

    Instrument LCUs are on 10.10.132.92 (gls*).

    Telescope LCUs are on 10.10.132.94 (glstreg and glstopt).

    The screens behind the glass will activate.

    After a while type the following command on glslogin1. It will display the current date on all LCUs, showing that everything works.

    T_show_date


    Start the 2 Windows PCs for Beckhoff development

     

    These two PCs are not used for observation, so this operation is optional.

    On 10.10.132.92, click the ON buttons of pctwincat2 and pctwincat3.

     

    That's all folks

     



    ANNEX: how to ping a computer

     

    Run the following commands and interrupt each of them with <CTRL>-C. Example with glspc20:

    ping 10.10.132.40 ... <CTRL>-C
    ssh 10.10.132.40  ... <CTRL>-C

     



    ANNEX: IP addresses for ping

     

    Disk server:  ping 10.10.132.101
    
    glslogin1  :  ping 10.10.132.81
    glslogin2  :  ping 10.10.132.82
    
    glsmonitor :  ping 10.10.132.35
    glsdrs     :  ping 10.10.132.36
    glsaux     :  ping 10.10.132.38
    
    glspc20    :  ping 10.10.132.40
    glsecam    :  ping 10.10.132.47
    glscora    :  ping 10.10.132.49
    glstopt    :  ping 10.10.132.50
    glstreg    :  ping 10.10.132.51
    glsserv    :  ping 10.10.132.54
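
The table above can be kept as a dictionary and swept in one go. This is a sketch, not a replacement for T_show_date: the names HOSTS, system_ping and sweep are hypothetical, and system_ping assumes the Linux ping options -c and -W.

```python
import subprocess

# Names and addresses copied from the table above.
HOSTS = {
    "disk server": "10.10.132.101",
    "glslogin1": "10.10.132.81",
    "glslogin2": "10.10.132.82",
    "glsmonitor": "10.10.132.35",
    "glsdrs": "10.10.132.36",
    "glsaux": "10.10.132.38",
    "glspc20": "10.10.132.40",
    "glsecam": "10.10.132.47",
    "glscora": "10.10.132.49",
    "glstopt": "10.10.132.50",
    "glstreg": "10.10.132.51",
    "glsserv": "10.10.132.54",
}


def system_ping(ip, timeout_s=2):
    """Return True if one ICMP echo is answered (Linux ping options assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def sweep(hosts, ping=system_ping):
    """Return the names of the hosts that do not answer, in table order."""
    return [name for name, ip in hosts.items() if not ping(ip)]
```

Calling sweep(HOSTS) returns an empty list when every machine answers.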
    

     



    LW 25/05/2020, 17/01/2022