MikroTik: http-ping / Netwatch for HTTP

My two Raspberry Pis each have NAT rules on my MikroTik. I only have a single external IP address, so obviously I can’t have two NAT rules that forward the same port to each Pi and are otherwise identical. I do have different ports for each Pi as alternative / backup ports, but I prefer to primarily use a single port – and a well-known one at that.

I use the MikroTik’s / RouterOS’ scripting engine as a method to fail over between Pis. When it detects the primary Pi is down, it rewrites the relevant NAT rules to forward traffic to the secondary Pi. When the primary is back up, it points the traffic back to it.

Now in other cases I run a lot of the scripting logic in Linux, which then connects to the MikroTik using my mikrotik-ssh command. However, in this case we’re worried that the Raspberry Pis themselves are down, so obviously we can’t rely on the Linux they’re running! Rather than introduce a third device to control the MikroTik, it makes sense to script this entirely on the MikroTik itself. This requires us to use RouterOS’ scripting language, which isn’t the best, but can do what we need.

I previously used RouterOS’ Netwatch utility for determining whether a Pi is up or not. However a big disadvantage is that this purely detects based on pinging and receiving replies – so it knows that Layers 1 through 3 are working fine, but not 4 upwards. We really want to know if our application itself (Layer 7) is still working. In fact sometimes everything on Layer 7 has been broken and I’ve been none the wiser! In particular one of my SD cards seems to have a contact issue which occasionally causes all applications on that Pi to lock up…but it still happily sends out ICMP replies. So Netwatch doesn’t detect any issues and doesn’t alert me, and I don’t notice because almost every service is redundant between both Pis anyway.

So I want to test that Layer 7 is working, which in this case for me is a web server. Even that warrants a bit of discussion, as Layer 7 itself can be further layered. In LAMP, for example, do you just need to check Linux and Apache? Or Linux, Apache, and PHP? Or all the way to MySQL? As a rough rule, each one gets progressively harder to test.

For my scenario, Apache being up and serving web pages means with almost 100% certainty that everything is working fine, so it’s sufficient to host a static text file and make sure that’s available over HTTP without error. And it’s relatively simple to script this entirely on the MikroTik.

So overall, Netwatch-HTTP.rsc works like this:

  • The scheduler is set to run the script every 10 minutes.
    You could probably happily set this much lower, maybe even every 20 seconds if you really wanted.
  • The script fetches a text file from each Pi over HTTP.
    If this fails then the Pi is down somewhere between Layer 1 and the web server itself. If it succeeds, there is still one more check.
  • If it fetches the file successfully, it confirms that the contents of the file are identical to the hostname of the Pi.
    I’ve made the file a simple text file containing the hostname, which is mostly arbitrary. If you’re just testing your web server, any static content will do; I just wanted different content for each host, and the hostname made the most sense for that. Because the script compares the contents exactly, the file must contain the hostname and nothing else, not even a trailing newline. If you were testing PHP you may want to test something more dynamic, but remember we’re limited by RouterOS’ scripting capabilities here.
  • It then checks whether this overall “state” (which Pis are up and which are down) is different from the last time this script ran.
    This is to mimic Netwatch’s behaviour in that, when it detects a device is down it executes the down script exactly once, and when it’s back up it executes the up script exactly once. With our script running every 10 minutes, we don’t really want to repeat the same tasks every 10 minutes when nothing has changed, especially if we’re sending out an email alert.
  • If the state is the same, it does nothing. If the state has changed, well at this point you do whatever it is you want to do to manage the device going down / coming back up.
    In my case I alter my NAT rules.
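
The scheduling step above can be sketched as follows. This is a minimal example, not necessarily my exact configuration: it assumes the script has been stored under /system script with the name "Netwatch-HTTP", and the scheduler entry name "netwatch-http" is likewise just a placeholder.

```routeros
# Assumption: the script lives in /system script under the name "Netwatch-HTTP".
# The scheduler entry name "netwatch-http" is a placeholder.
/system scheduler add name="netwatch-http" interval=10m on-event="/system script run Netwatch-HTTP"
```

Lowering interval (for example to 20s) makes detection faster at the cost of more frequent fetches and file writes on the router.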

So first we need to define a few functions, beginning with the UpOrDown function. Basically, you give this the details of your web server, and it will spit out true if it’s up and working, and false otherwise.

# Function: UpOrDown
#   Returns true or false based on whether the pi is up.
#   It does this by checking a file on the pi's web server for its hostname as the output.  i.e.
#       Successful connection and content matches:        true
#       Successful connection and content does not match: false
#       Failed connection :                               false
#   Parameters
#       piname        The name of the pi
#       ip            The IP of the pi
:local UpOrDown do={
    
    :local host "mywebserver.lab"
    :local srcpath "/netwatch.txt"
    
    # Assume the pi is down until we prove otherwise.
    :local piup false
    
    # The filename we will be temporarily saving the output to.
    # The subdirectory "updown" is created if it doesn't already exist.
    :local filename ("updown/" . $piname . ".txt")
    
    :do {
        
        /tool fetch address=$ip host=$host mode=http src-path=$srcpath dst-path=$filename
        
        # For good measure, to ensure the file is written.
        :delay 2
        
        # Get all of the file's contents.
        :local filecontent [/file get [/file find name=$filename] contents]
        
        # And confirm if it matches the name of the pi
        :if ($filecontent = $piname) do={
            :set piup true
        }

    } on-error={
        # Display that there was an error.
        :put ($piname . " error")
    }
    
    # Display whether it's up or not.
    :if ($piup) do={
        :put ($piname . " is up")
    } else={
        :put ($piname . " is down")
    }
    
    # Delete the file
    /file remove [/file find name=$filename]
    
    # Return true or false
    :return $piup
}

A few points that may not be obvious from the comments:

  • You call the function, for example, as follows: $UpOrDown piname="Pi1" ip=192.168.123.101
  • Any :put statements are just to display to the console for diagnostics if running manually. They don’t affect the script.
  • With the fetch command, you can specify either a URL (e.g. http://mywebserver.lab/netwatch.txt) or an address, host and src-path. The latter, which I use here, is useful if your web server uses virtual hosts and you need to make sure the file is downloaded from the right one. In my case, because of the failover setup, both Pis share the same virtual hostname. If the MikroTik resolved that hostname through DNS, then with one DNS record it could only ever reach one Pi, and with two records in round-robin it would be pot luck which Pi it reached. So I specify the IP and the virtual hostname separately.
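  • We save the content to a file and read it back before comparing, because fetch as used here writes to a file rather than a variable. Newer RouterOS releases do, as far as I know, let fetch return the body directly (output=user with as-value).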
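
For reference, here is a minimal sketch of fetching straight into a variable instead of a file. It assumes a RouterOS release whose fetch tool supports output=user together with as-value (check the fetch documentation for your version), and it hardcodes the first Pi’s details purely for illustration. I haven’t swapped this into the script below.

```routeros
# Assumption: this RouterOS release supports output=user and as-value on fetch.
# The IP, virtual host, path and expected content are the example values used in this post.
:local ip 192.168.123.101
:local piup false

:do {
    # The result is an array; the response body is under the "data" key.
    :local result [/tool fetch address=$ip host="mywebserver.lab" mode=http src-path="/netwatch.txt" output=user as-value]
    :if (($result->"data") = "rpi1") do={
        :set piup true
    }
} on-error={
    :put "fetch failed"
}

:if ($piup) do={
    :put "rpi1 is up"
} else={
    :put "rpi1 is down"
}
```

This avoids the temporary file and the :delay, but only works where the option is available, so the file-based version is the more portable of the two.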

Now earlier I mentioned we need to determine if the “state” has changed. I chose to save this state in a text file on the MikroTik. This way it will definitely survive a reboot. The next two functions handle the saving and loading of this state.

# Function: LoadLastState
#   Returns the state as saved to the file.
:local LoadLastState do={
    :local state
    :do {
        :local StateFilename "updown/state.txt"
        :set state [/file get $StateFilename contents]
    } on-error={
        :set state -1
    }
    
    #:put ("LoadLastState: " . $state)
    :return $state
}

# Function: SaveLastState
#   Saves the state to the file.
#   Parameters
#       state     The state - 0, 1, 2, 3
:local SaveLastState do={
    :local StateFilename "updown/state.txt"
    # This line is required as a workaround to prevent an error if the file does not yet exist.
    /file print file=$StateFilename
    # Save the file
    /file set $StateFilename contents=$state
}

Now we can begin. First, let’s define some variables with the details of our Pis, and load the previous state from the file into memory.

:local pi1name "rpi1"
:local pi1ip 192.168.123.101

:local pi2name "rpi2"
:local pi2ip 192.168.123.102

:local LastState [$LoadLastState]

(Side note: Normally I would try to generalise this sort of script so it can work for any number of Pis, rather than exactly 2. However MikroTik’s scripting language is a bit tricky, so I felt it significantly easier and cleaner just to hardcode everything as two Pis. It wouldn’t be a lot of work to upscale by a few more).

Then we check which RPis are up – and save the results to variables.

:local pi1up [$UpOrDown piname=$pi1name ip=$pi1ip ]
:local pi2up [$UpOrDown piname=$pi2name ip=$pi2ip ]

I chose to encode the ‘State’ as a 2-digit binary number, where the least-significant digit represents the first Pi, and the most-significant digit represents the second Pi. A 0 indicates up / success, a 1 indicates down / failure. In decimal this is a number between 0 and 3 inclusive. So 0 is both up, 1 is first Pi down, 2 is second Pi down, 3 is both Pis down. This in particular does allow for future expansion, as you can easily run bitwise operations and comparisons on the number.

# Determine the current state.
# The pipe is bitwise OR.  Alternatively we could simply add in this case.
:local CurrentState 0

:if (!$pi1up) do={
    :set CurrentState ($CurrentState|1)
}

:if (!$pi2up) do={
    :set CurrentState ($CurrentState|2)
}

:put ("Current state: " . $CurrentState)
:put ("Last state   : " . $LastState)
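
Going the other way, the encoded state can be unpacked again with bitwise AND. A minimal sketch, using a hardcoded example value rather than the result of a real check:

```routeros
# Example value only: 2 means the second Pi is down.
:local CurrentState 2

# Bit 0 (value 1) belongs to the first Pi.
:if (($CurrentState & 1) != 0) do={
    :put "Pi1 is down"
} else={
    :put "Pi1 is up"
}

# Bit 1 (value 2) belongs to the second Pi.
:if (($CurrentState & 2) != 0) do={
    :put "Pi2 is down"
} else={
    :put "Pi2 is up"
}
```

A third Pi would simply take the next bit (value 4), which is the future expansion mentioned above.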

And that’s the difficult bit done! Now just check if the state has changed, and then do whatever it is you need to do as you would on the MikroTik command line – however simple or complex you need it to be.

# If the state has changed since the last time this script was run...
:if ($LastState != $CurrentState) do={
    
    # Here, put your logic for what to do.
    # e.g. Enable a firewall rule, send an email, or even beep the MikroTik for 10 seconds!
    :beep length=10
    
    # Save current state to disk
    $SaveLastState state=$CurrentState
    
} else={
    :put "State has not changed."
}
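
As an illustration of the "put your logic here" step, this is roughly what a NAT rewrite could look like. It is a sketch under assumptions rather than my exact rules: it assumes a single dst-nat rule tagged with the comment "web-failover" (a hypothetical name), and it hardcodes an example state so that it can be read on its own.

```routeros
# Example values; in the real script these come from the variables defined earlier.
:local pi1ip 192.168.123.101
:local pi2ip 192.168.123.102
:local CurrentState 1

# Default to the primary Pi.
:local target $pi1ip

# Switch to the secondary only if the first Pi is down (bit 0 set)
# and the second Pi is up (bit 1 clear).
:if ((($CurrentState & 1) != 0) && (($CurrentState & 2) = 0)) do={
    :set target $pi2ip
}

# Assumption: the forwarding rule carries the comment "web-failover".
/ip firewall nat set [find comment="web-failover"] to-addresses=$target
```

If both Pis are down, the rule is left pointing at the primary, so traffic returns there as soon as it recovers.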

The full file is here:

########################
###  User variables  ###
########################

:local pi1name "rpi1"
:local pi1ip 192.168.123.101

:local pi2name "rpi2"
:local pi2ip 192.168.123.102

########################
###     Functions    ###
########################

# Function: UpOrDown
#   Returns true or false based on whether the pi is up.
#   It does this by checking a file on the pi's web server for its hostname as the output.  i.e.
#       Successful connection and content matches:        true
#       Successful connection and content does not match: false
#       Failed connection :                               false
#   Parameters
#       piname        The name of the pi
#       ip            The IP of the pi
:local UpOrDown do={
    
    :local host "mywebserver.lab"
    :local srcpath "/netwatch.txt"
    
    # Assume the pi is down until we prove otherwise.
    :local piup false
    
    # The filename we will be temporarily saving the output to.
    # The subdirectory "updown" is created if it doesn't already exist.
    :local filename ("updown/" . $piname . ".txt")
    
    :do {
        
        /tool fetch address=$ip host=$host mode=http src-path=$srcpath dst-path=$filename
        
        # For good measure, to ensure the file is written.
        :delay 2
        
        # Get all of the file's contents.
        :local filecontent [/file get [/file find name=$filename] contents]
        
        # And confirm if it matches the name of the pi
        :if ($filecontent = $piname) do={
            :set piup true
        }

    } on-error={
        # Display that there was an error.
        :put ($piname . " error")
    }
    
    # Display whether it's up or not.
    :if ($piup) do={
        :put ($piname . " is up")
    } else={
        :put ($piname . " is down")
    }
    
    # Delete the file
    /file remove [/file find name=$filename]
    
    # Return true or false
    :return $piup
}

# Function: LoadLastState
#   Returns the state as saved to the file.
:local LoadLastState do={
    :local state
    :do {
        :local StateFilename "updown/state.txt"
        :set state [/file get $StateFilename contents]
    } on-error={
        :set state -1
    }
    
    #:put ("LoadLastState: " . $state)
    :return $state
}

# Function: SaveLastState
#   Saves the state to the file.
#   Parameters
#       state     The state - 0, 1, 2, 3
:local SaveLastState do={
    :local StateFilename "updown/state.txt"
    # This line is required as a workaround to prevent an error if the file does not yet exist.
    /file print file=$StateFilename
    # Save the file
    /file set $StateFilename contents=$state
}

########################
###       Begin      ###
########################

# Load the last state.
:local LastState [$LoadLastState]

:local pi1up [$UpOrDown piname=$pi1name ip=$pi1ip ]
:local pi2up [$UpOrDown piname=$pi2name ip=$pi2ip ]

# We store the state as a 2-digit binary number, where 1 = Pi Down
#   00 (0)      Both up
#   01 (1)      Pi1 Down
#   10 (2)      Pi2 Down
#   11 (3)      Both Down

# Determine the current state.
# The pipe is bitwise OR.  Alternatively we could simply add in this case.
:local CurrentState 0

:if (!$pi1up) do={
    :set CurrentState ($CurrentState|1)
}

:if (!$pi2up) do={
    :set CurrentState ($CurrentState|2)
}

:put ("Current state: " . $CurrentState)
:put ("Last state   : " . $LastState)

# If the state has changed since the last time this script was run...
:if ($LastState != $CurrentState) do={
    
    # Here, put your logic for what to do.
    # e.g. Enable a firewall rule, send an email, or even beep the MikroTik for 10 seconds!
    :beep length=10
    
    # Save current state to disk
    $SaveLastState state=$CurrentState
    
} else={
    :put "State has not changed."
}
