1. How does an Uptime Check work?
1.1. How we run an Uptime Check
#1 – START: On the specified schedule, the system starts the check process and picks up the check from the queue.
#2 – REGION: The system calculates which region to perform the check from. Two consecutive checks are never executed from the same region.
#3 – EXECUTE: The Uptime Check is executed from the selected region, and the response is received from the remote host.
#4 – VALIDATION: The response is validated against the headers and assertion(s) configured for the individual check. Headers are validated first, then assertions. If any validation fails, notifications are sent to the recipients.
#5 – ENQUEUE: When validation is complete, the check is enqueued until its next scheduled run (e.g., in 5 minutes).
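The five steps can be sketched as one pass of a scheduler loop. This is a minimal illustration under stated assumptions, not Alertdesk's implementation: the region names, the `Check` fields and the `pick_region`/`run_check` helpers are hypothetical, and the HTTP request is passed in as a function so the sketch runs without network access.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical region identifiers, used for illustration only.
REGIONS = ["region-a", "region-b", "region-c"]


@dataclass
class Check:
    url: str
    interval_minutes: int = 5          # e.g. run every 5 minutes
    last_region: Optional[str] = None  # region used by the previous run


def pick_region(check: Check, regions: list) -> str:
    """#2 REGION: choose any region except the one used last time."""
    candidates = [r for r in regions if r != check.last_region]
    return random.choice(candidates)


def run_check(check: Check,
              fetch: Callable[[str, str], dict],
              validate: Callable[[dict], bool]) -> bool:
    """Run #2 to #4 for a check that #1 START has taken off the queue."""
    region = pick_region(check, REGIONS)   # #2 REGION
    response = fetch(check.url, region)    # #3 EXECUTE
    passed = validate(response)            # #4 VALIDATION (headers, then assertions)
    check.last_region = region
    # #5 ENQUEUE: the caller puts the check back on the queue,
    # due again in check.interval_minutes minutes.
    return passed


check = Check("https://example.com")
fake_fetch = lambda url, region: {"status_code": 200}
print(run_check(check, fake_fetch, lambda r: r["status_code"] == 200))  # True
```

Passing `fetch` and `validate` in as functions keeps the scheduling logic separate from the network call, which is also what makes the sketch testable offline.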
1.2. Geographic response time
We use Amazon Web Services (AWS) to execute all our checks. Our ambition is to support all AWS locations, but for now we only support these check locations:
- United Kingdom
Currently, you are not able to configure which locations your checks should be executed from. We are working on it.
1.2.2. Response time
For each check, we store the timing of several phases, which makes it easy for you to investigate slow responses.
Here is a short description of each phase:
- DNS Lookup
Time spent performing the DNS lookup.
- Connect
The time it took to establish a connection.
- TLS Handshake
The time spent negotiating and completing the TLS handshake for HTTPS.
- Response time
The time from when the request is initiated until all data is received from the server.
- Total time
The total time from the start until the end.
If we don’t get an answer from your server within 15 seconds, the Uptime Check times out. The check is marked as DOWN, and the log shows the response as Timeout.
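The relationship between the phases and the 15-second timeout can be sketched as below. The `Timings` class and `classify` helper are hypothetical, and the sketch assumes the phases run one after another so that the total time is their sum; a real HTTP client may measure the phases differently.

```python
from dataclasses import dataclass
from typing import Optional

TIMEOUT_SECONDS = 15.0  # a check times out after 15 seconds


@dataclass
class Timings:
    dns_lookup: float     # seconds spent resolving the host name
    connect: float        # seconds to establish the connection
    tls_handshake: float  # seconds spent on the TLS handshake (0 for plain HTTP)
    response: float       # request initiated until all data is received

    @property
    def total(self) -> float:
        # Assumption: the phases are sequential, so the total is their sum.
        return self.dns_lookup + self.connect + self.tls_handshake + self.response


def classify(timings: Optional[Timings]) -> str:
    """Log entry for a check; None means no answer was received at all."""
    if timings is None or timings.total > TIMEOUT_SECONDS:
        return "Timeout"  # the check is marked as DOWN
    return f"{timings.total * 1000:.0f} ms"


print(classify(Timings(0.05, 0.02, 0.08, 0.35)))  # 500 ms
print(classify(None))                             # Timeout
```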
1.3. Create a new check
We have made a step-by-step guide on how to create a new check.
2. State changes
2.1. When is a check verified as DOWN?
To verify the site is in fact down, we always double-check (perform a verify check) before changing the state from UP to DOWN. The second verification check will always be executed from another geographical location. The check will only change state to DOWN if the second check fails.
2.2. When is a check verified as UP?
To verify that the site is back up again, we always triple-check (perform three verify checks) before changing the state from DOWN to UP.
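The verification rules in 2.1 and 2.2 can be sketched as a small state machine. This is a sketch under assumptions: it counts consecutive contradicting results (2 failures to go DOWN, 3 successes to go UP) and does not model which region each verify check runs from. The `CheckState` class is hypothetical.

```python
from dataclasses import dataclass

DOWN_CONFIRMATIONS = 2  # the failing check plus one verify check
UP_CONFIRMATIONS = 3    # three successful verify checks


@dataclass
class CheckState:
    state: str = "UP"
    streak: int = 0  # consecutive results that contradict the current state

    def record(self, success: bool) -> str:
        contradicts = (self.state == "UP") != success
        if not contradicts:
            self.streak = 0  # a confirming result resets the count
            return self.state
        self.streak += 1
        needed = DOWN_CONFIRMATIONS if self.state == "UP" else UP_CONFIRMATIONS
        if self.streak >= needed:
            self.state = "DOWN" if self.state == "UP" else "UP"
            self.streak = 0
        return self.state


s = CheckState()
print([s.record(ok) for ok in [False, False, True, True, True]])
# ['UP', 'DOWN', 'DOWN', 'DOWN', 'UP']
```

Requiring more confirmations for UP than for DOWN means a site that is only briefly reachable again does not trigger a premature recovery notification.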
2.3. What is FLAPPING?
Alertdesk supports optional detection of hosts and services that are “flapping.” Flapping occurs when a service or host changes state too frequently, resulting in a storm of problem and recovery notifications. Flapping can be indicative of configuration problems (e.g., thresholds set too low) or network problems.
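Alertdesk detects flapping by storing the results of the last 21 checks in an array; older results are overwritten by newer ones. Each state change is weighted by its position n in the array using 0.02 * n + 0.78, so weights range from 0.80 for the oldest position to 1.20 for the newest, and recent changes count more. The example below shows the weight of each state change: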
- Check #1: First calculation. Flap weight is 0.
- Check #3: Check returned FAIL. Flap weight is 0.02 * 3 + 0.78 = 0.84
- Check #4: Check returned OK. Flap weight is 0.02 * 4 + 0.78 = 0.86
- Check #8: Check returned FAIL. Flap weight is 0.02 * 8 + 0.78 = 0.94
- Check #9: Check returned OK. Flap weight is 0.02 * 9 + 0.78 = 0.96
- Check #12: Check returned FAIL. Flap weight is 0.02 * 12 + 0.78 = 1.02
- Check #13: Check returned OK. Flap weight is 0.02 * 13 + 0.78 = 1.04
- Check #18: Check returned FAIL. Flap weight is 0.02 * 18 + 0.78 = 1.14
- Check #19: Check returned OK. Flap weight is 0.02 * 19 + 0.78 = 1.16
- Check #21: Check returned FAIL. Flap weight is 0.02 * 21 + 0.78 = 1.20
The sum of the weights is 9.16.
The score is calculated as (sum of weights / (number of checks – 1)) * 100
In this example, we calculate on the last 21 checks, which contain at most 20 state changes:
(9.16 / (21 – 1)) * 100 = 45.8
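The score calculation can be reproduced in a few lines. The sketch assumes results are stored oldest first, that the weight formula 0.02 * n + 0.78 from the example applies to every position, and that the divisor is the number of stored results minus one; the function names are hypothetical. Run on the example sequence, it gives the same sum of 9.16 and score of 45.8.

```python
WINDOW = 21  # number of most recent results kept per check


def change_weight(position: int) -> float:
    """Weight of a state change at 1-based position n: 0.02 * n + 0.78."""
    return 0.02 * position + 0.78


def flap_score(results: list) -> float:
    """Weighted percent state change over the last WINDOW results.

    results: oldest first, True for OK and False for FAIL.
    """
    window = results[-WINDOW:]          # older results are dropped
    if len(window) < 2:
        return 0.0                       # no state change is possible yet
    total = 0.0
    for n in range(2, len(window) + 1):  # position 1 has nothing to compare with
        if window[n - 1] != window[n - 2]:
            total += change_weight(n)
    return total / (len(window) - 1) * 100


# The example: FAIL at checks 3, 8, 12, 18 and 21, OK everywhere else.
fails = {3, 8, 12, 18, 21}
results = [n not in fails for n in range(1, WINDOW + 1)]
print(round(flap_score(results), 1))  # 45.8
```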
As the score is greater than 25 and lower than 50, the system considers the check to be in the flapping state. The check stays in the flapping state as long as the calculated score is between 25 and 50.
When the calculated score exceeds 50, the check is deemed down, and its state changes from flapping to down. Once a check is down, it cannot re-enter the flapping state, even if the score drops below 50. It can only enter the up state after it has been in the down state.
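The thresholds above can be expressed as a transition function. This is a hypothetical sketch: how scores of exactly 25 or 50 are treated, and what happens when a flapping check's score falls below 25, are not specified above, so the function treats 50 as still flapping and leaves the state unchanged at or below 25.

```python
LOW, HIGH = 25.0, 50.0  # flapping band


def next_state(current: str, score: float) -> str:
    """Apply one flap-score evaluation. current is "UP", "FLAPPING" or "DOWN"."""
    if current == "DOWN":
        # Once DOWN, the score no longer matters; only the UP verification
        # in 2.2 (not modelled here) moves the check back to UP.
        return "DOWN"
    if score > HIGH:
        return "DOWN"
    if score > LOW:
        return "FLAPPING"
    # Assumption: below the band the state is left unchanged.
    return current


print(next_state("UP", 45.8))        # FLAPPING
print(next_state("FLAPPING", 60.0))  # DOWN
print(next_state("DOWN", 30.0))      # DOWN
```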
2.3.1. Why use FLAPPING?
For example, take the illustration below: a classic web setup with multiple web servers running behind a load balancer, which distributes the external traffic to the 3 web servers.
The flapping state is a powerful tool to identify malfunctioning systems in a scaled multiserver environment.
Because client requests are distributed across the 3 web servers, not all clients notice that something is malfunctioning; only the clients forwarded to the malfunctioning server C do.
Over time, Alertdesk's checks will also be routed to the malfunctioning server, so Alertdesk will observe that something is wrong. Based on the flapping algorithm, Alertdesk can then notify subscribers so that action can be taken on the failing server.
2.4. Notification delay
You can control the delay between the moment a check changes state and the moment notifications are sent to the team. When you create or edit the check, this delay can be configured from instant to 60 minutes, for both down and up notifications.
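The delay setting amounts to adding a bounded offset to the time of the state change. A minimal sketch, with a hypothetical `notification_time` helper:

```python
from datetime import datetime, timedelta

MAX_DELAY = timedelta(minutes=60)  # configurable from instant (0) to 60 minutes


def notification_time(state_changed_at: datetime, delay: timedelta) -> datetime:
    """When the down/up notification for a state change should be sent."""
    if not timedelta(0) <= delay <= MAX_DELAY:
        raise ValueError("delay must be between 0 and 60 minutes")
    return state_changed_at + delay


changed = datetime(2024, 1, 1, 12, 0)
print(notification_time(changed, timedelta(minutes=5)))  # 2024-01-01 12:05:00
```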
2.5. False positive
Why does Alertdesk report my website as DOWN, when it’s not?
To avoid false positive notifications, Alertdesk always performs checks from 2 different geographical locations before changing state to DOWN.
Some of the most common reasons for a reported outage are:
- A firewall or server blocks the Alertdesk workers that are executing the checks. In this case, you need to whitelist our workers.
- The website is accessible locally/internally, but not externally. Alertdesk can only access your website externally.
- Access control/verification on your website. You can set up verification headers under Advanced settings for the check.
- A short outage (less than a few minutes), e.g. caused by a server/application restart or a timeout.
3. HTTP methods
You can define what HTTP method your server expects to receive. You need to enable the advanced settings before you can choose the method in the dropdown.
3.1. Description of methods
4. HTTP Request settings
5. Assertions (rules)
5.1. Advanced monitoring using Assertions
Alertdesk is a great tool to monitor the availability of your URL. However, Alertdesk can also perform much more advanced analysis of the response using a dynamic ruleset. These rules are named “assertions” and are configured from the “Advanced Settings” tab when you set up your Uptime Check.
There is always at least one assertion on your check. When the system creates the Uptime Check, a default HTTP 200 assertion rule is created, which validates that the remote host (your website) returns an HTTP 200 status code. This is configured from the assertion ruleset below.
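An assertion can be thought of as three parts: which part of the response to look at, a comparison, and a target value. The sketch below models the default rule that way; the field names and comparison keys are assumptions for illustration, not the names used in the Advanced Settings tab.

```python
import operator
from dataclasses import dataclass
from typing import Any

# Hypothetical comparison names; the dashboard labels may differ.
COMPARISONS = {
    "equals": operator.eq,
    "not_equals": operator.ne,
    "less_than": operator.lt,
    "greater_than": operator.gt,
    "contains": operator.contains,  # contains(a, b) is `b in a`
}


@dataclass
class Assertion:
    source: str      # part of the response to inspect, e.g. "status_code"
    comparison: str  # key into COMPARISONS
    target: Any      # expected value

    def passes(self, response: dict) -> bool:
        return COMPARISONS[self.comparison](response[self.source], self.target)


# The rule every new Uptime Check starts with: expect HTTP 200.
DEFAULT_ASSERTION = Assertion(source="status_code", comparison="equals", target=200)

print(DEFAULT_ASSERTION.passes({"status_code": 200}))  # True
print(DEFAULT_ASSERTION.passes({"status_code": 503}))  # False
```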
5.2. Monitor your response times
The speed at which your website loads can have a significant impact on your customers’ user experience. If your website starts to load slowly, the problem can usually be corrected once you are aware of it.
With Alertdesk, you can easily monitor your website's response times by setting up an “HTTP Timings” ruleset. Below you can see how to configure it if you want a notification when the response time exceeds 10 seconds (10,000 milliseconds).
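The HTTP Timings rule in this example reduces to a threshold comparison on one timing phase. In this sketch the `HttpTimingsRule` class is hypothetical, and the choice of the response-time phase (rather than, say, the total time) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class HttpTimingsRule:
    phase: str = "response"  # assumed: which timing phase to compare
    max_ms: float = 10_000   # notify when it exceeds 10 seconds

    def passes(self, timings_ms: dict) -> bool:
        return timings_ms[self.phase] <= self.max_ms


rule = HttpTimingsRule()
print(rule.passes({"response": 850}))     # True
print(rule.passes({"response": 12_000}))  # False
```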
5.3. Validation on content
If you need to monitor certain words or sentences on your website, such as the phrase “-OK-”, you can do so using the following ruleset.
Content validation is useful when the check targets a server-side script. Depending on its logic, the script returns a specific value, and from that value you control whether your website is validated as UP or DOWN (and whether you get notified).
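The sketch below pairs a hypothetical server-side script with the content rule that checks its output. The script's inputs (`database_reachable`, `pending_jobs`) and the 1000-job threshold are invented for illustration; the point is that the script decides what to print and the assertion only looks for “-OK-”.

```python
def health_endpoint(database_reachable: bool, pending_jobs: int) -> str:
    """Hypothetical server-side script: returns "-OK-" only when all is well."""
    if database_reachable and pending_jobs < 1000:  # illustrative thresholds
        return "-OK-"
    return "-FAIL-"


def content_rule_passes(body: str, phrase: str = "-OK-") -> bool:
    """Content assertion: the check is UP only if the phrase is in the body."""
    return phrase in body


print(content_rule_passes(health_endpoint(True, 10)))   # True  -> UP
print(content_rule_passes(health_endpoint(False, 10)))  # False -> DOWN
```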
5.3.1. WordPress warnings
A classic problem for those using WordPress as a platform for their website is the auto-update function. After plugins are installed automatically, warnings might suddenly appear on the website.
These warnings occur when components are not installed properly. Alertdesk allows you to set up an assertion rule that notifies you immediately if such warnings appear on your website. This is done by setting up this assertion rule:
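The exact rule is configured in the dashboard; as an illustration, a “does not contain” rule could look for the labels PHP prints in front of its messages. This sketch assumes PHP's error display is switched on, in which case a warning appears either as plain text (`Warning: ...`) or HTML-formatted (`<b>Warning</b>: ...`). The pattern and helper name are hypothetical.

```python
import re

# Matches PHP messages in plain ("Warning: ...") and HTML ("<b>Warning</b>: ...") form.
PHP_MESSAGE = re.compile(r"(?:<b>)?(Warning|Notice|Deprecated)(?:</b>)?:")


def page_is_clean(body: str) -> bool:
    """'Does not contain' rule: fail (and notify) if a PHP message is printed."""
    return PHP_MESSAGE.search(body) is None


print(page_is_clean("<html><body>Welcome</body></html>"))               # True
print(page_is_clean("<br />\n<b>Warning</b>:  Undefined variable $x"))  # False
```

A pattern this simple can also match ordinary page text that happens to contain “Warning:”, so a more specific phrase may suit your site better.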
5.4. Number value validation
If your page only responds with a number, you can use the Alertdesk Uptime Check to validate whether this value is above or below a certain threshold. For example, this could be used to monitor your stock and notify you if one or more products are running out of stock. You will need to implement the server-side script that counts the products about to run out of stock. The example below validates that the value is exactly “0”; otherwise, you’ll be notified.
If you want to be notified when there are fewer than 3 products left in stock, the configuration is as below:
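Both number rules can be sketched as one parser plus a comparison. The helper names are hypothetical, and treating a non-numeric response as a failure is an assumption.

```python
from typing import Callable


def number_rule(compare: Callable[[float], bool]) -> Callable[[str], bool]:
    """Build a rule that parses the page body as a number and compares it."""
    def passes(body: str) -> bool:
        try:
            value = float(body.strip())
        except ValueError:
            return False  # assumption: a non-numeric body fails the rule
        return compare(value)
    return passes


# Value must be exactly 0 (no products about to run out of stock).
none_running_out = number_rule(lambda v: v == 0)
# Notify when fewer than 3 products are left, i.e. pass while at least 3 remain.
at_least_three_in_stock = number_rule(lambda v: v >= 3)

print(none_running_out("0\n"))       # True
print(at_least_three_in_stock("2"))  # False -> notification
```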
5.5. Configuration of multiple assertion rules
You can configure as many assertion rules for a single Uptime Check as you like. Note, however, that you are notified if just one of the rules fails validation. The Uptime Check log shows what has failed.
Click the failed check to see which assertion failed.
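Combining rules then means evaluating each one and notifying if any fails, while keeping the names of the failures for the log. A minimal sketch with hypothetical rule names and response fields:

```python
from typing import Callable, Dict, List

Rule = Callable[[dict], bool]


def failed_rules(rules: Dict[str, Rule], response: dict) -> List[str]:
    """Names of the rules that failed; a non-empty list means notify."""
    return [name for name, rule in rules.items() if not rule(response)]


rules = {
    "HTTP 200": lambda r: r["status_code"] == 200,
    "Response time <= 10 s": lambda r: r["response_ms"] <= 10_000,
    "Contains -OK-": lambda r: "-OK-" in r["body"],
}
response = {"status_code": 200, "response_ms": 12_500, "body": "-OK-"}
failed = failed_rules(rules, response)
print(failed)        # ['Response time <= 10 s']
print(bool(failed))  # True: one failure is enough to notify
```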
5.6. Questions and feedback
If you have any questions at all, please reach out to us, and we will assist. We would also love to hear your feedback, or any ideas about how we can extend our assertions. Please reach out to us on our online chat or send us an e-mail at firstname.lastname@example.org.