Practical PowerShell Scripting for DevOps - Part 2

In this part, I will extend the monitoring script we wrote in Part 1 with a polling while loop.

Hi, I'm back with Part 2 of this series. In this part, I will ask you to extend the HTTP ping monitoring script we wrote in Part 1. If you completed the Part 1 challenge, you can extend your own script. In any case, I will provide my version of the Part 1 solution as a starting point. OK, let's jump to the problem statement for this challenge.

Problem

Hey fellow DevOps Engineer! We loved the first script you wrote for us. Now we can test the health of tens of our endpoints with ease, using just a configuration file. We have a new problem though: lots of failures appear to be intermittent, as in, if we simply retry a couple more times we eventually get a successful result. We believe this behaviour is caused either by a server warming up after a new deployment or by some other delay in network connectivity.

To filter out these false positive failures, we would like the new version of this script to have a retry mechanism, so we can truly find out if a service is down. However, the last time another engineer set up something similar, which we were running on a CI/CD pipeline, the pipeline run would never end for an unhealthy endpoint, costing our company a lot of money in CI/CD agent hours.

So you need to make sure your solution meets the below requirements:

1- Retry the test for an endpoint up to N times (configurable) if the first attempt does not return an HTTP 200 response

2- Capture the attempt number of the successful response, if the endpoint ever returns a success code

3- Wait X seconds between retry attempts

4- Move on to the next endpoint once the maximum retry count is reached

5- Print the results to the terminal in a human-readable format

What Success Looks like for this Challenge

The script I wrote in Part 1 printed a result similar to:

Now we want our script to print a result similar to the one below:

On the first run of my script, I configured the max retry parameter's value as 2. With only 2 retries, not all of the flaky sites returned a successful result.

After increasing the max retry count to 10, we can see more endpoints returning successful responses on retry attempts. In this run, for example, the FlakyWebsite-#4 endpoint returned a success code on the 9th attempt.

In both runs FaultyWebsite-#1 does not return a success code (as expected), and our script polls this endpoint the maximum number of times.
Our solution works as expected. Great!

Some Pointers For Writing Your Own Solution

You can use the Part-1 files of my sample solution as your starting point:
Part-1 Files

Some Test Endpoints You Can Use

Since I'm providing the starting point for this part, you will already have the curated test endpoints JSON file readily available. It has multiple endpoints for simulating different scenarios. For more information, please read Part 1.

The Main Learning From the Challenge

Polling is a well-known programming concept. You check the state of a system, evaluate the result and make your next move. We can use loops to implement this retry logic in our script. However, to avoid an infinite loop, we need to make sure we also implement limits and proper break conditions for our loop.
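To make the idea concrete, here is a minimal sketch of a bounded polling loop. It is not the sample solution: Test-FakeEndpoint is a made-up stand-in for a real health check (it simply succeeds on its third call), and the variable names are my own assumptions for illustration.

```powershell
# Hypothetical stand-in for a real health check: succeeds on the 3rd call.
$script:CallCount = 0
function Test-FakeEndpoint {
    $script:CallCount++
    return ($script:CallCount -ge 3)
}

$MaxAttempts = 5       # upper bound that prevents an infinite loop
$Attempt     = 0
$Succeeded   = $false

while ($Attempt -lt $MaxAttempts) {
    $Attempt++
    if (Test-FakeEndpoint) {
        $Succeeded = $true
        break          # desired state reached, stop polling
    }
    # A real script would wait here before the next attempt,
    # for example: Start-Sleep -Seconds 1
}

"Succeeded: $Succeeded after $Attempt of $MaxAttempts attempts"
# Prints: Succeeded: True after 3 of 5 attempts
```

The loop can end in only two ways: the check passes and break fires, or the counter reaches the limit. Neither depends on the endpoint ever becoming healthy, which is what keeps a pipeline run from hanging.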

I recommend reading the PowerShell documentation for the while loop and the break statement.
Links:

about_while

about_break

I use a while loop for this type of scenario, but other loop types will work too. Only your implementation will change; the same result should be achievable.
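As one illustration of a different loop type, the sketch below uses do/until instead of while. The check is faked (the endpoint is assumed to never become healthy), so the loop stops only because of the attempt limit. The variable names are hypothetical.

```powershell
$MaxAttempts = 4
$Attempt     = 0

do {
    $Attempt++
    # Hypothetical check: pretend the endpoint never becomes healthy
    $IsHealthy = $false
} until ($IsHealthy -or $Attempt -ge $MaxAttempts)

"Stopped after $Attempt attempts, healthy: $IsHealthy"
# Prints: Stopped after 4 attempts, healthy: False
```

A do/until loop always runs its body at least once, and here both exit conditions live in a single expression, so no break is needed.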

Tasks You Need To Perform In Your Script

1- Initialize parameters/variables for the inputs you need. You need two new inputs: one for the maximum number of retries and another for the time to wait between retries.

2- Put the testing part of your code in a loop. After your test, evaluate the result with a conditional statement (hint: if/else block), and then either exit the while loop (hint: break) or, if a retry is needed, wait for the configured amount of time before making the next test attempt.

3- Initialize a counter variable and keep track of the number of the attempt you are currently making. If your test attempt is successful, you need to pass this attempt number back to the main script. Basically, we want to know after how many attempts the endpoint returned a successful response. Example: 4/5, with 4 being the attempt number on which we got a successful response and 5 being the maximum number of attempts we would like to make.
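For task 1, one possible way to declare the two inputs is sketched below. Get-RetrySettings is a hypothetical function that exists only to show the param block, and the ValidateRange limits are arbitrary values I picked for illustration. The point is that validating the inputs stops a zero or negative retry count from silently skipping the test.

```powershell
function Get-RetrySettings {
    param (
        # Maximum number of attempts; at least 1 so the test runs once
        [ValidateRange(1, 100)]
        [int]$MaxRetryNo = 5,
        # Seconds to wait between attempts; 0 disables waiting
        [ValidateRange(0, 300)]
        [int]$WaitTimeInSeconds = 1
    )

    # Return the validated values so the caller can inspect them
    [PSCustomObject]@{
        MaxRetryNo        = $MaxRetryNo
        WaitTimeInSeconds = $WaitTimeInSeconds
    }
}

# Accepted: both values are inside their ranges
Get-RetrySettings -MaxRetryNo 10 -WaitTimeInSeconds 2

# Rejected: 0 is outside the 1..100 range, so parameter binding fails
try {
    Get-RetrySettings -MaxRetryNo 0
}
catch {
    "Invalid input: $($_.Exception.Message)"
}
```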

Final Tips

From this point on you will see my sample solution. Depending on your level and learning style, take a look either before or after you give it your shot.

My Sample Solution

Folder Structure

Files for my Sample Solution

Part-1 Files

Tests.json

This is the list of endpoints I would like to test, stored in JSON format, which can be produced and consumed by a variety of modern languages.
What we have here is an array of objects that have name and url keys, which we use to perform and label the tests.

[{
        "name": "FlakyWebsite-#1",
        "url": "https://httpbin.org/status/200,403,404,500"
    },
    {
        "name": "StableWebsite-#1",
        "url": "https://httpbin.org/status/200"
    },
    {
        "name": "FlakyWebsite-#2",
        "url": "https://httpbin.org/status/200,403,404,500"
    },
    {
        "name": "StableWebsite-#2",
        "url": "https://httpbin.org/status/200"
    },
    {
        "name": "StableWebsite-#3",
        "url": "https://httpbin.org/status/200"
    },
    {
        "name": "StableWebsite-#4",
        "url": "https://httpbin.org/status/200"
    },
    {
        "name": "FlakyWebsite-#3",
        "url": "https://httpbin.org/status/200,403,404,500"
    },
    {
        "name": "FlakyWebsite-#4",
        "url": "https://httpbin.org/status/200,403,404,500"
    },
    {
        "name": "FaultyWebsite-#1",
        "url": "https://httpbin.org/status/500"
    }
]
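
To show how this structure looks from the PowerShell side, here is a small sketch that parses two of the entries above from an inline string instead of the file (an assumption made only so the snippet runs on its own) and reads the name and url keys.

```powershell
# Inline copy of two entries from Tests.json, so the sketch needs no file
$Json = @'
[
    { "name": "StableWebsite-#1", "url": "https://httpbin.org/status/200" },
    { "name": "FaultyWebsite-#1", "url": "https://httpbin.org/status/500" }
]
'@

# ConvertFrom-Json turns the JSON array into PowerShell objects
$Tests = $Json | ConvertFrom-Json

foreach ($Test in $Tests) {
    # Each object exposes the JSON keys as properties
    "{0} -> {1}" -f $Test.name, $Test.url
}
```

Each element becomes an object with name and url properties, which is why the helper function can read $TestArgs.name and $TestArgs.url directly.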

Our helper function. Create this file under the path ./lib/New-HttpTestResult.ps1

This script performs an HTTP ping test with the passed parameters and returns a test result object.

Notice the difference from the version in Part 1: this version has 2 new parameters and a while loop, to retry the test if a non-success code is returned by the earlier responses. This is the same code you would write to poll anything; you would only change the code that performs the test. The test can be a kubectl command that gets the number of pods from a deployment's replica set, when you would like to wait until all containers are in the ready state after manually scaling up the number of pods. Or you can perform a SQL connection or TCP port ping to validate the health of another type of non-HTTP service. The possibilities are endless, but the main structure of the code will remain the same.

function New-HttpTestResult {
    param (
        [Parameter(ValueFromPipeline = $true)]
        [PSCustomObject]
        $TestArgs,
        # Maximum Retry Amount
        [Parameter()][int]$MaxRetryNo = 10,
        # Time to wait in between retry attempts
        [Parameter()][int]$WaitTimeInSeconds = 1
    )
    $ProgressPreference = 'SilentlyContinue'

    $Method = 'Get'

    $TestCounter = 0 

    # -lt: Less Than
    while ($TestCounter -lt $MaxRetryNo) {

        #Increment our counter by 1 before we make our first attempt
        $TestCounter++
        # -SkipHttpErrorCheck (PowerShell 7+) returns non-2xx responses instead of throwing
        $duration = Measure-Command {
            $Response = Invoke-WebRequest -Uri $TestArgs.url -Method $Method -SkipHttpErrorCheck
        }

        # If we find the 200 code we stop polling
        if($Response.StatusCode.ToString() -eq '200'){
            break;
        }
        else {
            #Else we need to wait for configured amount of time
            Start-Sleep -Seconds $WaitTimeInSeconds
        }
    }

    $result = [PSCustomObject]@{
        name               = $TestArgs.name
        status_code        = $Response.StatusCode.ToString()
        status_description = $Response.StatusDescription
        attempt_no         = "$($TestCounter)/$($MaxRetryNo)"
        # TotalMilliseconds covers the whole duration; Milliseconds is only the 0-999 component
        responsetime_ms    = [int]$duration.TotalMilliseconds
        timestamp          = (get-date).ToString('O')
    }



    return $result
}
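
The description above says the same structure can poll anything if you swap out the code that performs the test. One way to sketch that idea is a generic helper that takes the test as a script block. Invoke-Polling and its parameter names are hypothetical and not part of my sample files, and the check used here is simulated (it becomes "ready" on its third call) so the snippet runs without any network access.

```powershell
function Invoke-Polling {
    param (
        # Script block that returns $true once the desired state is reached
        [Parameter(Mandatory = $true)][scriptblock]$Test,
        [int]$MaxRetryNo = 10,
        [int]$WaitTimeInSeconds = 1
    )

    $Attempt = 0
    $Success = $false

    while ($Attempt -lt $MaxRetryNo) {
        $Attempt++
        if (& $Test) {
            $Success = $true
            break
        }
        # Skip the wait after the final attempt
        if ($Attempt -lt $MaxRetryNo) {
            Start-Sleep -Seconds $WaitTimeInSeconds
        }
    }

    [PSCustomObject]@{
        success    = $Success
        attempt_no = "$Attempt/$MaxRetryNo"
    }
}

# Simulated check: becomes "ready" on the third call
$script:Calls = 0
$Result = Invoke-Polling -MaxRetryNo 5 -WaitTimeInSeconds 0 -Test {
    $script:Calls++
    $script:Calls -ge 3
}

$Result.attempt_no   # 3/5
```

In a real script the script block would wrap whatever check you need, such as a kubectl call or a TCP port test, while the loop, the limit and the wait stay exactly the same.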

Test-HttpEndpoints.ps1

Our main script. We invoke this script to run the tests and produce a test results array.

[CmdletBinding()]
param (
    [Parameter(ValueFromPipeline = $true)]
    [string]
    $TestsFilePath =  '.\Tests.json'
)

# Convert the JSON config file's string content to a PowerShell object
$TestsObj = Get-Content -Path $TestsFilePath | ConvertFrom-Json

# Import the Tester Function
. ./lib/New-HttpTestResult.ps1

# Loop through Test Objects and get the results as a collection
$TestResults = foreach ($Test in $TestsObj) { 
    New-HttpTestResult -TestArgs $Test 
}

$TestResults | Format-Table -AutoSize

Sample Runs

Conclusion

In automation scenarios, you will want to write defensive scripts that are not a one-shot series of instructions but have some more bells and whistles to make them more resilient to environmental changes in your infrastructure. Having a retry mechanism that waits for a system to reach a desired state is a very common need in DevOps automation tasks.

I hope I was able to help you learn something new in this part. If you are attempting the challenges, good on you. If you are just reading my solutions, do not feel guilty. I have read, and still read, a lot of other engineers' code on GitHub and their blogs. If you are not currently at a level to try the challenge yourself, download my sample, run it and try to understand how it works. Then, if you can, try to change something in it and see whether you can change the behaviour of the script.

I don't know yet what the next part will be about. I'm thinking about running our tests in a CI/CD pipeline, or writing another script that creates the JSON file listing our endpoints automatically by scanning an Azure subscription. They can come in any order as they do not depend on each other. I'll see you in the next one.
