Where to start your automation efforts ? An analogy for IT infrastructure folks

Nowadays, many IT shops look a lot like the first few pages of The Phoenix Project : lack of automation and communication which leads to manual and unpredictable deployments, which in turn, leads to lots of firefighting. Fortunately, many of these shops are starting to understand the value of implementing some of the DevOps principles.

Of all the DevOps principles, automation is probably the easiest to grasp so that’s where most IT organisations start. I’m not saying this is the best starting point, but this is the most typical starting point of a DevOps journey. At this point, IT folks who are traditionally infrastructure-focused are asked to move their focus “up-the-stack” and towards automation. This is a difficult shift to make, especially for those who are not willing to adapt and learn new skills. But even for those who are willing to make that shift, it can be confusing.

OK, I’m going to automate stuff, but where do I start ? What should I automate first ?

I’ll attempt to answer that question drawing on my 8 years of hard-learned lessons in technical support and using a metaphor that most infrastructure-focused IT folks can relate to.

Storage performance troubleshooting :

Users are complaining that a specific application is “very slow”, so you go and check the resources usage for the virtual machine running that particular application. You notice frequent storage latency spikes of up to 3 seconds ! Now, you need to identify where this latency is coming from. It is easier said than done, storage infrastructure is complex, especially SAN-based storage. The latency can come from many different areas : from within the virtual machine (or the application itself), from the hypervisor, from the SAN fabric (cables and switches) or from the storage array.

But ultimately, whatever the transport protocol, it boils down to encapsulated SCSI commands. So you need to consider the storage system as a whole and try to picture it as a pipeline transporting SCSI commands from a VM to a disk. Any component which is struggling to keep up will eventually slow down the entire pipeline. Why ?

Because the SCSI commands, or frames, or packets, or whatever the unit of work is at this stage, are going to be stored in the queue (or buffer) of the struggling component. And what happens when the struggling component’s queue is full ? The queue of another component, located upstream in the pipeline, starts to fill up. In the meantime, any component located downstream in the pipeline is left with no work to do, waiting, waiting, waiting…

So you need to check the latency metrics at each components in the pipeline and single out which one introduces the most latency : the big bad bottleneck. It’s not always obvious and sometimes the victim can be confused with the culprit, but that’s where your expertise comes in. That’s why you are paid the big bucks, right ?

Let’s say you have identified the latency introduced in each areas, like so :

  • Virtual machine : 10 milliseconds
  • Hypervisor : 25 milliseconds
  • SAN fabric : 50 milliseconds
  • Storage array : variable, up to almost 3 seconds

Now, if you reduce the latency at the storage array level by 2 seconds, how much latency improvement are you going to see on the whole storage system end-to-end ?
2 seconds.
If you reduce the latency the SAN fabric level by 30 milliseconds, how much latency improvement are you going to see on the whole storage system end-to-end ?
This seems quite obvious but this is to emphasize a very important point : the energy, time and resources spent on anything else than the bottleneck are 100% waste. So where do you start ? You identify the bottleneck and focus all your efforts on improving the performance of that component.

After further investigation, it turns out there was 2 dead disks in the RAID set where the virtual machine was stored. You replace the 2 disks and now the latency at the storage array level is only 30 milliseconds. That’s great, you have eliminated the first bottleneck, but now you have new one : the SAN fabric. Yep, in any pipeline, at any given time, there is one (and only one) bottleneck. So now, you can focus all your efforts on the fabric, unless the current performance is acceptable for your application(s).

This is in a nutshell how to prioritize performance improvements in a storage system, but what the hell does it have to do with automation ?

IT service delivery as a pipeline :

The “Lean” thinking (and all the stuff that came from it, like Agile) originated from applying the principles of Lean manufacturing (pioneered by Toyota) to the software industry. It compares a software delivery pipeline with a manufacturing assembly line. This is a very powerful analogy and many fascinating learnings have been drawn from it. If you are not convinced, I suggest you read this : Lean Software Development: An Agile Toolkit.

With the emergence of DevOps and “Lean IT“, some people started to realize that it doesn’t necessarily have to be software, these principles can be applied to IT operations as well. IT service delivery can be considered as a pipeline, pretty much like a storage system. It transports a continuous flow of tickets, feature requests, change requests, or work items instead of SCSI commands, frames or packets. The destination of the services it delivers is one or more end-users, instead of a disk.

The components involved in the different stages along the pipeline vary a lot depending on the type and scope of the request : application team, systems team, maybe network and storage team, security team, possibly some kind of QA, a manager and/or procurement approval depending on the cost, etc… It can be potentially be very complex, but you need to gather an exhaustive list of all the parts involved in the pipeline, even outside of IT. This requires to take a step back from your IT bubble and look at the bigger picture, which is not easy, but it is important to have a holistic view of “the system”.

This mental model of a pipeline allows to “troubleshoot” the performance of your IT service delivery. Just like any pipeline, the 2 main ways to measure performance are :

  • Throughput (the amount of work that can go through the pipeline in a given amount of time)
  • Latency (the time it takes for a single unit of work to go from the beginning of the pipeline to the end)

Similarly to most storage admins, Agile and DevOps tend to focus more on latency as a performance indicator. Agile talks a lot about “velocity” and DevOps uses the term “lead times“. So again, you need to determine the latency at each stage of the IT service delivery pipeline and identify the component which adds the most latency : the big bad bottleneck.

Like in a storage infrastructure, bottlenecks tend to manifest themselves as a queue filling up with work in progress. But this work waiting to be processed is not often obvious in the context of IT services, because you cannot touch it (except for hardware). You can easily see car parts piling up in a warehouse, but can you see the 1s and 0s piling up ? A nice way to make work more visible is to use tools like Kanban boards.

When the bottleneck is identified, you know where to focus your efforts. Not just your automation efforts, ALL efforts should be focused on addressing the bottleneck. Yes, automation may very well be only a part of the solution and in some cases, it might even be a very small portion of the solution. This is because technology may only be a small part of the problem. The bigger problem may very well be lack of communication, or lack of skills, or information retention, or overly complex processes, etc… This means your automation efforts need to be integrated into a broader effort to eliminate the bottleneck. It is about IT and “the business” working together towards a common goal.

When the latency of the struggling component in the IT service delivery pipeline has been reduced enough to not be the bottleneck anymore (using automation and any other means), you know that you have improved the performance of your entire IT organization, and by how much. By the way, this information can come in handy for your next pay review. After that, you can move on and focus all your energy (and newly acquired automation skills) on the next bottleneck.

So there is no ready-made recipe to tell you where to start your automation efforts. It is not easy because every business is different and because it requires a shift in mindset, thinking about the bigger picture. But hopefully, this article provides some guidance on how to prioritize your efforts. It is very much about optimizing the impact of your work by focusing on relieving the most crippling pain your IT organization has, and by doing so, making yourself more valuable.

If you want more on these topics, here is a pretty deep perspective on automation, infrastructure (networking in this case) and bottlenecks.

Searching code, repos, issues, pull requests and users on GitHub using PowerShell

Coding, especially in PowerShell, is not about remembering the exact syntax of how to do something, it is more about knowing how to try things out and how to find the information you need to do whatever you need to accomplish.

Quite often, even though we don’t remember the exact syntax, we know the thing we want to do is something we’ve already done before. So there is a good chance it’s already in the code base we have published on GitHub over the years.
If only we could use GitHub as an extension of our brain and, even better, directly from our PowerShell console …

This is why I wrote the PowerShell module PSGithubSearch, available on the PowerShell Gallery and (off course) on GitHub. It uses the extensive GitHub Search API to search the following items on GitHub.com :

  • Code
  • Repositories
  • Issues
  • Pull requests
  • Users
  • Organisations

Let’s have a look at a few examples to illustrate how we can use its 4 cmdlets.

Find-GitHubCode :

Maybe you are working on a new PowerShell function which could really use a -Confirm parameter, let’s say Invoke-NuclearRedButton.
You know that this requires the SupportsShouldProcess cmdletBinding attribute but you never ever remember exactly how to use it inside the function. Then, you could do that :

C:\> Find-GitHubCode -Keywords 'SupportsShouldProcess' -User $Me -Language 'PowerShell' | Select-Object -First 1

FileName               : Update-ChocolateyPackage.psm1
FilePath               : Update-ChocolateyPackage/Update-ChocolateyPackage.psm1
File Url               : https://github.com/MathieuBuisson/Powershell-Utility/blob/5dde8e9bb4fa6244953fee4042e2100acfc6
File Details Url       : https://api.github.com/repositories/28262426/contents/Update-ChocolateyPackage/Update-Chocolat
Repository Name        : MathieuBuisson/Powershell-Utility
Repository Description : Various generic tools (scripts or modules) which can be reused from other scripts or modules
Repository Url         : https://github.com/MathieuBuisson/Powershell-Utility


OK, but you might want to see the actual file content with the code. This is not built into the cmdlet because it requires an additional API call and the GitHub search API limits the number of unauthenticated requests to 10 per minute, so I tried to limit as much as possible the number of API requests. Anyway, here is how to get the actual code :

C:\> $FileUrl = (Find-GitHubCode -Keywords 'SupportsShouldProcess' -User $Me -Language 'PowerShell' | Select-Object -First 1).url
C:\> $Base64FileContent = (Invoke-RestMethod -Uri $FileUrl).Content
C:\> [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($Base64FileContent)) -split '\n' |
 Where-Object { $_ -match 'ShouldProcess' }

                If ($PSCmdlet.ShouldProcess($($CurrentPackage.Name), "Install-Package")) {

Yep, not that easy.
The original result provides us with a URL that we can query to get more information on that file.
This additional query gives us the content of the file but as a Base64 encoded string, that’s why we need to decode it as UTF8.
But then, it is still a single (large) string, that’s why we needed to split it on each new line character (\n).

This is not very practical for large files, so you can restrict the search to files of a certain size (in bytes), like so :

C:\> Find-GitHubCode -Repo "$Me/Powershell-Utility" -Keywords 'SupportsShouldProcess' -SizeBytes '<4000'

FileName               : Update-ChocolateyPackage.psm1
FilePath               : Update-ChocolateyPackage/Update-ChocolateyPackage.psm1
File Url               : https://github.com/MathieuBuisson/Powershell-Utility/blob/5dde8e9bb4fa6244953fee4042e2100acfc6
File Details Url       : https://api.github.com/repositories/28262426/contents/Update-ChocolateyPackage/Update-Chocolat
Repository Name        : MathieuBuisson/Powershell-Utility
Repository Description : Various generic tools (scripts or modules) which can be reused from other scripts or modules
Repository Url         : https://github.com/MathieuBuisson/Powershell-Utility


There are other parameters which can help narrow down the search to get exactly what we need. We’ll leave that as homework, because we have 3 other cmdlets to cover.

Find-GitHubRepository :

As an example, we might want to know what interesting project one of our favorite automation hero is currently working on :

C:\> $WarrenRepos = Find-GitHubRepository -Keywords 'PowerShell' -User 'RamblingCookieMonster'
C:\> $WarrenRepos | Sort-Object -Property pushed_at -Descending | Select-Object -First 1

Name         : PSDepend
Full Name    : RamblingCookieMonster/PSDepend
Owner        : RamblingCookieMonster
Url          : https://github.com/RamblingCookieMonster/PSDepend
Description  : PowerShell Dependency Handler
Fork         : False
Last Updated : 2016-08-28T15:48:52Z
Last Pushed  : 2016-09-29T16:06:04Z
Clone Url    : https://github.com/RamblingCookieMonster/PSDepend.git
Size (KB)    : 202
Stars        : 10
Language     : PowerShell
Forks        : 2

Warren has been working on this new project for a good month, and this project already has 10 stars and 2 forks. It definitely looks like an interesting project.

We can also sort the results by popularity (number of stars) or by activity (number of forks). For example, we could get the 5 most popular PowerShell projects related to deployment, like so :

C:\> $DeploymentProjects = Find-GitHubRepository -Keywords 'Deployment' -In description -Language 'PowerShell' -SortBy stars
C:\> $DeploymentProjects | Select-Object -First 5

Name         : AzureStack-QuickStart-Templates
Full Name    : Azure/AzureStack-QuickStart-Templates
Owner        : Azure
Url          : https://github.com/Azure/AzureStack-QuickStart-Templates
Description  : Quick start ARM templates that deploy on Microsoft Azure Stack
Fork         : False
Last Updated : 2016-10-01T19:16:20Z
Last Pushed  : 2016-09-28T22:03:44Z
Clone Url    : https://github.com/Azure/AzureStack-QuickStart-Templates.git
Size (KB)    : 11617
Stars        : 118
Language     : PowerShell
Forks        : 74

Name         : unfold
Full Name    : thomasvm/unfold
Owner        : thomasvm
Url          : https://github.com/thomasvm/unfold
Description  : Powershell-based deployment solution for .net web applications
Fork         : False
Last Updated : 2016-09-25T03:55:03Z
Last Pushed  : 2014-10-10T07:28:22Z
Clone Url    : https://github.com/thomasvm/unfold.git
Size (KB)    : 1023
Stars        : 107
Language     : PowerShell
Forks        : 13

Name         : AppRolla
Full Name    : appveyor/AppRolla
Owner        : appveyor
Url          : https://github.com/appveyor/AppRolla
Description  : PowerShell framework for deploying distributed .NET applications to multi-server environments inspired
               by Capistrano
Fork         : False
Last Updated : 2016-09-20T10:29:29Z
Last Pushed  : 2013-08-25T18:04:13Z
Clone Url    : https://github.com/appveyor/AppRolla.git
Size (KB)    : 546
Stars        : 91
Language     : PowerShell
Forks        : 10

Name         : SharePointDsc
Full Name    : PowerShell/SharePointDsc
Owner        : PowerShell
Url          : https://github.com/PowerShell/SharePointDsc
Description  : The SharePointDsc PowerShell module provides DSC resources that can be used to deploy and manage a
               SharePoint farm
Fork         : False
Last Updated : 2016-09-05T12:05:18Z
Last Pushed  : 2016-09-27T17:07:10Z
Clone Url    : https://github.com/PowerShell/SharePointDsc.git
Size (KB)    : 3576
Stars        : 80
Language     : PowerShell
Forks        : 48

Name         : PSDeploy
Full Name    : RamblingCookieMonster/PSDeploy
Owner        : RamblingCookieMonster
Url          : https://github.com/RamblingCookieMonster/PSDeploy
Description  : Simple PowerShell based deployments
Fork         : False
Last Updated : 2016-09-27T17:33:51Z
Last Pushed  : 2016-09-29T09:42:46Z
Clone Url    : https://github.com/RamblingCookieMonster/PSDeploy.git
Size (KB)    : 665
Stars        : 76
Language     : PowerShell
Forks        : 18

We used the In parameter to look for the “Deployment” keyword in the description field of repositories, we could look only in the repositories name or ReadMe, if we wanted to.

Now let’s dive into the heart of GitHub collaboration with issues and pull requests.

Find-GitHubIssue :

Maybe, we want to know the most commented open issue on the PowerShell project which hasn’t been assigned to anyone yet. This is pretty easy to do :

C:\> Find-GitHubIssue -Repo 'PowerShell/PowerShell' -Type issue -State open -No assignee -SortBy comments |
 Select-Object -First 1

Title        : Parameter binding problem with ValueFromRemainingArguments in PS functions
Number       : 2035
Id           : 172737066
Url          : https://github.com/PowerShell/PowerShell/issues/2035
Opened by    : dlwyatt
Labels       : {Area-Language, Issue-Bug, Issue-Discussion}
State        : open
Assignees    :
Comments     :
Created      : 2016-08-23T15:51:55Z
Last Updated : 2016-09-29T14:45:43Z
Closed       :
Body         : Steps to reproduce

               Define a PowerShell function with an array parameter using the ValueFromRemainingArguments property of
               the Parameter attribute.  Instead of sending multiple arguments, send that parameter a single array

                & {

                        [Parameter(Position=1, ValueFromRemainingArguments)]
                    for ($i = 0; $i -lt $Extra.Count; $i++)
                       "${i}: $($Extra[$i])"
                } root aa,bb

               Expected behavior
               The array should be bound to the parameter just as you sent it, the same way it works for cmdlets.
               (The "ValueFromRemainingArguments" behavior isn't used, in this case, it should just bind like any
               other array parameter type.)  The output of the above script block should be:

               0: aa
               1: bb

               Actual behavior
               PowerShell appears to be performing type conversion on the argument to treat the array as a single
               element of the parameter's array, instead of checking first to see if more arguments will be bound as
               "remaining arguments" first.  The output of the above script block is currently:

               0: aa bb

               Additional information

               To demonstrate that the behavior of cmdlets is different, you can use this code:

               Add-Type -OutputAssembly $env:temp\testBinding.dll -TypeDefinition @'
                   using System;
                   using System.Management.Automation;

                   [Cmdlet("Test", "Binding")]
                   public class TestBindingCommand : PSCmdlet
                       [Parameter(Position = 0)]
                       public string Root { get; set; }

                       [Parameter(Position = 1, ValueFromRemainingArguments = true)]
                       public string[] Extra { get; set; }

                       protected override void ProcessRecord()
                           for (int i = 0; i < Extra.Length; i++)
                               WriteObject(String.Format("{0}: {1}", i, Extra[i]));

               Import-Module $env:temp\testBinding.dll

               Test-Binding root aa,bb

               Environment data

               <!-- provide the output of $PSVersionTable -->

               > $PSVersionTable
               Name                           Value
               ----                           -----
               PSEdition                      Core
               PSVersion                      6.0.0-alpha
               PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
               WSManStackVersion              3.0
               GitCommitId                    v6.0.0-alpha.9-107-g203ace04c09dbbc1ac00d6b497849cb69cc919fb-dirty
               PSRemotingProtocolVersion      2.3

That is a pretty detailed and complex issue.

Now, we might wonder who is the top contributor (in terms of Pull requests) to the PowerShell documentation project :

C:\> $DocsPR = Find-GitHubIssue -Type pr -Repo 'PowerShell/PowerShell-Docs'
C:\> $DocsPR | Group-Object { $_.user.login } |
 Sort-Object -Property Count -Descending | Select-Object -First 10

Count Name                      Group
----- ----                      -----
  141 eslesar                   {@{url=https://api.github.com/repos/PowerShell/PowerShell-Docs/issues/650; repositor...
   81 HemantMahawar             {@{url=https://api.github.com/repos/PowerShell/PowerShell-Docs/issues/656; repositor...
   59 alexandair                {@{url=https://api.github.com/repos/PowerShell/PowerShell-Docs/issues/627; repositor...
   32 juanpablojofre            {@{url=https://api.github.com/repos/PowerShell/PowerShell-Docs/issues/657; repositor...
   28 neemas                    {@{url=https://api.github.com/repos/PowerShell/PowerShell-Docs/issues/107; repositor...
   26 joeyaiello                {@{url=https://api.github.com/repos/PowerShell/PowerShell-Docs/issues/554; repositor...
   20 Dan1el42                  {@{url=https://api.github.com/repos/PowerShell/PowerShell-Docs/issues/645; repositor...
   10 PlagueHO                  {@{url=https://api.github.com/repos/PowerShell/PowerShell-Docs/issues/330; repositor...
    9 JKeithB                   {@{url=https://api.github.com/repos/PowerShell/PowerShell-Docs/issues/605; repositor...
    8 saldana                   {@{url=https://api.github.com/repos/PowerShell/PowerShell-Docs/issues/455; repositor...

Wow, some people are really into documentation.

Find-GithubUser :

GitHub.com is the largest open-source software community, so it is a great place to find passionate and talented coders who are willing to share their craft with the community.

Let’s say you are a recruiter or a hiring manager and you are looking for a PowerShell talent in Ireland. The cmdlet Find-GithubUser can facilitate that search :

C:\> Find-GithubUser -Type user -Language 'PowerShell' -Location 'Ireland' | Where-Object { $_.Hireable }

UserName      : JunSmith
Full Name     : Jun Smith
Type          : User
Url           : https://github.com/JunSmith
Company       :
Blog          :
Location      : Ireland
Email Address :
Hireable      : True
Bio           :
Repos         : 12
Gists         : 0
Followers     : 7
Following     : 4
Joined        : 2015-01-17T13:27:24Z

UserName      : TheMasterPrawn
Full Name     : Rob Allen
Type          : User
Url           : https://github.com/TheMasterPrawn
Company       : Unity Technology Solutions IRL
Blog          :
Location      : Ireland
Email Address :
Hireable      : True
Bio           : I.T/Dev guy, gamer, geek living and working in Ireland.
Repos         : 3
Gists         : 0
Followers     : 0
Following     : 0
Joined        : 2014-06-10T11:09:20Z

2 users only. That’s helpful … It highlights the huge PowerShell skill gap we have in Ireland.

Let’s focus on UK and organizations, then. Organizations don’t have followers (or following) so we cannot filter them on the number of followers they have, but we can narrow down the search to those which have 5 or more repos :

C:\> Find-GithubUser -Type org -Language 'PowerShell' -Location 'UK' -Repos '>=5'

UserName      : SpottedHyenaUK
Full Name     : Spotted Hyena UK
Type          : Organization
Url           : https://github.com/SpottedHyenaUK
Company       :
Blog          : http://www.spottedhyena.co.uk
Location      : Leeds, UK
Email Address :
Hireable      :
Bio           :
Repos         : 5
Gists         : 0
Followers     : 0
Following     : 0
Joined        : 2015-02-03T14:38:16Z

UserName      : VirtualEngine
Full Name     : Virtual Engine
Type          : Organization
Url           : https://github.com/VirtualEngine
Company       :
Blog          : http://virtualengine.co.uk
Location      : UK
Email Address : info@virtualengine.co.uk
Hireable      :
Bio           :
Repos         : 17
Gists         : 0
Followers     : 0
Following     : 0
Joined        : 2014-03-21T14:51:14Z

That was just a few examples, but since each of these cmdlets have many parameters, this was just scratching the surface of what can be done with them.
For the full list of parameters (and how to use them), please refer to the README file on GitHub or use Get-Help.

Also, I’m aware it’s not perfect (yet) so issues and pull requests are very much welcome.