How to create a custom rule for PSScriptAnalyzer

As you probably already know, PSScriptAnalyzer is a static code analysis tool which checks PowerShell code against rules representing best practices and style guidelines. It is a fantastic tool for setting coding style, consistency and quality standards, and if we want to, we can easily enforce these standards within a build pipeline.

The PowerShell community was very much involved in the definition of PSScriptAnalyzer rules, so these rules make a lot of sense as general guidelines and they are widely accepted. However, a given company or project might have specific coding standards which contain different or more specific rules. Or maybe, you feel like Silicon Valley’s Richard regarding Tabs vs Spaces.

Fortunately, PSScriptAnalyzer allows us to create and use custom rules. In this article, we are going to learn how to do that with a simple example. Let’s say we have coding standards which specify that all variable names should follow a consistent capitalization style, in particular : PascalCasing. So we are going to write a PSScriptAnalyzer rule, in the form of a function, to check our code against that convention.

To write this function, our starting point should be this documentation page.
First, how are we going to name our function ? If we look at the CommunityAnalyzerRules module, we see that all the function names use the verb “Measure”. Why ? I don’t know, but it seems like a sensible convention to follow. That way, if we have multiple rules stored in a single module, we can export all of them by simply adding the following in the module :

Export-ModuleMember -Function Measure-*

 
So, given our rule is about PascalCasing, the function name “Measure-PascalCase” makes sense.

Next, we need proper comment-based help for our function. It looks like this :

Function Measure-PascalCase {
<#
.SYNOPSIS
    Variable names should be in PascalCase.

.DESCRIPTION
    Variable names should use a consistent capitalization style, i.e. : PascalCase.
    In PascalCase, only the first letter is capitalized. Or, if the variable name is made of multiple concatenated words, only the first letter of each concatenated word is capitalized.
    To fix a violation of this rule, please consider using PascalCase for variable names.

.EXAMPLE
    Measure-PascalCase -ScriptBlockAst $ScriptBlockAst

.INPUTS
    [System.Management.Automation.Language.ScriptBlockAst]

.OUTPUTS
    [Microsoft.Windows.PowerShell.ScriptAnalyzer.Generic.DiagnosticRecord[]]

.NOTES
    https://msdn.microsoft.com/en-us/library/dd878270(v=vs.85).aspx
    https://msdn.microsoft.com/en-us/library/ms229043(v=vs.110).aspx
#>

 
The DESCRIPTION part of the help is actually used by PSScriptAnalyzer so it is important. It should contain an explanation of the rule, as well as a brief explanation of how to remediate any violation of the rule. Here, we don’t want to assume that all users know what PascalCase means, so we give a succinct but (hopefully) clear definition of PascalCase.

In the INPUTS field, we tell the user that the only parameter of our function takes an object of the type [System.Management.Automation.Language.ScriptBlockAst], but it could also take other types of AST objects. But wait, what is an AST ?

The short(ish) version is that PowerShell 3.0 introduced a new parser, and that parser relies on the AST (Abstract Syntax Tree) to expose the various elements of the PowerShell language as objects. This makes it much easier to parse PowerShell code and extract objects corresponding to language elements like : variables, function definitions, parameter blocks, parameters, arrays, hashtables, Foreach statements, If statements, the list goes on and on… And PSScriptAnalyzer relies heavily on this AST-based parser.
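To get a feel for this, here is a quick sketch (separate from our rule) showing how to parse a string of PowerShell code into an AST and query it for specific elements :

# Parse a small snippet of PowerShell code into an AST
$Code = '$FirstVariable = 1 ; $second_variable = 2'
$Tokens = $Null
$ParseErrors = $Null
$Ast = [System.Management.Automation.Language.Parser]::ParseInput($Code, [ref]$Tokens, [ref]$ParseErrors)

# FindAll returns every nested element matching a predicate, here : assignment statements
$Ast.FindAll({ $args[0] -is [System.Management.Automation.Language.AssignmentStatementAst] }, $True)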

In the OUTPUTS field, we explicitly tell the user that the function will return one or more objects of the type [Microsoft.Windows.PowerShell.ScriptAnalyzer.Generic.DiagnosticRecord[]]. But the actual user will be PSScriptAnalyzer, so this is really a contract between our function and PSScriptAnalyzer. This is more formally declared with the following function attribute :

[OutputType([Microsoft.Windows.PowerShell.ScriptAnalyzer.Generic.DiagnosticRecord[]])]

 
But even with this declaration, PowerShell doesn’t enforce that. So it’s our responsibility to ensure our code doesn’t return anything else, otherwise, PSScriptAnalyzer will not be happy.
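For context, the beginning of the function body (right after the comment-based help) might look like this. This is only a sketch following the conventions used in the CommunityAnalyzerRules module, so the exact attributes may differ slightly from the full function on GitHub :

[CmdletBinding()]
[OutputType([Microsoft.Windows.PowerShell.ScriptAnalyzer.Generic.DiagnosticRecord[]])]
Param (
    [Parameter(Mandatory = $True)]
    [ValidateNotNullOrEmpty()]
    [System.Management.Automation.Language.ScriptBlockAst]
    $ScriptBlockAst
)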

Now it is time to tackle the code inside our function. Looking at the CommunityAnalyzerRules module, most functions have the same basic structure :

#region Define predicates to find ASTs.

[ScriptBlock]$Predicate = {
    Param ([System.Management.Automation.Language.Ast]$Ast)

    [bool]$ReturnValue = $False
    If ( ... ) {

        ...

    }
    return $ReturnValue
}
#endregion

#region Find ASTs that match the predicates.
[System.Management.Automation.Language.Ast[]]$Violations = $ScriptBlockAst.FindAll($Predicate, $True)

# The results array needs to exist before we can add DiagnosticRecord objects to it
$Results = @()

If ($Violations.Count -ne 0) {

    Foreach ($Violation in $Violations) {

        $Result = New-Object `
                -Typename "Microsoft.Windows.PowerShell.ScriptAnalyzer.Generic.DiagnosticRecord" `
                -ArgumentList  ...
          
        $Results += $Result
    }
}
return $Results
#endregion

 
We don’t have to follow that structure, but it is a very helpful scaffolding. As we can see above, the function is divided into two logical parts : the first one is where we define one or more predicates corresponding to our rule, and the second one is where we actually use the predicate(s) against input (PowerShell code) to identify any violations of our rule.

Defining predicates

First, what is a predicate ?
It is a scriptblock which returns $True or $False, and it is used to filter objects. We feed a bunch of objects to our predicate : we keep the objects for which the predicate returns $True and we filter out the objects for which it returns $False. Sounds complicated ? It’s not, and you are using predicates. All. The. Time :

C:\> $ThisIsAPredicate = { $_.Name -like "*.ps*1" }
C:\> Get-ChildItem -Recurse | Where-Object $ThisIsAPredicate

 
In the context of our PSScriptAnalyzer rule function, the predicate is used to identify violations of our rule. Any piece of PowerShell code for which our predicate returns $True contains a violation of our rule. We can use multiple methods to detect violations, so we can define multiple predicates if we need (or want) to. But this is a simple example, so we are going to define a single predicate.

Our predicate should take input (pieces of PowerShell code) via a parameter. Here, the parameter is named Ast and it takes objects of the type [System.Management.Automation.Language.Ast]. This is the generic (base) class for ASTs, which allows the predicate’s parameter to accept objects of child classes like [System.Management.Automation.Language.ScriptBlockAst], [System.Management.Automation.Language.StatementAst], etc…
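A quick console check illustrates this inheritance :

C:\> { $Foo = 1 }.Ast.GetType().Name
ScriptBlockAst

C:\> { $Foo = 1 }.Ast -is [System.Management.Automation.Language.Ast]
True

With that in mind, here is the parameter declaration of our predicate :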

            [ScriptBlock]$Predicate = {
                Param ([System.Management.Automation.Language.Ast]$Ast)

                ...

 
Our rule for PascalCasing relates only to variable names, so we first need to identify variables. What is most relevant for naming is when variables are defined or assigned a value, not really when they are referenced. So, arguably, the best way to identify variables for our particular purpose is to identify variable assignments, like so :

If ($Ast -is [System.Management.Automation.Language.AssignmentStatementAst]) {

    ...

 
Next, we need to identify any variable names which don’t follow PascalCasing. For that, we’ll use the comparison operator -cnotmatch and a regex. As you probably know, PowerShell is not case sensitive. But our rule is all about casing, so it is case hypersensitive. This makes the “c” in -cnotmatch crucial for our predicate to work :

[System.Management.Automation.Language.AssignmentStatementAst]$VariableAst = $Ast
    If ($VariableAst.Left.VariablePath.UserPath -cnotmatch '^([A-Z][a-z]+)+$') {
        $ReturnValue = $True
    }

 
To extract only the variable names from our variable assignment objects, we take their “Left” property (what’s on the left side of the assignment operator), then the “VariablePath” property and then the “UserPath” nested property. This gives us only the variable name as a [string]. If that string doesn’t match our regular expression, the predicate returns $True, which means there is a violation.
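Here is a quick sketch (run from the console, outside of the rule function) of what this looks like for a single assignment statement :

# Grab the AssignmentStatementAst from a throwaway scriptblock
$ScriptBlock = { $my_variable = 42 }
$Assignment = $ScriptBlock.Ast.FindAll({ $args[0] -is [System.Management.Automation.Language.AssignmentStatementAst] }, $True)[0]

# The variable name, without the "$" sign
$Assignment.Left.VariablePath.UserPath                                 # my_variable

# The check used in our predicate
$Assignment.Left.VariablePath.UserPath -cnotmatch '^([A-Z][a-z]+)+$'   # True, so this is a violation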

A brief explanation of the regex used above, ([A-Z][a-z]+) :
it means one upper case letter followed by one or more lower case letters. This particular pattern can be repeated, so we put it between parentheses and append a “+”. And all of this has to sit strictly between the beginning of the string “^” and the end of the string “$”.

Of course, this detection method is limited because there is no intelligence to detect words of the English language (or any other language) which may be concatenated to form a variable name :

PS C:\> "FirstwordSecondword" -cmatch '^([A-Z][a-z]+)+$'
True

PS C:\> "FirstwoRdsecoNdword" -cmatch '^([A-Z][a-z]+)+$'
True

 
Also, I’m not a big fan of using digits in variable names, but if you want the rule to allow them, you can use the following regex :

PS C:\> "Word1Word2" -cmatch '^([A-Z]\w+)+$'
True

 

Using the predicate to detect violations

Now, we can use our predicate against whatever PowerShell code is fed to our Measure-PascalCase function via its $ScriptBlockAst parameter. The input PowerShell code is an object of the type [System.Management.Automation.Language.ScriptBlockAst], so like most AST objects, it has a FindAll() method which we can use to find all the elements within that object which match a predicate.

[System.Management.Automation.Language.Ast[]]$Violations = $ScriptBlockAst.FindAll($Predicate, $True)

 
The second parameter of the FindAll() method ($True) tells it to search recursively in nested elements.

Now, for any violation of our rule, we need to create an object of the type [Microsoft.Windows.PowerShell.ScriptAnalyzer.Generic.DiagnosticRecord], because PSScriptAnalyzer expects our function to return an array of object(s) of that specific type :

Foreach ($Violation in $Violations) {

    $Result = New-Object `
            -Typename "Microsoft.Windows.PowerShell.ScriptAnalyzer.Generic.DiagnosticRecord" `
            -ArgumentList "$((Get-Help $MyInvocation.MyCommand.Name).Description.Text)",$Violation.Extent,$PSCmdlet.MyInvocation.InvocationName,Information,$Null
          
    $Results += $Result
}

 
Pay particular attention to the 5 values passed to the -ArgumentList parameter of the cmdlet New-Object. To see what each of these values corresponds to, we can have a look at the constructor(s) for this class :

C:\> [Microsoft.Windows.PowerShell.ScriptAnalyzer.Generic.DiagnosticRecord]::new

OverloadDefinitions
-------------------
Microsoft.Windows.PowerShell.ScriptAnalyzer.Generic.DiagnosticRecord new()
Microsoft.Windows.PowerShell.ScriptAnalyzer.Generic.DiagnosticRecord new(string message,
System.Management.Automation.Language.IScriptExtent extent, string ruleName,
Microsoft.Windows.PowerShell.ScriptAnalyzer.Generic.DiagnosticSeverity severity, string scriptName, string ruleId)

 
For the “Message” property of our [DiagnosticRecord] objects, hard-coding a relatively long message would not look nice, so here, we are reusing our carefully crafted description from the comment-based help. We don’t have to do this, but that way, we don’t reinvent the wheel.
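We can check what this message will contain by querying the help ourselves, assuming the rules module (MBAnalyzerRules.psm1, which we will use a bit later) has been imported into the current session :

C:\> Import-Module .\MBAnalyzerRules.psm1
C:\> (Get-Help Measure-PascalCase).Description.Text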

Then, each resulting object is added to an array : $Results.
Finally, when we are done processing violations, we return that array for PSScriptAnalyzer‘s consumption :

            }
            return $Results
            #endregion
        }

 
That’s it. The module containing the full function is on this GitHub page.

Now, let’s use our custom rule with PSScriptAnalyzer against an example script :

C:\> Invoke-ScriptAnalyzer -Path .\ExampleScript.ps1 -CustomRulePath .\MBAnalyzerRules.psm1 |
 Select-Object RuleName, Line, Message | Format-Table -AutoSize -Wrap

RuleName                           Line Message
--------                           ---- -------
MBAnalyzerRules\Measure-PascalCase   15 Variable names should use a consistent capitalization style, i.e. : PascalCase.
                                        In PascalCase, only the first letter is capitalized. Or, if the variable name
                                        is made of multiple concatenated words, only the first letter of each
                                        concatenated word is capitalized.
                                        To fix a violation of this rule, please consider using PascalCase for variable
                                        names.
MBAnalyzerRules\Measure-PascalCase   28 Variable names should use a consistent capitalization style, i.e. : PascalCase.
                                        In PascalCase, only the first letter is capitalized. Or, if the variable name
                                        is made of multiple concatenated words, only the first letter of each
                                        concatenated word is capitalized.
                                        To fix a violation of this rule, please consider using PascalCase for variable
                                        names.
MBAnalyzerRules\Measure-PascalCase   86 Variable names should use a consistent capitalization style, i.e. : PascalCase.
                                        In PascalCase, only the first letter is capitalized. Or, if the variable name
                                        is made of multiple concatenated words, only the first letter of each
                                        concatenated word is capitalized.
                                        To fix a violation of this rule, please consider using PascalCase for variable
                                        names.
MBAnalyzerRules\Measure-PascalCase   88 Variable names should use a consistent capitalization style, i.e. : PascalCase.
                                        In PascalCase, only the first letter is capitalized. Or, if the variable name
                                        is made of multiple concatenated words, only the first letter of each
                                        concatenated word is capitalized.
                                        To fix a violation of this rule, please consider using PascalCase for variable
                                        names.

 
That’s cool, but we probably want to see the actual variable names which are not following our desired capitalization style. We can obtain this information like so :
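One way to do it is a small variation on the previous command. This is a sketch relying on the Extent property that Invoke-ScriptAnalyzer exposes on each result object :

C:\> Invoke-ScriptAnalyzer -Path .\ExampleScript.ps1 -CustomRulePath .\MBAnalyzerRules.psm1 |
 Select-Object RuleName, Line, @{Label = 'CodeExtent'; Expression = { $_.Extent.Text }}

Because we passed $Violation.Extent to the DiagnosticRecord constructor, the Extent of each result is the offending assignment statement, so the variable name appears at the beginning of each extent.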

Running this, we can see that in the case of this script (pun intended), the case of variable names is all over the place, and we can easily go and fix it.

Adding ConfigurationData dynamically from a DSC configuration

When writing a DSC configuration, separating the environmental data from the DSC configuration is a best practice : it allows us to reuse the same configuration logic for different environments, for example the Dev, QA and Production environments. This generally means that the environment data is stored in separate .psd1 files. This is explained in this documentation page.

However, these configuration data files are relatively static, so if the environment changes frequently, these files might end up containing outdated information. A solution is to keep the static environment data in the configuration data files and then add the more dynamic data on the fly, from the DSC configuration itself.

A good example of this use case is a web application, where the configuration is identical for all web servers but these servers are treated not as pets but as cattle : we create and kill them on a daily basis. Because they are cattle, we don’t call them by their names, in fact we don’t even know their names. So the configuration data file doesn’t contain any node names :

@{
    # Node specific data
    AllNodes = @(

       # All the Web Servers have following information 
       @{
            NodeName           = '*'
            WebsiteName        = 'ClickFire'
            SourcePath         = '\\DevBox\SiteContents\'
            DestinationPath    = 'C:\inetpub\wwwroot\ClickFire_Content'
            DefaultWebSitePath = 'C:\inetpub\wwwroot\ClickFire_Content'
       }
    );
    NonNodeData = ''
}

 
By the way, the web application used for illustration purposes is an internal HR app, codenamed “Project ClickFire”.

Let’s assume the above configuration data is all the information we need to configure our nodes. That’s great, but we still need some node names, otherwise no MOF file will be generated when we run the configuration. So we’ll need to query some kind of database to get the names of the web servers for this application, Active Directory for example. This is easy to do, especially if these servers are all in the same OU and/or there is a naming convention for them :

C:\> $DynamicNodeNames = Get-ADComputer -SearchBase "OU=Project ClickFire,OU=Servers,DC=Mat,DC=lab" -Filter {Name -Like "Web*"} |
Select-Object -ExpandProperty Name

C:\> $DynamicNodeNames

Web083
Web084
Web086
  

 
Now that we have the node names, we need to add a hashtable for each node into the “AllNodes” section of our configuration data. To do that, we first need to import the data from the configuration data file and store it in a variable for further manipulation. The cmdlet Import-PowerShellDataFile, introduced in PowerShell 5.0, makes this very simple :

C:\> $EnvironmentData = Import-PowerShellDataFile -Path "C:\Lab\EnvironmentData\Project_ClickFire.psd1"
C:\> $EnvironmentData

Name                           Value
----                           -----
AllNodes                       {System.Collections.Hashtable}
NonNodeData


C:\> $EnvironmentData.AllNodes

Name                           Value
----                           -----
DefaultWebSitePath             C:\inetpub\wwwroot\ClickFire_Content
NodeName                       *
WebsiteName                    ClickFire
DestinationPath                C:\inetpub\wwwroot\ClickFire_Content
SourcePath                     \\DevBox\SiteContents\
  

 
Now, we have our configuration data available to us as a PowerShell object (a hashtable). The “AllNodes” section inside of it is an array of hashtables, because each node entry within “AllNodes” is a hashtable :

C:\> $EnvironmentData.AllNodes.GetType()

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     Object[]                                 System.Array


C:\> $EnvironmentData.AllNodes | Get-Member | Select-Object TypeName -Unique

TypeName
--------
System.Collections.Hashtable
  

 
So now, what we need to do is to inject a new node entry for each node returned by our Active Directory query into the “AllNodes” section :

C:\> Foreach ( $DynamicNodeName in $DynamicNodeNames ) {
     $EnvironmentData.AllNodes += @{NodeName = $DynamicNodeName; Role = "WebServer"}
 }
  

 
For each node name, we add a new hashtable into “AllNodes”. These hashtables are pretty simple in this case : they just give our nodes a name and a role (in case we need to differentiate them from other server types, like database servers for example).

The result of this updated configuration data is equivalent to :

@{
    # Node specific data
    AllNodes = @(

       # All the Web Servers have following information 
       @{
            NodeName           = '*'
            WebsiteName        = 'ClickFire'
            SourcePath         = '\\DevBox\SiteContents\'
            DestinationPath    = 'C:\inetpub\wwwroot\ClickFire_Content'
            DefaultWebSitePath = 'C:\inetpub\wwwroot\ClickFire_Content'
       }
       @{
            NodeName           = 'Web083'
            Role               = 'WebServer'
       }
       @{
            NodeName           = 'Web084'
            Role               = 'WebServer'
       }
       @{
            NodeName           = 'Web086'
            Role               = 'WebServer'
       }
    );
    NonNodeData = ''
}

 
So that’s it for the node data, but what if we need to add non-node data ?
It is very similar to the node data because the “NonNodeData” section of the configuration data is also a hashtable.

Let’s say we want to add to the “NonNodeData” section of the configuration data a piece of XML that may be used for the web.config file of our web servers. We could do that in the configuration data file, right ?

@{
    # Node specific data
    AllNodes = @(

       # All the Web Servers have following information 
       @{
            NodeName           = '*'
            WebsiteName        = 'ClickFire'
            SourcePath         = '\\DevBox\SiteContents\'
            DestinationPath    = 'C:\inetpub\wwwroot\ClickFire_Content'
            DefaultWebSitePath = 'C:\inetpub\wwwroot\ClickFire_Content'
       }
    );
    NonNodeData =
    @{
        DynamicConfig = [Xml](Get-Content -Path C:\Lab\SiteContents\web.config)
    }
}

Nope :

[Screenshot : Import-PowerShellDataFile fails with a “SafeGetValue” error]
This is because, to safely import data from a file, the cmdlet Import-PowerShellDataFile kinda works in RestrictedLanguage mode. This means that executing cmdlets, functions, or any type of command is not allowed in a data file. Even the [Xml] type accelerator and a bunch of other things are not allowed in this mode. For more information, see about_Language_Modes.

It does make sense : data files should contain data, not code.

OK, so we’ll do that from the DSC configuration script, then :

C:\> $DynamicConfig = [Xml](Get-Content -Path "\\DevBox\SiteContents\web.config")
C:\> $DynamicConfig

xml                            configuration
---                            -------------
version="1.0" encoding="UTF-8" configuration


C:\> $EnvironmentData.NonNodeData = @{DynamicConfig = $DynamicConfig}
C:\>
C:\> $EnvironmentData.NonNodeData.DynamicConfig.configuration


configSections      : configSections
managementOdata     : managementOdata
appSettings         : appSettings
system.web          : system.web
system.serviceModel : system.serviceModel
system.webServer    : system.webServer
runtime             : runtime
  

 
With this technique, we can put whatever we want in “NonNodeData”, even XML data, as long as it is wrapped in a hashtable. The last command shows that we can easily access this dynamic config data because it is stored as a tidy [Xml] PowerShell object.
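Inside the configuration itself, this data can then be referenced through the automatic $ConfigurationData variable. For example, a hypothetical File resource (not part of the configuration script below) could write this XML out to the website folder :

        File WebConfig
        {
            Ensure          = "Present"
            Type            = "File"
            DestinationPath = "$($Node.DestinationPath)\web.config"
            Contents        = $ConfigurationData.NonNodeData.DynamicConfig.OuterXml
            DependsOn       = "[File]SiteContent"
        }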

Please note that the Active Directory query, the import of the configuration data and the manipulation of this data are all done in the same script as the DSC configuration but outside of the DSC configuration itself. That way, this modified configuration data can be passed to the DSC configuration as the value of its -ConfigurationData parameter.

Putting it all together, here is what the whole DSC configuration script looks like :

configuration Project_ClickFire
{
    Import-DscResource -Module PSDesiredStateConfiguration
    Import-DscResource -Module xWebAdministration
    
    Node $AllNodes.Where{$_.Role -eq "WebServer"}.NodeName
    {
        WindowsFeature IIS
        {
            Ensure          = "Present"
            Name            = "Web-Server"
        }
        File SiteContent
        {
            Ensure          = "Present"
            SourcePath      = $Node.SourcePath
            DestinationPath = $Node.DestinationPath
            Recurse         = $True
            Type            = "Directory"
            DependsOn       = "[WindowsFeature]IIS"
        }        
        xWebsite Project_ClickFire_WebSite
        {
            Ensure          = "Present"
            Name            = $Node.WebsiteName
            State           = "Started"
            PhysicalPath    = $Node.DestinationPath
            DependsOn       = "[File]SiteContent"
        }
    }
}

# Adding dynamic Node data
$EnvironmentData = Import-PowerShellDataFile -Path "$PSScriptRoot\..\EnvironmentData\Project_ClickFire.psd1"
$DynamicNodeNames = (Get-ADComputer -SearchBase "OU=Project ClickFire,OU=Servers,DC=Mat,DC=lab" -Filter {Name -Like "Web*"}).Name

Foreach ( $DynamicNodeName in $DynamicNodeNames ) {
    $EnvironmentData.AllNodes += @{NodeName = $DynamicNodeName; Role = "WebServer"}
}

# Adding dynamic non-Node data
$DynamicConfig = [Xml](Get-Content -Path "\\DevBox\SiteContents\web.config")
$EnvironmentData.NonNodeData = @{DynamicConfig = $DynamicConfig}

Project_ClickFire -ConfigurationData $EnvironmentData -OutputPath "C:\Lab\DSCConfigs\Project_ClickFire"
  

 
Running this script indeed generates a MOF file for each of our nodes, containing the same settings :

C:\> & C:\Lab\DSCConfigs\Project_ClickFire_Config.ps1

    Directory: C:\Lab\DSCConfigs\Project_ClickFire


Mode                LastWriteTime         Length Name                                       
----                -------------         ------ ----                                       
-a----         6/6/2016   1:37 PM           3986 Web083.mof                                 
-a----         6/6/2016   1:37 PM           3986 Web084.mof                                 
-a----         6/6/2016   1:37 PM           3986 Web086.mof        
  

 
Hopefully, this helps treat web servers truly as cattle and gives its full meaning to the expression “server farm”.