Optimize PowerShell – Getting and Filtering Data

As I was writing my previous post on optimizing PowerShell, I thought of other tips I have used to speed up scripts when getting data into PowerShell. Like before, I will start with a summary of recommendations and then move on to the details.

Summary

  • Silence your scripts. Any text printed to the console comes with a severe time overhead. If you need progress updates, use Write-Progress instead of Write-Host.
  • If you are looking up individual items in an array at random (rather than iterating through everything), turn your array into a hash-table instead.
  • When querying a server or system for data, try pulling all the data you need at once instead of one item at a time. This can speed up your scripts even if you pull more data than you actually need, though it does depend heavily on the system you are querying and how much extraneous data you get back.

Console Output

A quick side note here: outputting text to the console is very slow. You can speed up some commands by silencing their output, and there are a few different ways of doing this. Let's look at them.

Name                   Method          Time (ms) per 5,000 iterations
Piping to Out-Null     $I | Out-Null   107.8185
Saving out to $null    $null = $I      9.9016

So if you need to silence something, saving the output to a variable or $null is far faster than piping to Out-Null. Now that we know the faster method of silencing a command, let's see just how slow printing to the console is.

Name            Method          Time (ms) per 5,000 iterations
Print Command   Write-Host $I   4434.8804
Silenced        $null = $I      9.9016

That is around 450 times faster. So if you need speed, consider removing unneeded Write-Host commands, or silencing functions by saving their output to $null. Some good news, though: Write-Progress is fairly safe to use.

Name             Method           Time (ms) per 5,000 iterations
Write Progress   Write-Progress   642.605

So use Write-Progress instead of Write-Host if you need progress updates.
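
The numbers above were pulled with a timing harness along these lines. This is a sketch rather than the exact original script, so treat the structure (Measure-Command around 5,000 iterations of each approach) as the point, not the specific numbers:

$Iterations = 5000

# Piping to Out-Null
$OutNullTime = Measure-Command {
    for ($I = 0; $I -lt $Iterations; $I++) { $I | Out-Null }
}

# Silencing by assigning to $null
$SilencedTime = Measure-Command {
    for ($I = 0; $I -lt $Iterations; $I++) { $null = $I }
}

# Printing to the console
$WriteHostTime = Measure-Command {
    for ($I = 0; $I -lt $Iterations; $I++) { Write-Host $I }
}

# Progress bar updates
$WriteProgressTime = Measure-Command {
    for ($I = 0; $I -lt $Iterations; $I++) {
        Write-Progress -Activity 'Metrics' -PercentComplete (($I / $Iterations) * 100)
    }
}

[pscustomobject]@{
    'Out-Null (ms)'       = $OutNullTime.TotalMilliseconds
    'Silenced (ms)'       = $SilencedTime.TotalMilliseconds
    'Write-Host (ms)'     = $WriteHostTime.TotalMilliseconds
    'Write-Progress (ms)' = $WriteProgressTime.TotalMilliseconds
}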

Converting an Array to a Hashtable

PowerShell often returns data in arrays. Arrays are not very fast to query for a single item, however. This does not matter for small arrays, or if you are going to iterate through every item anyway, but if you need to pull a single item out of the array based on one of its properties, lookups can be slow unless you do something to index the data.

The most common method I use is to turn the array into a hash-table. This only works if the property you use to look each object up is unique within the array.

I will not focus on the speed metrics here, since I already covered hash-table metrics in my last post. Instead, I want to show you -how- to convert an array into a hash-table.

First, you need to choose a property that you will query the data on. This is more often than not the object name. If you are querying users from AD, this could be the sAMAccountName or something similar. The only restriction is that this property must be unique for every object in the array!

# Get-ADUser requires a filter; -Filter * pulls every user at once
$AllADUsers = Get-ADUser -Filter *
# Index each user by a property that is unique across the array
$UserHashTable = @{}
foreach ($User in $AllADUsers) {
    $UserHashTable.Add($User.Name, $User)
}

That’s it. We now have an indexed hash-table of our array. To look up users from now on, we would use:

$MyUser = $UserHashTable["John Doe"]

A more extensive example follows.
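
This sketch assumes the ActiveDirectory module is available, uses sAMAccountName as the unique key (it is unique per domain), and uses illustrative account names for the lookups:

# Pull every user once; Get-ADUser needs a filter, so use -Filter *
$AllADUsers = Get-ADUser -Filter *

# Index the users by sAMAccountName
$UserHashTable = @{}
foreach ($User in $AllADUsers) {
    $UserHashTable.Add($User.SamAccountName, $User)
}

# Repeated lookups are now hash-table lookups instead of array scans
$UserNamesToCheck = @('jdoe', 'asmith', 'bjones')   # illustrative account names
foreach ($Name in $UserNamesToCheck) {
    $MyUser = $UserHashTable[$Name]
    if ($null -ne $MyUser) {
        Write-Progress -Activity 'Processing users' -Status $MyUser.Name
        # ... real work with $MyUser goes here ...
    }
}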

Sorted Array and Binary Search

Another tool you can use to speed up searches is to sort your array and use BinarySearch. This does not really work on generic object arrays, so if you go this route, make sure you use strongly typed arrays. It also works best on arrays of core data types (int, string, float, etc.) rather than complex objects. If you need to search an array of complex objects based on one of their properties, consider a hash-table instead. Otherwise, you would need to create your own IComparable class…
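
If you do end up needing that for complex objects, one practical alternative is to supply a System.Collections.IComparer to the Sort and BinarySearch overloads that accept one. The sketch below assumes PowerShell 5+ class support and objects with a Name property (all names are illustrative):

# Illustrative array of complex objects with a Name property
$ObjectArray = @(
    [pscustomobject]@{ Name = 'Zoe Zhang' },
    [pscustomobject]@{ Name = 'John Doe' },
    [pscustomobject]@{ Name = 'Alice Adams' }
)

# A comparer that orders objects by their Name property
class NameComparer : System.Collections.IComparer {
    [int] Compare([object] $x, [object] $y) {
        return [string]::Compare($x.Name, $y.Name)
    }
}
$Comparer = [NameComparer]::new()

[Array]::Sort($ObjectArray, $Comparer)

# BinarySearch needs a probe object the comparer can read a Name from
$Probe = [pscustomobject]@{ Name = 'John Doe' }
$Index = [Array]::BinarySearch($ObjectArray, $Probe, $Comparer)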

Let's see how to create, sort, and search these arrays. To create the array, use the .NET constructor. In these examples, I will use a string array.

$ItemArray = [string[]]::new($ItemCount)

Next, fill the array with your data, then call the static Sort method on it. The sort is where a custom comparer would come in; the core data types already implement IComparable, so nothing extra is needed for a string array.

[Array]::Sort($ItemArray)

Finally, call BinarySearch() when searching for an item in the array, or when checking whether an item exists. Instead of $ItemArray.Contains($ItemToFind), use:

([Array]::BinarySearch($ItemArray, $ItemToFind) -ge 0)

And instead of $ItemArray.IndexOf($ItemToFind) use:

$ItemIndex = [Array]::BinarySearch($ItemArray, $ItemToFind)

I covered the speed metrics of this in my last post.
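
A small end-to-end sketch of the whole pattern, with an illustrative item count and contents:

# Create and fill a strongly typed string array (size and contents are illustrative)
$ItemCount = 10000
$ItemArray = [string[]]::new($ItemCount)
for ($I = 0; $I -lt $ItemCount; $I++) {
    $ItemArray[$I] = 'Item {0:D5}' -f $I
}

# Sort once up front; strings already implement IComparable
[Array]::Sort($ItemArray)

# Containment check: a negative result means the item is not in the array
$ItemToFind = 'Item 05000'
$Found = [Array]::BinarySearch($ItemArray, $ItemToFind) -ge 0

# Index lookup in the sorted array
$ItemIndex = [Array]::BinarySearch($ItemArray, $ItemToFind)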

Getting Data

If you are querying a lot of data, this is likely a bottleneck in your script; however, there are some ways you can speed it up. In general, pulling all your data at once is faster than pulling individual objects one at a time. This applies to many commands, but I can personally attest to Get-ADUser and Get-Item/Get-ChildItem. To the metrics!

Name                        Method                                   Time (ms) per 500 iterations on 100 files
Get files 1 at a time       Get-Item -Path <FilePath>                11227.0608
Get all files               Get-ChildItem -Path <FolderPath>         1148.6165
Get all file names          Get-ChildItem -Path <FolderPath> -Name   664.0743
Get all files by wildcard   Get-Item -Path "<FolderPath>\*"          4060.0878

We can see that pulling all files is faster than pulling them one at a time. Also, if you only need the file names, then adding -Name to Get-ChildItem is faster than having PowerShell grab all file info.
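
For example, if all you need is to check which files are present, a sketch might pull just the names once and index them, rather than hitting the disk per file (the folder path and file name are illustrative):

# Pull only the file names in one call (illustrative folder path)
$FolderPath = 'C:\Temp\MetricFiles'
$FileNames = Get-ChildItem -Path $FolderPath -Name

# Index the names for fast repeated existence checks
$NameTable = @{}
foreach ($Name in $FileNames) {
    $NameTable[$Name] = $true
}

# Checking for a file is now a hash-table lookup instead of another disk query
$Exists = $NameTable.ContainsKey('42.txt')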

This does not tell the full story, though. What about filtering? When we pull files one at a time, we get exactly the file we need, but if we pull all of them, we then need to search our array, and that adds time. How much? Not a lot, if you create a hash-table first!

Name                                         Method                      Time (ms) per 500 iterations on 100 files
Get files 1 at a time                        Get-Item -Path <FilePath>   11227.0608
Pull all files into a hash-table and query   (script below)              3708.0087

$ResultArray = Get-ChildItem -Path $MetricFolder.FullName
$ResultHashTable = @{}
foreach ($File in $ResultArray) {
    $ResultHashTable.Add($File.FullName, $File)
}
for ($I = 0; $I -lt $Files; $I++) {
    $null = $ResultHashTable[(Join-Path -Path $MetricFolder.FullName -ChildPath "$I.txt")]
}

This is so much faster that even if you pull twice as many files as you need, it is still quicker than pulling them one at a time! In this next example, I doubled the number of files in the directory but still only query for 100.

Name                                         Method                   Time (ms) per 500 iterations on 100 out of 200 files
Pull all files into a hash-table and query   (same script as above)   5016.6214

So even if we pull twice as many files into the hash-table as we need to query, it is still more than twice as fast as pulling the files one at a time! Note that when creating the hash-table, I am using the .Add() method, which is far faster than $HashTable += @{Key = $Value}, per my previous post.
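
The metrics above came from a harness along these lines; again, this is a sketch rather than the exact original script, with illustrative paths and counts:

# Create a scratch folder with numbered test files (illustrative path and counts)
$Files = 100
$Iterations = 500
$MetricFolder = New-Item -Path (Join-Path $env:TEMP 'FileMetrics') -ItemType Directory -Force
for ($I = 0; $I -lt $Files; $I++) {
    $null = New-Item -Path (Join-Path $MetricFolder.FullName "$I.txt") -ItemType File -Force
}

# Approach 1: one Get-Item call per file
$OneAtATime = Measure-Command {
    for ($Run = 0; $Run -lt $Iterations; $Run++) {
        for ($I = 0; $I -lt $Files; $I++) {
            $null = Get-Item -Path (Join-Path $MetricFolder.FullName "$I.txt")
        }
    }
}

# Approach 2: pull everything once, index it, then do hash-table lookups
$HashTableLookup = Measure-Command {
    for ($Run = 0; $Run -lt $Iterations; $Run++) {
        $ResultArray = Get-ChildItem -Path $MetricFolder.FullName
        $ResultHashTable = @{}
        foreach ($File in $ResultArray) {
            $ResultHashTable.Add($File.FullName, $File)
        }
        for ($I = 0; $I -lt $Files; $I++) {
            $null = $ResultHashTable[(Join-Path -Path $MetricFolder.FullName -ChildPath "$I.txt")]
        }
    }
}

"One at a time: $($OneAtATime.TotalMilliseconds) ms"
"Hash-table:    $($HashTableLookup.TotalMilliseconds) ms"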