Optimize PowerShell – Getting and Filtering Data

As I was writing my previous post on optimizing PowerShell, I thought of other tips I have used to speed up scripts that get data into PowerShell. Like before, I will start with a summary of recommendations and then move on to the details.

Summary

  • Silence your scripts. Any text printed to the console comes with severe time overhead. If you need progress updates, make sure you use Write-Progress over Write-Host.
  • If you are looking up data in an array at random, then turn your array into a hash-table instead.
  • When querying a server or system for data, try pulling all the data you need at once, instead of one item at a time. This can speed up your scripts, even if you pull more data than you actually need. How well this works depends heavily on the system you are querying and how much extraneous data you get back.

Console Output

A quick side note here: outputting text to the console is very slow. You can speed up some commands by silencing their output. There are a few different ways of doing this; let's look at them.

Name | Method | Time (ms) per 5000 iterations
--- | --- | ---
Piping to Out-Null | $I \| Out-Null | 107.8185
Saving out to $null | $null = $i | 9.9016

So if you need to silence something, assigning the output to a variable or to $null is far faster than piping it to Out-Null. Now that we know the faster way to silence a command, let's see just how slow printing to the console is.
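Here is a minimal sketch for comparing silencing methods on your own machine. It times the two methods from the table plus two variants the table does not include, casting to [void] and redirecting with > $null. Treat any numbers it prints as specific to your machine and PowerShell version.

```powershell
# Compare ways of discarding output. Numbers depend on your machine and PS version.
$Iterations = 5000

$Methods = [ordered]@{
    'Pipe to Out-Null'  = { for ($i = 0; $i -lt $Iterations; $i++) { $i | Out-Null } }
    'Assign to $null'   = { for ($i = 0; $i -lt $Iterations; $i++) { $null = $i } }
    'Cast to [void]'    = { for ($i = 0; $i -lt $Iterations; $i++) { [void]$i } }
    'Redirect to $null' = { for ($i = 0; $i -lt $Iterations; $i++) { $i > $null } }
}

$Results = foreach ($Name in $Methods.Keys) {
    # Measure-Command runs the script block and returns a TimeSpan.
    $Elapsed = Measure-Command -Expression $Methods[$Name]
    [pscustomobject]@{
        Name   = $Name
        TimeMs = [math]::Round($Elapsed.TotalMilliseconds, 2)
    }
}

$Results | Sort-Object -Property TimeMs | Format-Table -AutoSize
```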

Name | Method | Time (ms) per 5000 iterations
--- | --- | ---
Print Command | Write-Host $I | 4434.8804
Silenced | $null = $I | 9.9016

That is roughly 450 times faster (4434.88 ms vs 9.90 ms). So if you need speed, consider removing unneeded Write-Host commands, or silencing functions by assigning their output to $null. There is some good news, though: Write-Progress is fairly safe to use.

Name | Method | Time (ms) per 5000 iterations
--- | --- | ---
Write Progress | Write-Progress | 642.605

So use Write-Progress over Write-Host if you need progress updates. The script used to pull these metrics:
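The metrics script itself is not reproduced here. Separately from it, this is a minimal sketch of reporting progress with Write-Progress inside a loop. $Items is placeholder data, and refreshing the bar only every 100 items is an assumption added here to keep overhead down, not something measured above.

```powershell
# Placeholder data; replace $Items with your own collection.
$Items = 1..5000
$Total = $Items.Count
$Processed = 0

foreach ($Item in $Items) {
    # ... the real work on $Item would go here ...
    $Processed++

    # Assumption: refreshing every 100 items is often enough and keeps overhead low.
    if ($Processed % 100 -eq 0) {
        $Progress = @{
            Activity        = 'Processing items'
            Status          = "$Processed of $Total"
            PercentComplete = [int](($Processed / $Total) * 100)
        }
        Write-Progress @Progress
    }
}

# Remove the progress bar when the loop is done.
Write-Progress -Activity 'Processing items' -Completed
```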

Converting an Array to a Hashtable

PowerShell often returns data in arrays. These arrays are not very fast to query for a single item, however. This does not matter for small arrays, or if you will iterate through every item anyway, but if you need to pull a single item out of the array based on one of its properties, it can be slow unless you index the data somehow.

The method I use most often is to turn the array into a hash-table. This only works if the property you look each object up by is unique within the array.

I will not focus on the speed metrics here; I already covered hash-table metrics in my last post. Instead, I want to show you how to convert an array into a hash-table.

First, choose a property that you will query the data on. More often than not, this is the object name. If you are querying users from AD, this could be the sAMAccountName or something similar. The only restriction is that this property must be unique for every object in the array!

$AllADUsers = Get-ADUser -Filter *
$UserHashTable = @{}
foreach($User in $AllADUsers) {
    $UserHashTable.Add($User.Name,$User)
}

That’s it. We now have an indexed hash-table of our array. To look up users from now on, we would use:

$MyUser = $UserHashTable["John Doe"]
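If you do not have the ActiveDirectory module handy, the same pattern works on any collection. This sketch uses made-up [pscustomobject] records in place of Get-ADUser output, and it checks ContainsKey() before calling Add(), since Add() throws on a duplicate key. It also shows that @{} hash-tables match string keys case-insensitively.

```powershell
# Made-up sample data standing in for Get-ADUser output.
$AllUsers = @(
    [pscustomobject]@{ SamAccountName = 'jdoe';   Name = 'John Doe' }
    [pscustomobject]@{ SamAccountName = 'asmith'; Name = 'Alice Smith' }
    [pscustomobject]@{ SamAccountName = 'jdoe2';  Name = 'John Doe' }   # duplicate Name
)

$UserHashTable = @{}
foreach ($User in $AllUsers) {
    # Add() throws if the key already exists, so check first and report duplicates.
    if ($UserHashTable.ContainsKey($User.Name)) {
        Write-Warning "Duplicate key '$($User.Name)'; skipping $($User.SamAccountName)"
        continue
    }
    $UserHashTable.Add($User.Name, $User)
}

# @{} hashtables compare string keys case-insensitively.
$MyUser = $UserHashTable['john doe']
$MyUser.SamAccountName   # jdoe
```

If duplicates show up, that is a sign the chosen property is not a safe key; sAMAccountName is usually a better choice than Name for AD users.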

A more extensive example is included in the script below.

Sorted Array and Binary Search

Another tool you can use to speed up searches is to sort your arrays and use BinarySearch. This does not work reliably on untyped ([object[]]) arrays, so if you go this route, make sure you use strongly typed arrays. It also works best on arrays of core data types (int, string, float, etc.) rather than complex objects. If you need to search an array of complex objects based on one of their properties, consider hash-tables instead. Otherwise, you would need to write your own IComparer class…
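For completeness, here is a rough sketch of the IComparer route. NameComparer is a hypothetical class, and sorting objects by a Name property is an assumption for the example. The key points are that the same comparer must be passed to both Sort and BinarySearch, and that the search value has to be something the comparer can handle, here a probe object with a Name property.

```powershell
# Hypothetical comparer that orders objects by their Name property, ignoring case.
class NameComparer : System.Collections.IComparer {
    [int] Compare([object] $x, [object] $y) {
        return [string]::Compare($x.Name, $y.Name, [System.StringComparison]::OrdinalIgnoreCase)
    }
}

$Comparer = [NameComparer]::new()
$People = [object[]]@(
    [pscustomobject]@{ Name = 'Carol'; Id = 3 }
    [pscustomobject]@{ Name = 'alice'; Id = 1 }
    [pscustomobject]@{ Name = 'Bob';   Id = 2 }
)

# Sort and search must use the same comparer, or BinarySearch can miss items.
[Array]::Sort($People, $Comparer)

# To search, wrap the key in a probe object the comparer understands.
$Probe = [pscustomobject]@{ Name = 'bob' }
$Index = [Array]::BinarySearch($People, $Probe, $Comparer)
if ($Index -ge 0) { $People[$Index].Id }   # 2
```

This is noticeably more work than building a hash-table, which is why hash-tables are usually the simpler choice for complex objects.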

Let's see how to create, sort, and search these arrays. To create the array, use the .NET method of creating them. In these examples, I will use a string array.

$ItemArray = [string[]]::new($ItemCount)

Next, fill in your array with your data. Then call the Sort method on your array. This is where you would pass in a custom IComparer if you needed one. Core data types such as strings already implement IComparable, so a string array sorts without one.

[Array]::Sort($ItemArray)

Finally, call BinarySearch() when searching for an item in the array, or checking whether an item exists in it. BinarySearch only gives correct results on a sorted array. Instead of $ItemArray.Contains() use:

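([Array]::BinarySearch($ItemArray, $ItemToFind) -ge 0)

Use -ge here rather than >=: in PowerShell, > is the redirection operator, so >=0 would write the result to a file named =0 instead of comparing it.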

And instead of $ItemArray.IndexOf($ItemToFind) use:

$ItemIndex = [Array]::BinarySearch($ItemArray, $ItemToFind)
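Putting those steps together, a minimal end-to-end sketch with a small, made-up string array might look like this. It also shows what BinarySearch returns on a miss: a negative number whose bitwise complement is the index where the item would be inserted to keep the array sorted.

```powershell
# Create a strongly typed string array of a known size (made-up sample values).
$ItemCount = 5
$ItemArray = [string[]]::new($ItemCount)
$SampleData = 'delta', 'alpha', 'echo', 'charlie', 'bravo'
for ($i = 0; $i -lt $ItemCount; $i++) {
    $ItemArray[$i] = $SampleData[$i]
}

# BinarySearch assumes sorted input, so sort in place first.
[Array]::Sort($ItemArray)

$ItemToFind = 'charlie'

# Existence check, in place of $ItemArray.Contains($ItemToFind).
$Exists = [Array]::BinarySearch($ItemArray, $ItemToFind) -ge 0

# Index lookup, in place of $ItemArray.IndexOf($ItemToFind).
$ItemIndex = [Array]::BinarySearch($ItemArray, $ItemToFind)   # 2

# On a miss the result is negative; -bnot turns it into the insertion point.
$Miss = [Array]::BinarySearch($ItemArray, 'bob')
$InsertAt = -bnot $Miss                                        # 1
```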

I cover the speed metrics of this in my last post. For a more complete example, see the following script:

Getting Data

If you are querying a lot of data, that is likely a bottleneck in your script, but there are some ways to speed it up. In general, pulling all your data at once is faster than pulling individual objects one at a time. This applies to many commands, but I can personally attest to Get-ADUser and Get-Item/Get-ChildItem. To the metrics!

Name | Method | Time (ms) per 500 iterations on 100 files
--- | --- | ---
Get files 1 at a time | Get-Item -Path <FilePath> | 11227.0608
Get all files | Get-ChildItem -Path <FolderPath> | 1148.6165
Get all file names | Get-ChildItem -Path <FolderPath> -Name | 664.0743
Get all files by wildcard | Get-Item -Path "<FolderPath>\*" | 4060.0878

We can see that pulling all files is faster than pulling them one at a time. Also, if you only need the file names, then adding -Name to Get-ChildItem is faster than having PowerShell grab all file info.
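To try this comparison yourself, here is a rough sketch. The temp folder name GetDataDemo and the 100 numbered files are hypothetical setup, and it times a single pass rather than the 500 iterations used above, so expect different absolute numbers.

```powershell
# Hypothetical setup: a temp folder holding 100 small files named 0.txt to 99.txt.
$Folder = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath 'GetDataDemo'
$null = New-Item -ItemType Directory -Path $Folder -Force
0..99 | ForEach-Object { Set-Content -Path (Join-Path -Path $Folder -ChildPath "$_.txt") -Value $_ }

# One timed pass per method (the table above used 500 iterations).
$Timings = [ordered]@{
    'Get files 1 at a time' = (Measure-Command {
        for ($i = 0; $i -lt 100; $i++) {
            $null = Get-Item -Path (Join-Path -Path $Folder -ChildPath "$i.txt")
        }
    }).TotalMilliseconds
    'Get all files' = (Measure-Command {
        $null = Get-ChildItem -Path $Folder
    }).TotalMilliseconds
    'Get all file names' = (Measure-Command {
        $null = Get-ChildItem -Path $Folder -Name
    }).TotalMilliseconds
    'Get all files by wildcard' = (Measure-Command {
        $null = Get-Item -Path (Join-Path -Path $Folder -ChildPath '*')
    }).TotalMilliseconds
}
$Timings

# -Name returns plain strings instead of FileInfo objects.
$FileNames = @(Get-ChildItem -Path $Folder -Name)

# Clean up the demo folder.
Remove-Item -Path $Folder -Recurse -Force
```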

This does not tell the full story. What about filtering it? When we pull one at a time, we have the one file that we need, but if we pull all of them, then we need to search our array and that adds time. But how much? Not a lot if you create a hash-table first!

Name | Method | Time (ms) per 500 iterations on 100 files
--- | --- | ---
Get files 1 at a time | Get-Item -Path <FilePath> | 11227.0608
Pull all files into a hash-table and query | Hash-table code below | 3708.0087

Code for the hash-table row:

$ResultArray = Get-ChildItem -Path $MetricFolder.FullName
$ResultHashTable = @{}
foreach ($File in $ResultArray) {
    $ResultHashTable.Add($File.FullName, $File)
}
for ($I = 0; $I -lt $Files; $I++) {
    $null = $ResultHashTable[(Join-Path -Path $MetricFolder.FullName -ChildPath "$I.txt")]
}

This is so much faster that even if you pull twice as many files as you need, it is still faster than pulling the files one at a time! In this next example, I doubled the files in the directory but still only queried for 100.

Name | Method | Time (ms) per 500 iterations on 100 out of 200 files
--- | --- | ---
Pull all files into a hash-table and query | Same hash-table code as in the previous table | 5016.6214

So even though we pull twice as many files into the hash-table as we need to query, it is still more than twice as fast as pulling the files one at a time (5016.62 ms vs 11227.06 ms)! Note that when creating the hash-table, I use the .Add() method. This is far faster than $HashTable += @{Key=Value}, as covered in my previous post. For the script I used to pull these metrics, see:
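If you build these lookups often, a small helper can save some repetition. ConvertTo-HashTable below is a hypothetical function, not a built-in cmdlet, and it assumes the key property is unique, since Add() throws otherwise. The usage part mirrors the 100-out-of-200 test on a throwaway temp folder (HashLookupDemo is a made-up name) and keys on Name rather than FullName to keep the example simple.

```powershell
# Hypothetical helper (not a built-in cmdlet): index a collection by one unique property.
function ConvertTo-HashTable {
    param(
        [Parameter(Mandatory)] [object[]] $InputObject,
        [Parameter(Mandatory)] [string] $KeyProperty
    )
    $Table = @{}
    foreach ($Item in $InputObject) {
        # Add() rather than $Table += @{...}; Add() throws if a key repeats.
        $Table.Add($Item.$KeyProperty, $Item)
    }
    # A hashtable is written to the pipeline as a single object, not enumerated.
    $Table
}

# Usage: 200 files on disk, but only 100 of them are queried.
$Folder = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath 'HashLookupDemo'
$null = New-Item -ItemType Directory -Path $Folder -Force
0..199 | ForEach-Object { Set-Content -Path (Join-Path -Path $Folder -ChildPath "$_.txt") -Value $_ }

$FileTable = ConvertTo-HashTable -InputObject (Get-ChildItem -Path $Folder) -KeyProperty Name

$Found = 0
for ($i = 0; $i -lt 100; $i++) {
    # A missing key returns $null rather than throwing.
    if ($null -ne $FileTable["$i.txt"]) { $Found++ }
}

Remove-Item -Path $Folder -Recurse -Force
```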

Optimize PowerShell – Arrays and Loops

Sometimes PowerShell is slow, especially when you are dealing with a large amount of data, but there are ways of speeding things up depending on what you are doing. This post will focus on how to speed up loops, arrays and hash-tables. All metrics were gathered on Windows 10 1909 with PowerShell 5.1. Let's start with the summary.

Summary

  • Pre-initialize your arrays if possible. Instead of adding things to your array one at a time, if you know how long your array needs to be, create it at that length and then fill it.
    • If you do not know what length the array will be, create a list instead. Adding objects to lists is far faster than adding to an array.
  • If you need to do random lookups on a set of data, consider sorting your Array/List and then calling BinarySearch()
    • Avoid searching by piping an array to Where-Object; either turn it into a hash-table, or sort the array and use BinarySearch()
  • When adding items to a hash-table, use the Add() method
  • When looping through objects, consider using a plain foreach(){} loop

Arrays and Lists

Now for the actual metrics. You can find the scripts used under each section. For this section, we’ll look at arrays/lists. First, creating and filling.

Most methods of creating and filling arrays perform similarly. The only noticeable slowdown comes from using PowerShell's native array without pre-initializing it. PowerShell arrays are fixed-size, so every time you add to one with +=, PowerShell creates a new array and copies every existing element into it.

Name | Method | Time (ms) per 10000 iterations
--- | --- | ---
Native Array | $PSArray = @(); $PSArray += $i; | 1738.6903
Initialized Native Array | $PSArray = @(0)*$Iterations; $PSArray[$i]=$i; | 28.6651
Initialized .Net Array | $PSArray = [int[]]::new($Iterations); $PSArray[$i]=$i; | 26.2101
.Net List | $PSArray = [System.Collections.Generic.List[int]]::new(); $PSArray.Add($I); | 22.6374
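A minimal sketch of the four approaches, if you want to time them yourself. The variable names are hypothetical and timings will vary. The last few lines illustrate why += is slow: after $Grown += 3, $Grown refers to a brand-new array, while the old one is left unchanged.

```powershell
$Iterations = 10000

# Native array grown with +=: every add allocates a new array and copies.
$TimeNative = Measure-Command {
    $PSArray = @()
    for ($i = 0; $i -lt $Iterations; $i++) { $PSArray += $i }
}

# Native array pre-initialized to its final length, then filled by index.
$TimePreInit = Measure-Command {
    $PSArray = @(0) * $Iterations
    for ($i = 0; $i -lt $Iterations; $i++) { $PSArray[$i] = $i }
}

# Strongly typed .NET array of the final length.
$TimeTyped = Measure-Command {
    $PSArray = [int[]]::new($Iterations)
    for ($i = 0; $i -lt $Iterations; $i++) { $PSArray[$i] = $i }
}

# Generic List: the option to use when the final length is unknown.
$TimeList = Measure-Command {
    $PSArray = [System.Collections.Generic.List[int]]::new()
    for ($i = 0; $i -lt $Iterations; $i++) { $PSArray.Add($i) }
}

[pscustomobject]@{
    NativePlusEqualsMs = $TimeNative.TotalMilliseconds
    PreInitializedMs   = $TimePreInit.TotalMilliseconds
    TypedArrayMs       = $TimeTyped.TotalMilliseconds
    GenericListMs      = $TimeList.TotalMilliseconds
}

# Why += is slow: it produces a new array instead of growing the old one.
$Grown = @(1, 2)
$Before = $Grown
$Grown += 3
[object]::ReferenceEquals($Before, $Grown)   # False
$Before.Count                                # 2
$Grown.Count                                 # 3
```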

Now on to read performance. In this test, I am just using a simple .Contains() check. While performance varies with the type of array, every method stays under one second for 10000 iterations, which is not noticeable to humans. The only noticeable difference is if you pipe your array to Where-Object for searching. That took over 11 minutes! If you really need speed, though, sorting your array and using BinarySearch is the way to go.

Name | Method | Time (ms) per 10000 iterations
--- | --- | ---
Native Array Contains | $PSArray.Contains($i) | 177.2726
.Net Array Contains | $PSArray.Contains($i) | 38.632
.Net List Contains | $PSArray.Contains($i) | 87.9633
.Net List BinarySearch | $PSArray.BinarySearch($i) | 23.1007
Native Array with Pipe Filtering | $PSArray \| Where-Object {$_ -eq $i} | 680831.936
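Here is a small sketch of the three search styles on a generic List. The values are made up; the important detail is that the list is sorted before BinarySearch is called, because BinarySearch assumes sorted input.

```powershell
# Made-up data: 0..9999 added in descending order, then sorted ascending.
$List = [System.Collections.Generic.List[int]]::new()
for ($i = 9999; $i -ge 0; $i--) { $List.Add($i) }
$List.Sort()

$ItemToFind = 4242

# Linear scan: fine for occasional lookups.
$ViaContains = $List.Contains($ItemToFind)

# Binary search: index (>= 0) if found, negative if not. Requires sorted data.
$Index = $List.BinarySearch($ItemToFind)
$ViaBinarySearch = $Index -ge 0

# Pipeline filtering: same answer, but by far the slowest option in the table.
$ViaWhereObject = @($List | Where-Object { $_ -eq $ItemToFind }).Count -gt 0
```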

Let's take a look at hash-tables. Hash-tables are useful because they let you assign a key to an object and then look that object up quickly later. Adding to a native PowerShell hash-table with $HashTable += @{} is noticeably slow, though: the += operator does not modify the existing hash-table, it builds a new one that holds every existing entry plus the new one. At 22 seconds for 10000 items added to the hash-table, this is still doable for most scripts. That said, the more items you add, the slower each add gets. A quick and easy fix is to use the Add() method instead of the $HashTable += @{} pattern. If you do that, there is no real performance difference between the native PowerShell hash-table and a .NET Dictionary.

Name | Method | Time (ms) per 10000 iterations
--- | --- | ---
Native Hashtable | $PSArray = @{}; $PSArray += @{$I.ToString()=$I}; | 22859.229
.Net Dictionary | $PSArray = [System.Collections.Generic.Dictionary[string,int]]::new(); $PSArray.Add($I.ToString(), $I); | 30.5428
Native Hashtable with Add Function | $PSArray = @{}; $PSArray.Add($I.ToString(),$I); | 32.5752
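A quick way to see why += is slow is to check whether the variable still points at the same hash-table object afterwards. This sketch uses [object]::ReferenceEquals for that check, and also shows how Add() and the indexer differ when a key already exists. The keys and values are made up.

```powershell
$Table = @{}
$Before = $Table

# += builds a brand-new hashtable holding the old entries plus the new one.
$Table += @{ 'a' = 1 }
[object]::ReferenceEquals($Before, $Table)   # False

# Add() inserts into the existing hashtable in place.
$Before = $Table
$Table.Add('b', 2)
[object]::ReferenceEquals($Before, $Table)   # True

# On a duplicate key, Add() throws, while the indexer overwrites the value.
$AddFailed = $false
try { $Table.Add('b', 3) } catch { $AddFailed = $true }
$Table['b'] = 3
$Table['b']                                  # 3
```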

To show how slow hash-tables get as you add more items, I charted it out.

Well, that is not super useful, is it? All it shows is that $HashTable += @{} is so slow the other methods don't even register. Let's look at it on a log10 scale.

Definitely make sure you use the .Add() method for any large hash-table!

For reading hash-tables, I just checked how quickly keys could be searched. Both .Net and the native method of creating hash-tables were suitably fast.

Name | Method | Time (ms) per 10000 iterations
--- | --- | ---
Native Hashtable Contains | $PSArray.ContainsKey($i) | 31.041
.Net Hashtable Contains | $PSArray.ContainsKey($I) | 21.0664
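When reading, ContainsKey() answers whether a key exists; on a .NET Dictionary, TryGetValue() checks and fetches in one call. One difference worth knowing when switching between the two types: @{} ignores case in string keys, while a Dictionary[string,int] created without a custom comparer is case-sensitive. The sample keys below are made up.

```powershell
# Made-up data: keys '0' to '9999' mapped to their integer values.
$Dictionary = [System.Collections.Generic.Dictionary[string,int]]::new()
for ($i = 0; $i -lt 10000; $i++) { $Dictionary.Add($i.ToString(), $i) }

# ContainsKey() only answers whether the key exists.
$HasKey = $Dictionary.ContainsKey('4242')

# TryGetValue() checks and fetches in one lookup; the value comes back via [ref].
$Value = 0
$Found = $Dictionary.TryGetValue('4242', [ref] $Value)

# Key matching differs: @{} ignores case, a default Dictionary[string,int] does not.
$Native = @{ 'Key' = 1 }
$Typed = [System.Collections.Generic.Dictionary[string,int]]::new()
$Typed.Add('Key', 1)
$Native.ContainsKey('KEY')   # True
$Typed.ContainsKey('KEY')    # False
```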

Scripts used for metrics gathering and the Excel sheet used to create charts.

Loops

Finally, let's look at loop performance. If you need to perform some action on every item in a collection, you have several options. It takes large arrays to notice much difference between them, but in my tests a foreach(){} loop outperformed all other methods, and piping a collection to ForEach-Object {} had the worst performance.

Name | Method | Time (ms) for 100000 iterations
--- | --- | ---
For Loop | for ($i =0;$i-lt $iterations;$i++){…} | 142.6532
Foreach Loop | foreach ($item in $myarray) {…} | 62.4789
Piping to foreach-object | $myarray \| foreach-object {…} | 389.7384
.ForEach function | $myarray.ForEach{…} | 160.2419
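A rough sketch for timing the four loop styles on your own machine. Each style doubles every item of a made-up 1..100000 array. Collecting the output into an array adds roughly the same overhead to every style, and [System.Diagnostics.Stopwatch] is used instead of Measure-Command so each style's output can be captured and checked.

```powershell
$MyArray = 1..100000

# Each style doubles every item; the output is collected so it can be checked.
$LoopStyles = [ordered]@{
    'for'            = { for ($i = 0; $i -lt $MyArray.Count; $i++) { $MyArray[$i] * 2 } }
    'foreach'        = { foreach ($Item in $MyArray) { $Item * 2 } }
    'ForEach-Object' = { $MyArray | ForEach-Object { $_ * 2 } }
    '.ForEach()'     = { $MyArray.ForEach({ $_ * 2 }) }
}

$Results = foreach ($Style in $LoopStyles.Keys) {
    $Block = $LoopStyles[$Style]
    $Stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
    $Output = @(& $Block)
    $Stopwatch.Stop()
    [pscustomobject]@{
        Style  = $Style
        TimeMs = $Stopwatch.Elapsed.TotalMilliseconds
        Count  = $Output.Count
        Last   = $Output[-1]
    }
}

$Results | Sort-Object -Property TimeMs | Format-Table -AutoSize
```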

Scripts used to pull these metrics