I have a large txt file and need to remove irrelevant text, along with numbers that are greater than a given value or that fall within a given range.
I have the following example txt file:
2022-09-27 00:00:01 All bus routes Local route 1
2022-09-27 00:10:01 All bus routes Local route 2
2022-09-27 00:00:01 All bus routes Local route 16
2022-09-27 15:00:01 All bus routes Local route 58
2022-10-07 00:00:01 All bus routes Local route 1
2022-10-17 00:10:01 All bus routes Local route 2
2022-09-27 00:00:01 All bus routes Local route 16
2022-09-27 15:00:01 All bus routes Local route 99
2022-11-14 00:00:01 All bus routes Local route 1
2022-09-27 00:10:01 All bus routes Local route 2
2022-09-27 00:00:01 All bus routes Local route 16
2022-09-27 15:00:01 All bus routes Local route 62248
2022-09-27 00:00:01 All bus routes Local route 1
2022-09-27 00:10:01 All bus routes Local route 222
2022-09-27 00:00:01 All bus routes Local route 16
2022-09-27 15:00:01 All bus routes Local route 58
What I am trying to do is remove all text before the word Local and all routes greater than 90.
Get-Content C:\Temp\Buses.txt
-replace "Local [-1]", ""
-replace "90 [ 1]", ""
Set-Content C:\Temp\Buses1.txt
The above obviously does not work. What am I doing wrong?
The aim is to also remove duplicate lines and those that are not applicable to the results.
CodePudding user response:
This example uses a regex to keep only the text 'Local route xx' and to filter out routes greater than 90:
$Routes = foreach ($line in (Get-Content 'C:\Temp\Buses.txt')) {
    if ($line -match 'Local route (\d{1,2})$') {   ## only one- or two-digit routes
        if ([int]$matches[1] -le 90) {             ## cast to int; keep routes up to and including 90
            $matches[0]                            ## output matched text
        }
    }
}
# view output with no duplicates
$Routes | Select -Unique
Local route 1
Local route 2
Local route 16
Local route 58 ## etc...
# Save to file with:
$Routes | Select -Unique | Set-Content 'C:\Temp\Buses1.txt'
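For comparison, here is a minimal pipeline sketch that is closer to the -replace approach attempted in the question. It is an illustration, not the only way to do this: the sample lines are copied from the question so the snippet runs without the file, and $maxRoute is a hypothetical variable name. The [int] cast is deliberate: $Matches[1] is a string, and PowerShell converts the right-hand operand to the type of the left, so without the cast the comparison would be done on strings.

```powershell
# Sample lines standing in for: Get-Content 'C:\Temp\Buses.txt'
$lines = @(
    '2022-09-27 00:00:01 All bus routes Local route 1'
    '2022-09-27 00:10:01 All bus routes Local route 2'
    '2022-09-27 15:00:01 All bus routes Local route 58'
    '2022-09-27 15:00:01 All bus routes Local route 99'
    '2022-09-27 15:00:01 All bus routes Local route 62248'
    '2022-10-07 00:00:01 All bus routes Local route 1'
)

# Hypothetical name: routes above this number are dropped
$maxRoute = 90

# Stage 1: strip everything before 'Local route' on each line.
# Stage 2: keep only lines whose route number, compared as an integer, is <= $maxRoute.
# Stage 3: drop duplicate lines, keeping the first occurrence.
$routes = $lines |
    ForEach-Object { $_ -replace '^.*?(?=Local route)', '' } |
    Where-Object { ($_ -match '^Local route (\d+)\s*$') -and ([int]$Matches[1] -le $maxRoute) } |
    Select-Object -Unique

$routes
```

With the sample lines above this leaves Local route 1, Local route 2 and Local route 58. To work on the real file, replace the $lines array with Get-Content 'C:\Temp\Buses.txt' and pipe $routes to Set-Content 'C:\Temp\Buses1.txt'.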