PowerShell, regex to extract 3-digit codes from a string

Time:05-03

I have about 50 files whose names come in a few variations:

Report View, <some text here>, POR-en.csv
Report View, <some text here>, AUS-GOG-en.csv
Report View, <some text here>, GBR-FEQ-en-gb.csv

I am only interested in the regional codes and want to discard all other text (-en, -en-gb, etc.) so that I can rename the above files as:

POR.csv
AUS-GOG.csv
GBR-FEQ.csv

I started by using -split "-en\.csv" and then splitting again with ($result -split " ")[-2]. That worked until I realised that some reports end in -en-gb. So I need a more general regex that extracts 3 uppercase characters, optionally followed by a hyphen and 3 more uppercase characters, and then renames the file accordingly. I'm a bit rusty on regex and haven't found a good answer through Google searches. Can a regex expert show me how to do this in PowerShell, please?
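The split-based attempt described above can be reproduced on the three sample names. This is a sketch that treats "<some text here>" as literal placeholder text. It shows why the approach breaks: '-en\.csv' never matches the "-en-gb.csv" name, so nothing is cut off and index [-2] lands on a word from the middle of the name instead of the code.

```powershell
# Sample names from the question ("<some text here>" is a placeholder)
$names = @(
    'Report View, <some text here>, POR-en.csv',
    'Report View, <some text here>, AUS-GOG-en.csv',
    'Report View, <some text here>, GBR-FEQ-en-gb.csv'
)

$splitResults = foreach ($name in $names) {
    # Step 1: cut off the "-en.csv" suffix (no effect on "-en-gb.csv")
    $result = $name -split '-en\.csv'
    # Step 2: split every piece on spaces and take the second-to-last token
    ($result -split ' ')[-2]
}

$splitResults
```

For the three samples this yields POR, AUS-GOG and "here>,", the last one being the failure case.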

Answer:

Assuming:

  • There is always a regional code of 3 uppercase letters, followed by a '-' character.

The first capturing group of:

([A-Z-]+)(?=\-)

should do the job. It captures one or more characters that are each either an uppercase letter A-Z or the '-' symbol (regex101). The positive lookahead (?=\-) requires the next character to be '-' without consuming it, so the hyphen in front of "en" is not part of the capture.
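A minimal sketch of this answer in PowerShell, assuming the sample names from the question. It uses -cmatch rather than -match because PowerShell's comparison operators are case-insensitive by default, which lets [A-Z] match lowercase letters too; with plain -match the third name would capture GBR-FEQ-en.

```powershell
$names = @(
    'Report View, <some text here>, POR-en.csv',
    'Report View, <some text here>, AUS-GOG-en.csv',
    'Report View, <some text here>, GBR-FEQ-en-gb.csv'
)

$newNames = foreach ($name in $names) {
    # -cmatch is the case-sensitive form of -match
    if ($name -cmatch '([A-Z-]+)(?=\-)') {
        # $Matches[1] holds the first capturing group, e.g. "AUS-GOG"
        '{0}.csv' -f $Matches[1]
    }
}

$newNames
```

This prints POR.csv, AUS-GOG.csv and GBR-FEQ.csv.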

Answer:

You can use:

([A-Z]{3})(-[A-Z]{3})?

to match one or two groups of 3 uppercase characters. The second group, including its hyphen, is optional: ? makes the preceding token optional, and because here it follows a group, everything in that group is optional. https://regex101.com/r/9xb0WE/1
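The same idea with this pattern, again using the case-sensitive -cmatch. Here $Matches[0] (the whole match) is used, so the optional -XXX part is included when present. Note that -cmatch returns the first match anywhere in the name, so this sketch assumes the "<some text here>" part never contains three consecutive capitals of its own.

```powershell
$names = @(
    'Report View, <some text here>, POR-en.csv',
    'Report View, <some text here>, AUS-GOG-en.csv',
    'Report View, <some text here>, GBR-FEQ-en-gb.csv'
)

$codes = foreach ($name in $names) {
    # First run of 3 capitals, optionally followed by "-" and 3 more capitals
    if ($name -cmatch '([A-Z]{3})(-[A-Z]{3})?') {
        # $Matches[0] is the whole match, e.g. "GBR-FEQ"
        $Matches[0]
    }
}

$codes
```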

Alternatively, if the part you want is always uppercase and the file always ends in .csv, you could match only the trailing suffix by anchoring the pattern to the end of the line:

[^A-Z]+[.]csv$

and then replace the match with:

.csv

https://regex101.com/r/TtLwKx/1
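On the full names, this replacement only removes the language suffix; the "Report View, <some text here>, " prefix would remain. The sketch below therefore keeps only the text after the last ", " before replacing. Get-RegionalFileName is a hypothetical helper name, and the loop assumes the files sit in the current directory. -creplace keeps the match case-sensitive, so [^A-Z] means "anything except an uppercase letter"; the default -replace ignores case.

```powershell
function Get-RegionalFileName {
    # Hypothetical helper: keep the last comma-separated field, then
    # replace the trailing non-uppercase run plus ".csv" with ".csv"
    param([string]$Name)
    $lastField = ($Name -split ', ')[-1]
    $lastField -creplace '[^A-Z]+[.]csv$', '.csv'
}

# Preview the renames in the current folder; remove -WhatIf to apply them
Get-ChildItem -Path . -Filter '*.csv' -File | ForEach-Object {
    $newName = Get-RegionalFileName -Name $_.Name
    if ($newName -cne $_.Name) {
        Rename-Item -LiteralPath $_.FullName -NewName $newName -WhatIf
    }
}
```

As written, the loop only reports what Rename-Item would do. Names already in the short form (e.g. POR.csv) are left alone, because the pattern needs at least one non-uppercase character before .csv.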

Answer:

Try this:

(?<=[, ])([A-Z]{3})

The lookbehind (?<=[, ]) requires the character before the match to be a comma or a space, without consuming it. The capturing group then matches the country code, ASSUMING it is always 3 letters and UPPER CASE (you can relax this assumption by matching up to the dash).
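A quick check of this pattern with -cmatch, assuming the sample names from the question. As written it captures only the first code (AUS rather than AUS-GOG). The second pattern is an illustrative variant, not part of the original answer: it requires a preceding ", " and adds an optional -XXX group, which is one possible way to relax that assumption.

```powershell
$names = @(
    'Report View, <some text here>, POR-en.csv',
    'Report View, <some text here>, AUS-GOG-en.csv',
    'Report View, <some text here>, GBR-FEQ-en-gb.csv'
)

# Pattern from the answer: only the first 3-letter code
$firstCodes = foreach ($name in $names) {
    if ($name -cmatch '(?<=[, ])([A-Z]{3})') { $Matches[1] }
}

# Illustrative variant: preceded by ", ", with an optional second code
$fullCodes = foreach ($name in $names) {
    if ($name -cmatch '(?<=, )[A-Z]{3}(?:-[A-Z]{3})?') { $Matches[0] }
}

$firstCodes  # POR, AUS, GBR
$fullCodes   # POR, AUS-GOG, GBR-FEQ
```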
