I have about 50 files whose names come in a few variations:
Report View, <some text here>, POR-en.csv
Report View, <some text here>, AUS-GOG-en.csv
Report View, <some text here>, GBR-FEQ-en-gb.csv
I am only interested in the regional codes and want to discard all the other text (both -en and -en-gb, etc.) so that I can rename the above files as:
POR.csv
AUS-GOG.csv
GBR-FEQ.csv
I started by using -split "-en\.csv" and then splitting again with ($result -split " ")[-2], and that sort of worked OK until I realised that some reports have en-gb.
So I need a more general regex rule that extracts three uppercase characters, optionally followed by a hyphen and three more uppercase characters, and then renames the file that way. I'm a bit rusty on regex and haven't found a good result from Google searches. Can a regex expert show me how to achieve this in PowerShell, please?
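To see where the two-step split described above breaks, here is a minimal sketch of it. It assumes a made-up placeholder, Sample, in place of <some text here>, and the function name Get-CodeBySplit is hypothetical.

```powershell
# Hypothetical helper reproducing the two-step split from the question.
function Get-CodeBySplit([string]$Name) {
    # Step 1: split on the literal "-en.csv" suffix (the dot is escaped for the regex).
    $result = $Name -split '-en\.csv'
    # Step 2: split every piece on spaces and take the second-to-last token.
    # When step 1 matched, the last token is the empty string left after the suffix.
    ($result -split ' ')[-2]
}

$ok  = Get-CodeBySplit 'Report View, Sample, AUS-GOG-en.csv'    # "AUS-GOG"
# "-en.csv" never occurs in an en-gb name, so step 1 does not split it and
# the second-to-last space-separated token is "Sample," instead of the code.
$bad = Get-CodeBySplit 'Report View, Sample, GBR-FEQ-en-gb.csv' # "Sample,"
```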
Answer:
Assuming there is always a regional code of three uppercase letters followed by a '-' character, the first capturing group of:
([A-Z-]+)(?=\-)
should do the job. It captures one or more characters that are either uppercase letters A-Z or the '-' symbol (regex101). The positive lookahead (?=\-) then requires the next character to be '-' without consuming it.
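A minimal PowerShell sketch of this pattern, assuming the names look like the three samples in the question (with a made-up Sample in place of <some text here>) and that nothing before the code contains uppercase letters followed by '-'. PowerShell's -match is case-insensitive by default, which lets [A-Z] match lowercase letters too (for the en-gb sample it would capture GBR-FEQ-en), so the sketch uses the case-sensitive -cmatch. The function name Get-RegionalName is hypothetical.

```powershell
# Hypothetical helper: returns "<code>.csv" for a report file name, or $null.
function Get-RegionalName([string]$Name) {
    # -cmatch is the case-sensitive form of -match, so [A-Z] means uppercase only.
    if ($Name -cmatch '([A-Z-]+)(?=\-)') {
        # $Matches[1] holds the first capturing group, e.g. "AUS-GOG".
        return "$($Matches[1]).csv"
    }
    return $null
}

$names = 'Report View, Sample, POR-en.csv',
         'Report View, Sample, AUS-GOG-en.csv',
         'Report View, Sample, GBR-FEQ-en-gb.csv'
$renamed = $names | ForEach-Object { Get-RegionalName $_ }
# $renamed: POR.csv, AUS-GOG.csv, GBR-FEQ.csv
```

Assuming every .csv file in the folder follows this naming scheme, Get-ChildItem -Filter '*.csv' | Rename-Item -NewName { Get-RegionalName $_.Name } -WhatIf should preview the renames; remove -WhatIf once the preview looks right.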
Answer:
You can use:
([A-Z]{3})(-[A-Z]{3})?
to match two groups of three uppercase characters. The second group, including its hyphen, is optional: the ? makes the preceding token optional, and since that token is a group here, everything in the group is optional. https://regex101.com/r/9xb0WE/1
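A minimal sketch of this pattern in PowerShell, assuming the sample names from the question (with a made-up Sample in place of <some text here>). It uses the case-sensitive -cmatch and returns the whole match, $Matches[0]; the function name Get-RegionCode is hypothetical. The last lines show why case sensitivity matters here.

```powershell
# Hypothetical helper built on ([A-Z]{3})(-[A-Z]{3})?; returns the code or $null.
function Get-RegionCode([string]$Name) {
    # -cmatch is case-sensitive, so [A-Z] only accepts uppercase letters.
    if ($Name -cmatch '([A-Z]{3})(-[A-Z]{3})?') {
        # $Matches[0] is the whole match: group 1 plus the optional group 2.
        return $Matches[0]
    }
    return $null
}

$names = 'Report View, Sample, POR-en.csv',
         'Report View, Sample, AUS-GOG-en.csv',
         'Report View, Sample, GBR-FEQ-en-gb.csv'
$codes = $names | ForEach-Object { Get-RegionCode $_ }   # POR, AUS-GOG, GBR-FEQ

# For comparison: the default, case-insensitive -match stops at "Rep" in "Report".
$null = $names[0] -match '([A-Z]{3})(-[A-Z]{3})?'
$insensitive = $Matches[0]   # "Rep"
```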
Alternatively, if the part you want is always uppercase and the file is always a CSV, you could anchor the match to the end of the name with $:
[^A-Z]+[.]csv$
then just swap that with:
.csv
https://regex101.com/r/TtLwKx/1
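A minimal sketch of this replacement in PowerShell, assuming the sample naming from the question (with a made-up Sample in place of <some text here>). The sketch uses -creplace, the case-sensitive form of -replace, so that [^A-Z] keeps its uppercase-only meaning. On its own, the replacement removes only the language suffix; the leading "Report View, <some text here>, " text stays in the name. The second step therefore keeps only the text after the last ", ", which assumes that separator always precedes the code.

```powershell
$name = 'Report View, Sample, GBR-FEQ-en-gb.csv'

# Replace the trailing run of non-uppercase characters plus ".csv" with ".csv".
# -creplace is case-sensitive, so [^A-Z] matches the lowercase "-en-gb" part.
$step1 = $name -creplace '[^A-Z]+[.]csv$', '.csv'   # "Report View, Sample, GBR-FEQ.csv"

# The prefix is still present; keep only the piece after the last ", ".
$step2 = ($step1 -split ', ')[-1]                   # "GBR-FEQ.csv"
```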
Answer:
Try this:
(?<=[, ])([A-Z]{3})
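The lookbehind (?<=[, ]) requires the match to be preceded by a comma or a space (the character class matches exactly one of them) without consuming it. The capturing group then matches the country code, ASSUMING it's always 3 letters and UPPER CASE (you can relax this assumption by matching up to the dash). Note that for a name like AUS-GOG-en.csv this captures only AUS.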
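A minimal sketch of the lookbehind pattern in PowerShell, assuming the sample naming from the question (with a made-up Sample in place of <some text here>). It uses the case-sensitive -cmatch and shows that the pattern as posted stops after the first code. The second pattern is a hypothetical extension, not part of the answer above, that adds the optional "-XXX" group from the previous answer.

```powershell
$name = 'Report View, Sample, AUS-GOG-en.csv'

# As posted: three uppercase letters preceded by a comma or a space.
$null = $name -cmatch '(?<=[, ])([A-Z]{3})'
$short = $Matches[1]    # "AUS": the second code is not captured

# Hypothetical extension: allow an optional "-XXX" after the first code.
$null = $name -cmatch '(?<=[, ])([A-Z]{3}(-[A-Z]{3})?)'
$full = $Matches[1]     # "AUS-GOG"
```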