I have a file that contains multiple strings between parentheses, each representing a country name.
This (USA) is a bad text (France)
with countries (Luxembourg) between () (Germany)
Whith multiple (Luxembourg) countries (USA) per line
and some lines without countries
To search (France) and find (Belgique) duplicate
countries (USA)
I want to extract all countries and display each country found on a new line.
The output I expect is the following:
USA
France
Luxembourg
Germany
Luxembourg
USA
France
Belgique
USA
Using a special tool named BS2EDT, an editor on the BS2000 mainframe, the solution can be:
list-string /(?<=\()[^)]+(?=\))/,from=lettre.pays.txt
Using PowerShell, what is the shortest solution?
CodePudding user response:
The following somewhat tricky solution works on my PC:
$regex = '(?<=\()[^)]+(?=\))'
(Select-String .\lettre.pays.txt -Pattern $regex -AllMatches).Matches `
| Select-String -Pattern '.*'
The first Select-String command reads the input file and finds ALL strings matching the same regex given in the question.
The .Matches property returns those matches, and the second Select-String command displays the strings found.
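To try the idea without a file, here is a minimal sketch that pipes a few of the question's sample lines into Select-String from memory. The variable names $lines and $countries are only illustrative, and it assumes PowerShell 3.0 or later so that .Matches and .Value can be read across a whole collection.

```powershell
# A few sample lines from the question, held in memory instead of a file.
$lines = @(
    'This (USA) is a bad text (France)',
    'and some lines without countries',
    'countries (USA)'
)
$regex = '(?<=\()[^)]+(?=\))'

# -AllMatches keeps every hit on a line, not only the first one.
# .Matches collects the Match objects of all matching lines,
# and .Value extracts the matched text of each one.
$countries = ($lines | Select-String -Pattern $regex -AllMatches).Matches.Value
$countries
```

With these three lines, this should print USA, France and USA, one per line; the line without parentheses contributes nothing.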
It is also possible to sort all the countries found by adding the Sort-Object command:
$regex = '(?<=\()[^)]+(?=\))'
(Select-String .\lettre.pays.txt -Pattern $regex -AllMatches).Matches `
| Select-String -Pattern '.*' `
| Sort-Object
which displays the following result:
Belgique
France
France
Germany
Luxembourg
Luxembourg
USA
USA
USA
CodePudding user response:
Using PowerShell, you can accomplish it with Select-String, although this is definitely not shorter than what you already have:
Select-String .\lettre.pays.txt -Pattern '(?<=\()[^)]+(?=\))' -AllMatches |
ForEach-Object { $_.Matches.Value }
As for your regex, I believe (?=\)) is not needed and could be removed from the pattern.
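A quick way to check that claim is to run both patterns against one of the sample lines. This is only a sketch: the variable names ($text, $full, $short, $a, $b) are illustrative, and it uses the .NET [regex]::Matches method directly rather than Select-String.

```powershell
$text = 'To search (France) and find (Belgique) duplicate'

# Pattern from the question, and the shortened one without the trailing lookahead.
$full  = '(?<=\()[^)]+(?=\))'
$short = '(?<=\()[^)]+'

# [regex]::Matches returns every match; keep only the matched text.
$a = [regex]::Matches($text, $full)  | ForEach-Object { $_.Value }
$b = [regex]::Matches($text, $short) | ForEach-Object { $_.Value }

$a
$b
```

Both patterns return France and Belgique here, because [^)]+ already stops at the first closing parenthesis. The two differ only when an opening parenthesis is never closed: the shortened pattern then matches up to the end of the line, while the original matches nothing.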