I have printouts with hundreds of lines, some containing stock symbols in CAPS that I'd like to extract, e.g.
STOCKS OPTIONS SYMBOL GROUPS WORKING
$14,489.60
$14,489.60 Mark WMT D
72%
($24.00)
$45.00 ($153.00) T
2 opt
$500.00 MSFT
100 Sha
I'd like to extract: WMT, T, MSFT
I've been using online regex testers such as https://regexr.com/ and spent hours trying expressions such as the following, but I haven't managed to extract only the symbols and none of the other text:
$. [A-Z]\w\s
Answer:
You didn't specify a programming language, so I'll assume PCRE:
Regex (with the multiline flag enabled, so ^ matches at the start of every line):
^.*\d+.*?\K\b[A-Z]+\b
Test data (the sample from the question):
STOCKS OPTIONS SYMBOL GROUPS WORKING
$14,489.60
$14,489.60 Mark WMT D
72%
($24.00)
$45.00 ($153.00) T
2 opt
$500.00 MSFT
100 Sha
The matches are WMT, T, and MSFT.
https://regex101.com/r/N2shwC/1
In English:
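For every line that contains a digit, skip past the digits and match the first whole word made up only of capital letters that follows them. Because .* is greedy, the last number on the line is tried first, and \K discards everything before the symbol from the reported match, so only the symbol itself is returned.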
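The answer assumes PCRE. If you end up doing this in Python instead (an assumption, Python is used here only for illustration), note that the built-in re module does not support \K. A minimal sketch is to keep the same pattern but put the symbol in a capturing group, which re.findall then returns; SYMBOL_RE and extract_symbols are hypothetical names.

```python
import re

# Same idea as the PCRE pattern, but Python's re has no \K, so the symbol
# is captured in group 1 instead. re.MULTILINE makes ^ match at the start
# of every line; "." never crosses a newline, so each match stays on one line.
SYMBOL_RE = re.compile(r"^.*\d+.*?\b([A-Z]+)\b", re.MULTILINE)


def extract_symbols(text):
    """Return the first all-caps word after the digits on each line (hypothetical helper)."""
    # With exactly one group in the pattern, findall returns a list of group 1 strings.
    return SYMBOL_RE.findall(text)


sample = """STOCKS OPTIONS SYMBOL GROUPS WORKING
$14,489.60
$14,489.60 Mark WMT D
72%
($24.00)
$45.00 ($153.00) T
2 opt
$500.00 MSFT
100 Sha"""

print(extract_symbols(sample))  # ['WMT', 'T', 'MSFT']
```

Lines without digits (the header) or without a standalone all-caps word after them ("100 Sha", "2 opt") simply produce no match.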