I have a text file that has the following info:
"Where_can_i_find red capture state"
"Why_are_you orange 00:AO state"
"Salty_pepper gray good state"
import re
with open(cur_path, 'r') as file:
    data = file.read()
itm1 = re.search('Where_can_i_find (.*?)state', data).group(1)
itm2 = re.search('Salty_pepper (.*?)state', data).group(1)
This gives me red capture and gray good, etc. But I only want capture for the first item and good for the second item, without the red and gray parts. In other words, I want to skip everything in the 2nd column.
How should I change my regex for this to work?
CodePudding user response:
You can use
for line in file:
    if line.strip().endswith('state') and any(line.strip().startswith(x) for x in ['Where_can_i_find','Salty_pepper']):
        print(line.split()[-2])
Notes:
line.strip().endswith('state') - checks if the line ends with state
any(line.strip().startswith(x) for x in ['Where_can_i_find','Salty_pepper']) - checks if the line starts with one of the specified strings.
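As a minimal, self-contained sketch of the two checks above: the helper name penultimate_fields and the in-memory sample list are illustrative assumptions, not part of the original answer, and the sketch assumes the lines carry no surrounding quotes.

```python
def penultimate_fields(lines, prefixes, suffix="state"):
    """Return the next-to-last whitespace-separated field of each matching line."""
    results = []
    for line in lines:
        stripped = line.strip()
        # str.startswith accepts a tuple of candidate prefixes,
        # so no explicit any(...) loop is needed here.
        if stripped.endswith(suffix) and stripped.startswith(tuple(prefixes)):
            results.append(stripped.split()[-2])
    return results


# Hypothetical sample standing in for the lines of the real file.
sample = [
    "Where_can_i_find red capture state\n",
    "Why_are_you orange 00:AO state\n",
    "Salty_pepper gray good state\n",
]
values = penultimate_fields(sample, ["Where_can_i_find", "Salty_pepper"])
print(values)
```

With a real file, the open file object can be passed in place of sample, since iterating over it yields lines in the same way.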
CodePudding user response:
The \s (any whitespace) and \S (non-whitespace) classes are useful here.
To match a single non-whitespace sequence (\S+), followed by a whitespace sequence (\s+), right before state:
re.search(r'Where_can_i_find.*?(\S+)\s+state', data).group(1)
re.search(r'Salty_pepper.*?(\S+)\s+state', data).group(1)
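One thing to watch with this pattern: re.search returns None when nothing matches, so calling .group(1) directly would raise an AttributeError. A small sketch of a guarded version follows; the helper name field_before_state and the sample data are assumptions made for illustration only.

```python
import re


def field_before_state(prefix, text):
    """Return the token right before 'state' following the given prefix, or None."""
    # re.escape guards against regex metacharacters in the prefix.
    # '.' does not match newlines by default, so the lazy .*? stays on one line.
    pattern = re.escape(prefix) + r'.*?(\S+)\s+state'
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Hypothetical sample standing in for the contents of the real file.
data = '''Where_can_i_find red capture state
Why_are_you orange 00:AO state
Salty_pepper gray good state'''

print(field_before_state('Where_can_i_find', data))
print(field_before_state('Salty_pepper', data))
print(field_before_state('No_such_prefix', data))
```

Returning None for a missing prefix lets the caller decide how to handle absent lines instead of failing inside the lookup.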
Since you mention 'columns', another approach would be to split the whole thing into columns first, and then select the right items. For instance:
data = '''Where_can_i_find red capture state
Why_are_you orange 00:AO state
Salty_pepper gray good state'''
data_split = [line.split() for line in data.splitlines()]
data_dict = {line[0]: line[2] for line in data_split}
> data_dict
{'Where_can_i_find': 'capture',
'Why_are_you': '00:AO',
'Salty_pepper': 'good'}
Since this avoids regexes, it can be a lot faster (perhaps depending on how many of the lines you actually want to access).
CodePudding user response:
If all your lines end with state, then you can use: (\S+)(?=\s+state)
Test here: https://regex101.com/r/HQVXRe/1
Since you want to get the third column of specific lines, you can use startswith to find such lines and then use re.search.
import re
s = '''"Where_can_i_find red capture state"
"Why_are_you orange 00:AO state"
"Salty_pepper gray good state"'''
lines = s.split('\n')
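# Capture the non-whitespace token that is followed by whitespace and "state"
pattern = re.compile(r'(\S+)(?=\s+state)', re.M)
# str.startswith accepts a tuple of prefixes
prefixes = ('"Where_can_i_find', '"Salty_pepper')
for line in lines:
    if line.startswith(prefixes):
        print(pattern.search(line).group(1))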
# capture
# good