I'm trying to extract only the parts I need from the table.
2555 texttext 0 100 100 0 0 0 0 lowness 0
2557 texttext 10 650 660 0 0 0 0 lowness 0
2564 texttext 0 30 30 0 0 0 0 lowness 0
2566 texttext 0 0 0 0 0 0 0 lowness 0
2567 texttext 10 70 80 0 0 0 0 lowness 0
All I need from each row is 'texttext', the two numbers immediately following it, and 'lowness', as shown below.
texttext 0 100 lowness
texttext 10 650 lowness
texttext 0 30 lowness
texttext 0 0 lowness
texttext 10 70 lowness
I tried this but failed.
import re

text = """
2555 texttext 0 100 100 0 0 0 0 lowness 0
2557 texttext 10 650 660 0 0 0 0 lowness 0
2564 texttext 0 30 30 0 0 0 0 lowness 0
2566 texttext 0 0 0 0 0 0 0 lowness 0
2567 texttext 10 70 80 0 0 0 0 lowness 0
"""
for a in text.split('\n'):
    if a == "":
        continue
    else:
        print(a)
        m = re.match(r'(^\D\d*\D)(\w*\s)(\d*\s)(\d*\s)(\d*\s\d*\s\d*\s\d*\s\d*\s)(\w+)', a)
        print(m)
        print(m.group(2), m.group(3), m.group(4), m.group(6))
I tried to capture the parts with regex groups, but I got the following error on the last print line:
print(m.group(2), m.group(3), m.group(4), m.group(6))
AttributeError: 'NoneType' object has no attribute 'group'
CodePudding user response:
If you absolutely want to use a regular expression:
import re
text = """
2555 texttext 0 100 100 0 0 0 0 lowness 0
2557 texttext 10 650 660 0 0 0 0 lowness 0
2564 texttext 0 30 30 0 0 0 0 lowness 0
2566 texttext 0 0 0 0 0 0 0 lowness 0
2567 texttext 10 70 80 0 0 0 0 lowness 0
"""
pattern = re.compile(
    r"\s*\d+\s+(\w+)\s+(\d+)\s+(\d+)\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+(\w+)\s+"
)
for line in text.strip().split('\n'):
    # Search each line and print only the four captured groups.
    match = pattern.search(line)
    print(*match.groups())
Output:
texttext 0 100 lowness
texttext 10 650 lowness
texttext 0 30 lowness
texttext 0 0 lowness
texttext 10 70 lowness
But if every line always has the same number of space-separated fields, you are probably better off just splitting each line on whitespace:
for line in text.strip().split('\n'):
    parts = line.split()
    print(parts[1], parts[2], parts[3], parts[9])
Same output.
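If you want the split approach to tolerate the odd malformed line, a minimal sketch could check the field count first. The helper name `pick_fields` and the default of 11 fields are assumptions based on the sample rows, not something from the question:

```python
def pick_fields(line, expected=11):
    """Return (name, first, second, label) or None if the line looks malformed.

    Hypothetical helper: `expected=11` is assumed from the sample rows,
    which each contain 11 whitespace-separated fields.
    """
    parts = line.split()
    if len(parts) != expected:
        return None
    # Index 1 is the text, 2 and 3 are the two numbers, 9 is the label.
    return parts[1], parts[2], parts[3], parts[9]


sample = """
2555 texttext 0 100 100 0 0 0 0 lowness 0
2557 texttext 10 650 660 0 0 0 0 lowness 0
"""

for line in sample.splitlines():
    fields = pick_fields(line)
    if fields is not None:  # blank or malformed lines are skipped
        print(*fields)
```

Returning None for unexpected lines keeps the loop from raising an IndexError on blank or truncated rows.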
CodePudding user response:
You are not getting a match because you are only matching a single \D and a single \s, each of which matches exactly one character.
But in the example data, there are more repetitions of the same characters to get to the next match.
If you fix that, you will get a match but with the wrong data in the groups, see https://regex101.com/r/v3ddai/1
Instead, you can just use 2 capture groups.
As there always seem to be digits present, you can change \d* to \d+
^\s*\d+\s+(\w+\s+\d+\s+\d+\s+)\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+(\w+)
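As a rough sketch of how the two-group pattern could be used in Python: group 1 already ends with whitespace, so concatenating it with group 2 gives the wanted line. The function name `extract` is made up for illustration, and the check for None is there because `match` returns None when a line does not fit the pattern, which is what caused the AttributeError in the question:

```python
import re

text = """
2555 texttext 0 100 100 0 0 0 0 lowness 0
2557 texttext 10 650 660 0 0 0 0 lowness 0
2564 texttext 0 30 30 0 0 0 0 lowness 0
2566 texttext 0 0 0 0 0 0 0 lowness 0
2567 texttext 10 70 80 0 0 0 0 lowness 0
"""

# Group 1: the text plus the two numbers after it (with trailing whitespace).
# Group 2: the word that follows the five skipped numbers.
pattern = re.compile(
    r"^\s*\d+\s+(\w+\s+\d+\s+\d+\s+)\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+(\w+)"
)


def extract(text):
    """Hypothetical helper: collect 'text n1 n2 label' for every matching line."""
    rows = []
    for line in text.splitlines():
        m = pattern.match(line)
        if m is None:  # blank or non-matching line: skip instead of crashing
            continue
        rows.append(m.group(1) + m.group(2))
    return rows


rows = extract(text)
for row in rows:
    print(row)
```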
CodePudding user response:
Try this:
for a in text.split('\n'):
    if a == "":
        continue
    else:
        parts = a.split()
        print(parts[1], parts[2], parts[3], parts[9])
CodePudding user response:
for e in text.splitlines():
    if e:
        ls = e.split()
        print(ls[1:4] + ls[-2:-1])
Output:
['texttext', '0', '100', 'lowness']
['texttext', '10', '650', 'lowness']
['texttext', '0', '30', 'lowness']
['texttext', '0', '0', 'lowness']
['texttext', '10', '70', 'lowness']