Regex: get specific lines from multiline text

I have a text:

address:

123 beautiful house

---blabla---

another street

---more blabla---

---extra bla---

one country

---a lot blabla

I used:

import re
s='address:\n 123 beautiful house \n ---blabla--- \n another street \n ---more blabla--- \n ---extra bla--- \n one country\n---a lot blabla'
value = re.search('address:\n((?:.*\n){1,6})',s)
print(value.group(1))

How can I get only lines 1, 3, and 6?

123 beautiful house

another street

one country

Thanks in advance.

CodePudding user response：

A capturing group always captures one contiguous stretch of text, so a limiting quantifier such as {1,6} cannot skip lines in the middle of a match.

Instead, you can match the lines one after another and capture only the ones you need:

address:\n(.*)\n.*\n(.*)(?:\n.*){2}\n(.*)

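As a rough sketch of how the sequential pattern above could be used from Python: the three capture groups hold lines 1, 3, and 6, and each still needs stripping because the sample string pads its lines with spaces. The names `pattern` and `wanted` are illustrative only.

```python
import re

s = ('address:\n 123 beautiful house \n ---blabla--- \n another street \n'
     ' ---more blabla--- \n ---extra bla--- \n one country\n---a lot blabla')

# Group 1 = line 1, group 2 = line 3, group 3 = line 6 after "address:".
pattern = r'address:\n(.*)\n.*\n(.*)(?:\n.*){2}\n(.*)'

match = re.search(pattern, s)
# Strip the padding spaces; fall back to an empty list if nothing matched.
wanted = [g.strip() for g in match.groups()] if match else []
print(wanted)
# ['123 beautiful house', 'another street', 'one country']
```

Note that this pattern only matches when all six lines are present after address:, so shorter inputs yield no match at all.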

However, the most straightforward solution is to capture the lines that follow the keyword, split the Group 1 value into separate lines, and pick the ones you need by index, checking first that each one exists:

import re
s = 'address:\n 123 beautiful house \n ---blabla--- \n another street \n ---more blabla--- \n ---extra bla--- \n one country\n---a lot blabla'
# Capture up to six newline-terminated lines after "address:"
value = re.search('address:\n((?:.*\n){1,6})', s)
if value:
    lines = value.group(1).splitlines()
    print(lines[0].strip())      # line 1 always exists when there is a match
    if len(lines) > 2:
        print(lines[2].strip())  # line 3, if present
    if len(lines) > 5:
        print(lines[5].strip())  # line 6, if present


CodePudding user response：

You don't need a regex just to extract specific lines:

text = """\
address:
123 beautiful house
---blabla---
another street
---more blabla---
---extra bla---
one country
---a lot blabla
"""

need_lines = [1, 3, 6]
address = [
    line
    for n, line in enumerate(text.splitlines())
    if n in need_lines
]

print(address) 
# ['123 beautiful house', 'another street', 'one country']
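The snippet above assumes that address: is the very first line of the text. If it may appear further down, one possible variation is to find the marker line first and count from there. The helper name `lines_after` is hypothetical, and the sketch assumes non-negative offsets and a marker that occupies a whole line.

```python
def lines_after(text, marker, offsets):
    """Return the lines located `offsets` lines below the first `marker` line."""
    lines = [line.strip() for line in text.splitlines()]
    try:
        start = lines.index(marker)
    except ValueError:
        return []  # the marker line is not present
    # Skip offsets that would run past the end of the text.
    return [lines[start + k] for k in offsets if start + k < len(lines)]


text = (
    "header\n"
    "address:\n"
    "123 beautiful house\n"
    "---blabla---\n"
    "another street\n"
    "---more blabla---\n"
    "---extra bla---\n"
    "one country\n"
    "---a lot blabla\n"
)

print(lines_after(text, "address:", [1, 3, 6]))
# ['123 beautiful house', 'another street', 'one country']
```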