How to capture a string of characters from where it is indicated up to the first full stop followed by a line break

import re

x = """44
5844 44554  Hi hi!   , sahhashash; asakjas. jjksakjaskjas.
ooooooppkkk"""

# Both capture groups start capturing only after the line break that falls within their range
# ((?:\w+)?)   ---> this capturing group captures a run of alphanumeric characters (uppercase and lowercase), stopping at a space, a comma or a dot
# ((?:\w\s*)+)   ---> this pattern is similar to the previous one, but it does not stop when it finds spaces
regex_patron_m1 = r"\s*((?:\w+)?) \s*\¿?(?:del |de |)\s*((?:\w\s*)+)\s*\??"

m1 = re.search(regex_patron_m1, x, re.IGNORECASE)  # check whether the regex matches, i.e. whether the if block below is entered

if m1:
    word, association = m1.groups()
    
    print(repr(word)) #print captured substring by first capture group
    print(repr(association)) #print captured substring by second capture group

The output that I get with these two patterns:

'5844'
'44554  Hi hi'

What should I modify to get the following? I don't understand why both capture groups start their capture after the newline.

And what should I do so that the second capture group captures up to the full stop, i.e. up to ".[\s|]*\n*" or ".\n*"? I want to get:

'44'
'5844 44554  Hi hi!   , sahhashash; asakjas. jjksakjaskjas.'

And if I didn't want it to stop at the line break, to get something like this, what should I do?

'44'
'5844 44554  Hi hi!   , sahhashash; asakjas. jjksakjaskjas.
ooooooppkkk'

CodePudding user response：

Try this expression:

((\w+(\r|\n).*.)\n\w*)

group 1:

44
5844 44554  Hi hi!   , sahhashash; asakjas. jjksakjaskjas.
ooooooppkkk

group 2:

44
5844 44554  Hi hi!   , sahhashash; asakjas. jjksakjaskjas.

hope this is what you were looking for.
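
A minimal sketch of how this expression could be run against the string from the question, assuming the pattern is `((\w+(\r|\n).*.)\n\w*)`, i.e. with a `+` quantifier after the first `\w`. The names `answer_pattern`, `two_groups` and `to_end` are illustrative, and the last two patterns are assumptions added here, not part of the answer: they show one possible way to get '44' and the remaining text as two separate groups, with and without stopping at the line break.

```python
import re

x = """44
5844 44554  Hi hi!   , sahhashash; asakjas. jjksakjaskjas.
ooooooppkkk"""

# Expression from the answer, assuming a `+` quantifier after the first \w.
answer_pattern = r"((\w+(\r|\n).*.)\n\w*)"
m = re.search(answer_pattern, x)
if m:
    print(repr(m.group(1)))  # the whole text, including the last line
    print(repr(m.group(2)))  # the text up to the full stop that ends the second line

# Hypothetical variant: first word in group 1, the rest of the next line
# up to its last full stop in group 2. `\s+` lets the separator be a newline.
two_groups = r"(\w+)\s+(.*\.)"
m2 = re.search(two_groups, x)
if m2:
    word, association = m2.groups()
    print(repr(word))
    print(repr(association))

# Hypothetical variant: with re.DOTALL, `.` also matches newlines,
# so group 2 continues past the line break to the end of the string.
to_end = r"(\w+)\s+(.+)"
m3 = re.search(to_end, x, re.DOTALL)
if m3:
    print(repr(m3.group(2)))
```

In the question's pattern, the first group is followed by a literal space, which cannot match the newline after `44`. That is most likely why the match only begins on the second line. Replacing that space with `\s+`, as in the variants above, is one way around it.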

CodePudding user response：

Create a string containing line breaks

Newline code \n (LF), \r\n (CR + LF)

Triple quote ''' or """

With indent

Concatenate a list of strings on new lines

Split a string into a list by line breaks: splitlines()

Remove or replace line breaks

Output with print() without a trailing newline
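
As a rough illustration, a few of the line-break operations listed above could be applied to the string from the question without any regex. This is only a sketch: the variable names are illustrative, and it is an assumption that this is the kind of approach the list is pointing to.

```python
x = """44
5844 44554  Hi hi!   , sahhashash; asakjas. jjksakjaskjas.
ooooooppkkk"""

# Split once at the first line break: the first line vs. everything after it.
word, rest = x.split("\n", 1)

# splitlines() splits at every line boundary (\n, \r\n, and others).
lines = x.splitlines()
association = lines[1]  # only the second line, up to its final full stop

# Concatenate a list of strings on new lines.
rejoined = "\n".join(lines)

print(repr(word))
print(repr(association))
print(repr(rest))

# Output with print() without a trailing newline.
print("done", end="")
```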