import re
x = """44
5844 44554 Hi hi! , sahhashash; asakjas. jjksakjaskjas.
ooooooppkkk"""
# Both groups begin capturing after the last line break within their capture range
# ((?:\w+)?) ---> this capturing group captures a run of alphanumeric characters (uppercase or lowercase) until it reaches a space, a comma or a dot
# ((?:\w\s*)+) ---> this pattern is similar to the previous one, but it does not stop when it finds whitespace
regex_patron_m1 = r"\s*((?:\w+)?) \s*\¿?(?:del |de |)\s*((?:\w\s*)+)\s*\??"
m1 = re.search(regex_patron_m1, x, re.IGNORECASE) # This checks whether the regex matches, i.e. whether the code block below is entered or not
if m1:
    word, association = m1.groups()
    print(repr(word)) #print captured substring by first capture group
    print(repr(association)) #print captured substring by second capture group
The output that I get with these two patterns:
'5844'
'44554 Hi hi'
What should I modify to get the following? I don't understand why both capture groups start their capture after the newline.
And what should I do so that the second capture group captures up to the full stop, that is ".[\s|]*\n*" or ".\n*", to get:
'44'
'5844 44554 Hi hi! , sahhashash; asakjas. jjksakjaskjas.'
And if I didn't want it to stop at the line break, to get something like this, what should I do?
'44'
'5844 44554 Hi hi! , sahhashash; asakjas. jjksakjaskjas.
ooooooppkkk'
CodePudding user response:
Try this expression:
((\w+(\r|\n).*.)\n\w*)
group 1:
44
5844 44554 Hi hi! , sahhashash; asakjas. jjksakjaskjas.
ooooooppkkk
group 2:
44
5844 44554 Hi hi! , sahhashash; asakjas. jjksakjaskjas.
I hope this is what you were looking for.
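As for why the question's pattern starts after the newline: assuming its first group is ((?:\w+)?) followed by a literal space, that space cannot match "\n", so re.search slides forward to the first word that is followed by a real space, which is '5844'. The second group then stops at '!' because it only consumes word characters and whitespace. A minimal sketch of one way to get both desired outputs is below; the names pattern_to_full_stop and pattern_all_lines are made up for illustration, and it assumes the first token is always followed by some whitespace.

```python
import re

x = """44
5844 44554 Hi hi! , sahhashash; asakjas. jjksakjaskjas.
ooooooppkkk"""

# \s+ (instead of a literal space) lets the separator be a newline.
# Without re.DOTALL, "." does not match "\n", so (.*\.) stays on one line
# and, being greedy, runs up to the last full stop on that line.
pattern_to_full_stop = r"\s*(\w+)\s+(.*\.)"
m = re.search(pattern_to_full_stop, x)
if m:
    word, association = m.groups()
    print(repr(word))
    print(repr(association))

# With re.DOTALL, "." also matches "\n", so the second group
# continues across line breaks to the end of the string.
pattern_all_lines = r"\s*(\w+)\s+(.*)"
m_all = re.search(pattern_all_lines, x, re.DOTALL)
if m_all:
    word_all, association_all = m_all.groups()
    print(repr(word_all))
    print(repr(association_all))
```

If the second group should end at the first full stop rather than the last one on the line, a lazy quantifier (.*?\.) would be the usual adjustment.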
CodePudding user response:
Create a string containing line breaks
Newline code \n(LF), \r\n(CR LF)
Triple quote ''' or """
With indent
Concatenate a list of strings on new lines
Split a string into a list by line breaks: splitlines()
Remove or replace line breaks
Output with print() without a trailing newline
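A few of the operations listed above can be sketched as follows. The sample string and variable names are invented for illustration; only built-in str methods and print() are used.

```python
# A string mixing LF and CR LF line breaks (sample data, not from the question)
text = "44\n5844 44554\r\nooooooppkkk"

# Split into a list by line breaks; splitlines() handles both \n and \r\n
lines = text.splitlines()

# Concatenate a list of strings on new lines
joined = "\n".join(lines)

# Replace line breaks with a space (handle \r\n first, then \n)
flattened = text.replace("\r\n", " ").replace("\n", " ")

# Output without a trailing newline
print(joined, end="")
```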