This function is responsible for performing the replacements, but only within the substrings that the regex extracts from the original input string:
def detect_periods_between_hours(input_text_substring):
    print(input_text_substring)
    input_text_substring = input_text_substring.replace(' uno ', ' 1 ')
    input_text_substring = input_text_substring.replace(' una ', ' 1 ')
    input_text_substring = input_text_substring.replace(' dos ', ' 2 ')
    input_text_substring = input_text_substring.replace(' tres ', ' 3 ')
    input_text_substring = input_text_substring.replace(' cuatro ', ' 4 ')
    input_text_substring = input_text_substring.replace(' cinco ', ' 5 ')
    input_text_substring = input_text_substring.replace(' seis ', ' 6 ')
    input_text_substring = input_text_substring.replace('en punto', '0')
    input_text_substring = input_text_substring.replace('enpunto', '0')
    input_text_substring = input_text_substring.replace(' y menos cuarto', ' 45')
    input_text_substring = input_text_substring.replace('menos cuarto', '45')
    input_text_substring = input_text_substring.replace(' y menoscuarto', ' 45')
    input_text_substring = input_text_substring.replace('menoscuarto', '45')
    input_text_substring = input_text_substring.replace(' y cuarto', ' 15')
    #input_text_substring = input_text_substring.replace('cuarto', '15')
    input_text_substring = input_text_substring.replace(' y media', ' 30')
    input_text_substring = input_text_substring.replace('de la mañana', 'am')
    input_text_substring = input_text_substring.replace('de la manana', 'am')
    #the "y las" in between should be changed to this " -- "
    input_text_substring = input_text_substring.replace(' y las ', ' -- ')
    return input_text_substring #returns the corrected substring to replace in the original string
For example, given this input string:
input_text = "Hay que estar alli entre las cuatro y las 6 de la mañana, aunque que con cuatro de esos bastaria y creo que bastaria entre la una y media y las dos ya que es aun temprano para mi, no es como ir a las 15: hs , a la 1 hs o a eso de entre las 15 : hs y las 16:10 pm hs"
This is the pattern, whose purpose is to detect the substrings of input_text that match it:
r"
(?:entre las|entre la)[\s|]
one or two numbers [\s|]* or (?:uno|una|dos|tres|cuatro|cinco|seis)[\s|]
(?:en punto|enpunto|y menos cuarto|menos cuarto|y menoscuarto|menoscuarto|y cuarto|y media|)
(?:h\. s\.|h s\.|h\. s|h s|h\.s\.|hs\.|h\.s|hs|horas|hora|)\s*
(?:en la|de la|por la|entrada la|entrado la|de el|en el|por el|entrada el|entrado el|del|)\s*
(?:madrugada|alba|amanecer|manana|mañana|medio-dia|mediodia|)[\s|]*
(?:y las|y la)[\s|]*
(?:entre las|entre la)[\s|]
one or two numbers [\s|]* or (?:uno|una|dos|tres|cuatro|cinco|seis)[\s|]
(?:en punto|enpunto|y menos cuarto|menos cuarto|y menoscuarto|menoscuarto|y cuarto|y media|)
(?:h\. s\.|h s\.|h\. s|h s|h\.s\.|hs\.|h\.s|hs|horas|hora|)\s*
(?:en la|de la|por la|entrada la|entrado la|de el|en el|por el|entrada el|entrado el|del|)\s*
(?:madrugada|alba|amanecer|manana|mañana|medio-dia|mediodia|)[\s|]*
"
... and then pass them to the function detect_periods_between_hours(). That function should return the corrected values, and the goal is to substitute them back at their original positions within the input_text string:
input_text = re.compile(r"(?:entre las|entre la)[\s|]*(\d{1,2})[\s|]*(?::|)[\s|]*(\d{0,2})[\s|]*(?:en punto|enpunto|y menos cuarto|menos cuarto|y menoscuarto|menoscuarto|y cuarto|y media|)[\s|]*(?:h\. s\.|h s\.|h\. s|h s|h\.s\.|hs\.|h\.s|hs|horas|hora|)\s*(?:en la|de la|por la|entrada la|entrado la|de el|en el|por el|entrada el|entrado el|del|)\s*(?:madrugada|alba|amanecer|manana|mañana|medio-dia|mediodia|)[\s|]*(?:y las|y la)[\s|]*(\d{1,2})[\s|]*(?::|)[\s|]*(\d{0,2})[\s|]*(?:en punto|enpunto|y menos cuarto|menos cuarto|y menoscuarto|menoscuarto|y cuarto|y media|)[\s|]*(?:h\. s\.|h s\.|h\. s|h s|h\.s\.|hs\.|h\.s|hs|horas|hora|)\s*(?:en la|de la|por la|entrada la|entrado la|de el|en el|por el|entrada el|entrado el|del|)\s*(?:madrugada|alba|amanecer|manana|mañana|medio-dia|mediodia|)").sub(detect_periods_between_hours, input_text)
print(repr(input_text)) #print the output, input after the modifications...
In this case these would be the substrings that the program should detect and extract...
"entre las cuatro y las 6 de la mañana"
"entre la una y media y las dos"
"entre las 15 : hs y las 16:10 pm hs"
Then these substrings must be sent, in order, to the detect_periods_between_hours() function, which should correct them and return them like this:
"4: -- 6: am"
"1:30 -- 2:00"
"15: hs -- 16:10 pm hs"
And they should be replaced in the original string, getting this string which is the output I need:
input_text = "Hay que estar alli 4: -- 6: am, aunque que con cuatro de esos bastaria y creo que bastaria 1:30 -- 2:00 ya que es aun temprano para mi, no es como ir a las 15: hs , a la 1 hs o a eso de 15: hs -- 16:10 pm hs"
What things should I correct in my regex so that it can comply with the detection and extraction of the substrings, to then send them to the function, and then, once modified, replace them in the original string?
When I try to run this code, I get this error message:
<re.Match object; span=(230, 260), match='entre las 15 : hs y las 16:10 '>
Traceback (most recent call last):
input_text_substring = input_text_substring.replace(' uno ', ' 1 ')
AttributeError: 're.Match' object has no attribute 'replace'
Answer:
The argument to the re.sub() callback function is a Match object, not a string. You need to use .group() to get the text that was matched.
def detect_periods_between_hours(match):
    input_text_substring = match.group()
    print(input_text_substring)
    # rest of the function here
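To see the fix end to end, here is a minimal sketch. It assumes a deliberately simplified pattern (digits only, optional "de la mañana") and a shortened input string, both made up for illustration; it is not the full regex from the question. It only shows that the callback receives a Match, reads the text with .group(), and that whatever string it returns is substituted back in place.

```python
import re

def detect_periods_between_hours(match):
    # re.sub() passes a re.Match object; .group() returns the matched text.
    input_text_substring = match.group()
    input_text_substring = input_text_substring.replace(' y las ', ' -- ')
    input_text_substring = input_text_substring.replace('de la mañana', 'am')
    # The returned string replaces the match in the original text.
    return input_text_substring

# Simplified pattern, assumed for this example only (not the question's regex).
pattern = re.compile(r"entre las? \d{1,2} y las? \d{1,2}(?: de la mañana)?")

input_text = "Hay que estar alli entre las 4 y las 6 de la mañana, creo"
result = pattern.sub(detect_periods_between_hours, input_text)
print(result)  # Hay que estar alli entre las 4 -- 6 am, creo
```

Note that "entre las" survives here because the callback only performs the two replacements shown; removing it would need one more replacement, or a capture group around the part to keep.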