Python Regex Loop for Extracting Info

Time:06-24

I am trying to write a function to use with apply. Its goal is to find the number that comes right before a three-character string, in this case alc. The expected result is 54.

import pandas as pd 
import regex as re 

numeros=[0,1,2,3,4,5,6,7,8,9]

i="sdASK23LJFASDFKJGHASDLKJF123HALSDKJFHASDF54 alcobas"
     
df=df.head(3)

def re_alcoba(i):
    i=i.replace(" ", "")
    patron_acoba=re.compile(r"alc")
    matches=patron_acoba.finditer(i)
    contador=1
    numero_alcobas=[]
    for match in matches:
        index=match.start()
    while contador < 3: 
        numero=i[index-contador]
        contador += 1
        if numero in numeros: 
            numero_alcobas.insert(0,numero)
    respuesta="".join(numero_alcobas)
    return respuesta
            
            
respuesta=re_alcoba(i)

My loop won't work.
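For comparison, here is a minimal sketch of the character-by-character approach from the question with its main problems fixed. numeros holds integers while i[index - contador] is a one-character string, so the membership test never succeeds (str.isdigit() is used instead). The while loop sits outside the for loop, so it only sees the last match and fails when there is none. A fixed window of two characters is checked instead of stopping at the first non-digit. The sketch assumes only the first occurrence of alc matters and uses the standard-library re module rather than regex.

```python
import re

def re_alcoba(texto):
    """Return the digits immediately before the first 'alc', or '' if none."""
    texto = texto.replace(" ", "")
    match = re.search(r"alc", texto)
    if match is None:
        # No 'alc' in the string, so there is nothing to extract.
        return ""
    digitos = []
    indice = match.start() - 1
    # Walk backwards from the character just before 'alc' while it is a digit,
    # inserting at the front so the digits keep their original order.
    while indice >= 0 and texto[indice].isdigit():
        digitos.insert(0, texto[indice])
        indice -= 1
    return "".join(digitos)

i = "sdASK23LJFASDFKJGHASDLKJF123HALSDKJFHASDF54 alcobas"
print(re_alcoba(i))  # 54
```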

Answer:

If you want the digits directly before alc, you don't need all this code. The pattern (\d+)alc is enough:

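import re  # standard library; the third-party regex module from the question works the same for this pattern

i = "sdASKLJFASDFKJGHASDLKJFHALSDKJFHASDF54alcobas"
i = i.replace(" ", "")
results = re.findall(r"(\d+)alc", i)  # one or more digits directly followed by "alc"
print(results)  # ['54']

i = "4asd5alc"
i = i.replace(" ", "")
results = re.findall(r"(\d+)alc", i)
print(results)  # ['5']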
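Since the question wants something to plug into apply, a function that returns one value per string may fit better than findall, which returns a list. A sketch, assuming only the first number before alc matters and that non-string cells (such as NaN) should give None; the names PATRON_ALCOBAS and extraer_alcobas are hypothetical:

```python
import re

# Compiled once: (\d+) captures the run of digits before "alc",
# and \s* tolerates spaces in between (e.g. "54 alcobas").
PATRON_ALCOBAS = re.compile(r"(\d+)\s*alc")

def extraer_alcobas(texto):
    """Return the first number found before 'alc' as a string, or None."""
    if not isinstance(texto, str):
        # Missing values in a DataFrame column arrive as floats (NaN).
        return None
    match = PATRON_ALCOBAS.search(texto)
    return match.group(1) if match else None

print(extraer_alcobas("sdASK23LJFASDFKJGHASDLKJF123HALSDKJFHASDF54 alcobas"))  # 54
print(extraer_alcobas("sin numero"))  # None
```

On a DataFrame this could be used as df["columna"].apply(extraer_alcobas), where "columna" is a placeholder column name. pandas also offers df["columna"].str.extract(r"(\d+)\s*alc", expand=False) as a vectorised alternative.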

Answer:

As there are spaces in your example string, you can match optional whitespace characters other than newlines between the digits and alc:

(\d+)[^\S\n]*alc


import re

pattern = r"(\d+)[^\S\n]*alc"  # digits, optional non-newline whitespace, then "alc"

s = ("sdASK23LJFASDFKJGHASDLKJF123HALSDKJFHASDF54 alcobas\n"
    "4asd5alc")
    
print(re.findall(pattern, s))

Output

['54', '5']
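The reason for [^\S\n] instead of \s is that \s also matches a newline, so a number at the end of one line could be paired with alc at the start of the next. A small comparison on a made-up two-line string:

```python
import re

texto = "precio 54\nalcobas 3"

# \s matches any whitespace, including the newline, so the match can
# span two lines and pair "54" with the "alc" on the next line.
print(re.findall(r"(\d+)\s*alc", texto))       # ['54']

# [^\S\n] means "not non-whitespace and not newline", i.e. whitespace
# other than \n, so the match has to stay on one line.
print(re.findall(r"(\d+)[^\S\n]*alc", texto))  # []
```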