Regex find keyword followed by N characters

Time:06-09

I have a DataFrame column of URLs in which certain keywords are followed by hash values, for example:

/someurl/env40d929fadbe746ecagjbf6c515d30686/end
/some/other/url/envlabel40d929fadbe746ecagjbf6c517t30686/envvar40d929fadbe746ecagjbf6c515d306r6

The goal is to replace each keyword together with its 32-character hash by a placeholder: env followed by a 32-character hash becomes {env}, envlabel followed by a 32-character hash becomes {envlabel}, and likewise for the other keywords.

I am trying to use a regex with the replace method:

to_replace_key = ['env', 'envlabel', 'envvar']

for word in to_replace_key:
    df['URL'] = df['URL'].str.replace(rf"{word}\w{{32}}", f'{{{word}Id}}', regex=True)

Challenges:

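  1. The env pattern also matches the start of envlabel and envvar, so the shorter keyword replaces the longer ones.
  2. The keyword followed by its 32-character hash appears as a path segment between / characters or at the end of the line.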

Expected output:

/someurl/{env}/end
/some/other/url/{envlabel}/{envvar}

Thanks !!

CodePudding user response:

You can use the regex '(env(label|var)?)\w{32}'. The outer group captures env together with label or var when one of them follows; the ? makes that suffix optional. Replace each match with the first captured group, written as \1 in the replacement, wrapped in curly braces.

  df['URL'].str.replace(r'(env(label|var)?)\w{32}', r'{\1}', regex=True)

0                     /someurl/{env}/end
1    /some/other/url/{envlabel}/{envvar}
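
To see why the optional group matters, here is a minimal sketch that uses the standard `re` module instead of pandas. That is an assumption made so the example runs without a DataFrame; `Series.str.replace(..., regex=True)` is expected to behave the same way for these patterns. The `urls` list is the sample data from the question, and `naive` and `fixed` are illustrative names.

```python
import re

urls = [
    "/someurl/env40d929fadbe746ecagjbf6c515d30686/end",
    "/some/other/url/envlabel40d929fadbe746ecagjbf6c517t30686/envvar40d929fadbe746ecagjbf6c515d306r6",
]

# Naive pattern: "env" alone also matches the start of "envlabel" and "envvar",
# so the 32 \w characters swallow the suffix plus part of the hash.
naive = [re.sub(r"env\w{32}", "{env}", u) for u in urls]

# Optional group: the greedy "?" tries "label" or "var" first and only
# falls back to bare "env" when neither suffix leads to a match.
fixed = [re.sub(r"(env(label|var)?)\w{32}", r"{\1}", u) for u in urls]

for before, after in zip(naive, fixed):
    print(before, "->", after)
```

For the second URL the naive pattern leaves stray hash characters behind and never produces {envlabel} or {envvar}, while the grouped pattern yields the expected placeholders.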

CodePudding user response:

Don't loop; use a single alternation with the longer keywords listed before env:

df['URL'] = df['URL'].str.replace(r'(envlabel|envvar|env)\w{32}', r'{\1}', regex=True)

output:

                                   URL
0                   /someurl/{env}/end
1  /some/other/url/{envlabel}/{envvar}
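
Neither pattern above checks the second challenge from the question, that the keyword and hash must fill a whole path segment. One possible sketch, again with plain `re` and not taken from either answer, builds the alternation from the keyword list and adds lookarounds for the surrounding slashes. The names `alternatives`, `pattern`, and `normalize` are hypothetical.

```python
import re

to_replace_key = ["env", "envlabel", "envvar"]

# Longest keywords first, so "envlabel" is tried before its prefix "env".
alternatives = "|".join(
    re.escape(k) for k in sorted(to_replace_key, key=len, reverse=True)
)

# (?<=/) requires a slash just before the keyword;
# (?=/|$) requires a slash or the end of the string just after the hash.
pattern = re.compile(rf"(?<=/)({alternatives})\w{{32}}(?=/|$)")


def normalize(url: str) -> str:
    """Replace each keyword+hash path segment with a {keyword} placeholder."""
    return pattern.sub(r"{\1}", url)


print(normalize("/someurl/env40d929fadbe746ecagjbf6c515d30686/end"))
print(normalize("/some/other/url/envlabel40d929fadbe746ecagjbf6c517t30686/envvar40d929fadbe746ecagjbf6c515d306r6"))
```

With these lookarounds, a segment whose hash is longer than 32 characters, or a keyword that is not at the start of a segment, is left unchanged. The same pattern string should also work with pandas, for example `df['URL'].str.replace(pattern.pattern, r'{\1}', regex=True)`, though that call is not tested here.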
