I have a df column of URLs that contain keywords followed by hash values, for example:
/someurl/env40d929fadbe746ecagjbf6c515d30686/end
/some/other/url/envlabel40d929fadbe746ecagjbf6c517t30686/envvar40d929fadbe746ecagjbf6c515d306r6
The goal is to replace env followed by its 32-character hash with {env}, envlabel followed by its 32-character hash with {envlabel}, and likewise for the other keywords.
I am trying to use a regex with the str.replace method:
to_replace_key = ['env', 'envlabel', 'envvar']
for word in to_replace_key:
    df['URL'] = df['URL'].str.replace(rf"{word}\w\w\w\w\w\w\w\w\w\w\w\w\w\w\w\w\w\w\w\w\w\w\w\w\w\w\w\w\w\w\w\w", f'{{{word}Id}}', regex=True)
Challenges:
The env replacement also matches envlabel.
The keyword followed by its 32-character hash appears as a substring between / characters or at the end of the line.
Expected output:
/someurl/{env}/end
/some/other/url/{envlabel}/{envvar}
Thanks !!
CodePudding user response:
You can use the regex (env(label|var)?)\w{32}, which captures env together with label or var when one of them follows it; the ? makes that suffix optional. Replace the matched string with the first captured group, i.e. \1, wrapped in curly braces. Raw strings keep the backslashes readable:
df['URL'].str.replace(r'(env(label|var)?)\w{32}', r'{\1}', regex=True)
0 /someurl/{env}/end
1 /some/other/url/{envlabel}/{envvar}
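The question also says the keyword plus hash always sits between / characters or at the end of the line. If you want the pattern to enforce that, here is a minimal sketch of a stricter variant. It assumes a lookbehind and lookahead are acceptable, and the sample frame is rebuilt from the two URLs in the question; the anchoring is an addition, not part of the regex above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "URL": [
        "/someurl/env40d929fadbe746ecagjbf6c515d30686/end",
        "/some/other/url/envlabel40d929fadbe746ecagjbf6c517t30686/envvar40d929fadbe746ecagjbf6c515d306r6",
    ]
})

# (?<=/)  the match must start right after a slash
# (?=/|$) the match must be followed by a slash or the end of the string
pattern = r"(?<=/)(env(?:label|var)?)\w{32}(?=/|$)"

df["URL"] = df["URL"].str.replace(pattern, r"{\1}", regex=True)
print(df["URL"].tolist())
```

With the anchors in place, a keyword that merely happens to appear in the middle of a longer path segment is left alone.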
CodePudding user response:
Don't loop; use a single alternation with the longer keywords listed before env:
df['URL'] = df['URL'].str.replace(r'(envlabel|envvar|env)\w{32}', r'{\1}', regex=True)
output:
URL
0 /someurl/{env}/end
1 /some/other/url/{envlabel}/{envvar}
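The order of the alternatives matters: the regex engine tries them left to right, so with env first, envlabel plus its hash would match as env followed by 32 word characters and leave the tail of the hash behind. If the keyword list changes often, a sketch like the following builds the pattern from the question's to_replace_key list, sorted longest first. The variable names keys and pattern are only illustrative.

```python
import re
import pandas as pd

to_replace_key = ["env", "envlabel", "envvar"]  # list from the question

# Longest keywords first, so "envlabel" is tried before its prefix "env".
keys = sorted(to_replace_key, key=len, reverse=True)

# re.escape guards against keywords that contain regex metacharacters.
pattern = "(" + "|".join(re.escape(k) for k in keys) + r")\w{32}"

df = pd.DataFrame({
    "URL": [
        "/someurl/env40d929fadbe746ecagjbf6c515d30686/end",
        "/some/other/url/envlabel40d929fadbe746ecagjbf6c517t30686/envvar40d929fadbe746ecagjbf6c515d306r6",
    ]
})

df["URL"] = df["URL"].str.replace(pattern, r"{\1}", regex=True)
print(pattern)
print(df["URL"].tolist())
```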