Concatenating strings in a pandas dataframe

Below is an example of a string stored in one column of my df:

df_final['newa_1'][8395389]
"['Y02T  10/70' 'Y02E  60/00' 'Y02T  90/16' 'Y04S  10/126' 'Y02T  10/7072' 'Y02T  90/12' 'Y02T   90/14']"

What I would like to do is put a "_" inside each element between '' in place of the spaces, regardless of how many spaces there are. So, in the example above, the output should be something like:

"['Y02T_10/70' 'Y02E_60/00' 'Y02T_90/16' 'Y04S_10/126' 'Y02T_10/7072' 'Y02T_90/12' 'Y02T_90/14']"

Thank you

CodePudding user response：

You can do it using re as follows:

foo = "['Y02T  10/70' 'Y02E  60/00' 'Y02T  90/16' 'Y04S  10/126' 'Y02T  10/7072' 'Y02T  90/12' 'Y02T   90/14']"

import re

re_whitespace = re.compile(r"\s+")

re_whitespace.sub('_', foo).replace("'_'", "' '")

Giving you:

"['Y02T_10/70' 'Y02E_60/00' 'Y02T_90/16' 'Y04S_10/126' 'Y02T_10/7072' 'Y02T_90/12' 'Y02T_90/14']"

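The snippet above works on a single string. To apply the same idea to the whole column, a minimal sketch could look like the following. The one-row frame is a hypothetical stand-in for df_final, and join_codes is a helper name made up for this illustration.

```python
import re
import pandas as pd

# Hypothetical one-row frame standing in for df_final; only the column name comes from the question.
df_final = pd.DataFrame({"newa_1": ["['Y02T  10/70' 'Y02E  60/00' 'Y02T   90/14']"]})

re_whitespace = re.compile(r"\s+")

def join_codes(text: str) -> str:
    # Turn every whitespace run into "_", then restore the single space between quoted elements.
    return re_whitespace.sub("_", text).replace("'_'", "' '")

# na_action="ignore" leaves missing values untouched instead of passing them to re.sub.
df_final["newa_1"] = df_final["newa_1"].map(join_codes, na_action="ignore")
print(df_final["newa_1"].iloc[0])  # ['Y02T_10/70' 'Y02E_60/00' 'Y02T_90/14']
```

Series.map calls the function once per row, so this is easy to read but not vectorized.
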
CodePudding user response：

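You can use the str.replace method of pandas. Note that a literal " " pattern replaces every single space, including the ones that separate the quoted elements.

For example: df_final['newa_1']=df_final['newa_1'].str.replace(" ","_")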
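
A small check of what the literal replacement returns on a shortened version of the question's string. Next to it is a regex-based chain, borrowed from the first answer's idea, that collapses each run of whitespace and then restores the separator. The one-row frame is a hypothetical stand-in for df_final, and the variable names literal and collapsed exist only for this sketch.

```python
import pandas as pd

# Hypothetical one-row frame; only the column name comes from the question.
df_final = pd.DataFrame({"newa_1": ["['Y02T  10/70' 'Y02E  60/00']"]})

# Literal replacement: each space becomes its own "_", separators included.
literal = df_final["newa_1"].str.replace(" ", "_", regex=False)
print(literal.iloc[0])  # ['Y02T__10/70'_'Y02E__60/00']

# Regex replacement of each whitespace run, then put the space back between elements.
collapsed = (
    df_final["newa_1"]
    .str.replace(r"\s+", "_", regex=True)
    .str.replace("'_'", "' '", regex=False)
)
print(collapsed.iloc[0])  # ['Y02T_10/70' 'Y02E_60/00']
```

Passing regex explicitly avoids depending on the default, which differs between pandas versions.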

CodePudding user response：

You can use regex group match and replace:

df["newa_1"].str.replace(r"([^'])(\s+)([^'])", "\\1_\\3", regex=True)

['Y02T_10/70' 'Y02E_60/00' 'Y02T_90/16' 'Y04S_10/126' 'Y02T_10/7072' 'Y02T_90/12' 'Y02T_90/14']
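
The capture-group pattern can be checked on the plain string from the question with re.sub. The sketch below also tries a lookaround variant, which is an alternative and not part of the answer above. It matches only the whitespace run, so no backreferences are needed. The variable names are made up for the illustration.

```python
import re

foo = "['Y02T  10/70' 'Y02E  60/00' 'Y02T  90/16' 'Y04S  10/126' 'Y02T  10/7072' 'Y02T  90/12' 'Y02T   90/14']"
expected = "['Y02T_10/70' 'Y02E_60/00' 'Y02T_90/16' 'Y04S_10/126' 'Y02T_10/7072' 'Y02T_90/12' 'Y02T_90/14']"

# Capture-group form: keep the character on each side, swap the whitespace run for "_".
# The space between elements is skipped because it sits between two quote characters.
with_groups = re.sub(r"([^'])(\s+)([^'])", r"\1_\3", foo)

# Lookaround form (alternative): whitespace not preceded and not followed by a quote.
with_lookarounds = re.sub(r"(?<!')\s+(?!')", "_", foo)

print(with_groups == expected, with_lookarounds == expected)  # True True
```

Either pattern can be passed to Series.str.replace with regex=True, since pandas uses Python's re module for regex replacement.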