So I have a pandas dataframe and I am trying to substitute each value (a string in my case) to an element of a list if the string contains thyat element (a substring). To clarify, I have a simple dataframe:
data = {'column': ['I have a dog', 'and I have two cats', 'I have nothing', 'I like pandas', " "]}
df = pd.DataFrame(data)
list = ['dog', 'cat', 'panda']
The desired output looks as follows:
data = {'column': ['dog', 'cat', 'I have nothing', 'pandas', " "]}
df = pd.DataFrame(data)
I can do the following:
df.loc[df['column'].str.contains("dog"), "column"]= "dog"
And apply above line of code also for cat and pandas. But the problem is, this would require many line of code if the list of substrings is very long. Is there a simpler way to do this? Hence, for each record it needs to check if it contains any element of a list, then replace that value by that element.
CodePudding user response:
Simplier is use loop here:
L = ['dog', 'cat', 'panda']
for x in L:
df.loc[df['column'].str.contains(x), "column"]= x
print (df)
column
0 dog
1 cat
2 I have nothing
3 panda
4
Or use Series.str.extract
with Series.fillna
by original data:
df['column'] = (df['column'].str.extract(f'({"|".join(L)})', expand=False)
.fillna(df['column']))
print (df)
column
0 dog
1 cat
2 I have nothing
3 panda
4
CodePudding user response:
We can use str.replace
to substitute values
df['column'] = df['column'].str.replace(fr".*({'|'.join(lst)}).*", r'\1')
column
0 dog
1 cat
2 I have nothing
3 pandas
4
See the online regex demo