I have a dataframe as follows:
import pandas as pd
df = pd.DataFrame({'data1': ['the weather is nice today', 'This is interesting', 'the weather is good'],
                   'data2': ['It is raining', 'The plant is green', 'the weather is sunny']})
and I have a dictionary as follows:
my_dict = {'the weather':'today','the plant':'tree'}
I would like to replace the first two words of each value in the data2 column when they match a dictionary key. I have done the following:
for old, new in my_dict.items():
    if pd.Series([' '.join(map(str, l)) for l in df['data2'].str.lower().str.split().map(lambda x: x[0:2])]).str.contains('|'.join(old.capitalize())).any():
        df['data2'] = df['data2'].str.replace(old, new.capitalize(), regex=False)
    else:
        print('does not exist')
But when I print(df), nothing has been replaced.
The expected output:
                       data1           data2
0  the weather is nice today   It is raining
1        This is interesting   Tree is green
2        the weather is good  Today is sunny
CodePudding user response:
If I understand correctly, this is one way to do it (there may be more efficient ways):
df.data2 = df.data2.str.lower()
for k in my_dict:
    # Swap the prefix only when it equals the key, then re-attach the rest of the string.
    df.data2 = df.data2.str[:len(k)].replace(k, my_dict[k]) + df.data2.str[len(k):]
df.data2 = df.data2.str.capitalize()
Lowercasing and capitalization weren't in your question but were part of your code, so I put them in (otherwise it would fail because the capitalization doesn't match in your sample code).
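If lowercasing the whole column is not wanted, the same prefix idea can be written as a small helper that only compares in lower case and leaves the rest of each string untouched. This is a minimal sketch, not the answer's own code: the name replace_prefix is made up for illustration, and it assumes plain ASCII text so that lowercasing does not change string lengths.

```python
import pandas as pd

def replace_prefix(text, mapping):
    """Replace a leading key (compared case-insensitively) with its capitalized value."""
    lowered = text.lower()
    for key, value in mapping.items():
        if lowered.startswith(key.lower()):
            # Keep the remainder of the original string exactly as it was.
            return value.capitalize() + text[len(key):]
    return text  # no key matched: leave the text unchanged

df = pd.DataFrame({'data2': ['It is raining', 'The plant is green', 'the weather is sunny']})
my_dict = {'the weather': 'today', 'the plant': 'tree'}

df['data2'] = df['data2'].map(lambda s: replace_prefix(s, my_dict))
print(df['data2'].tolist())
# ['It is raining', 'Tree is green', 'Today is sunny']
```

Note that startswith does not check word boundaries, so a key such as 'the weather' would also match a value beginning with 'the weatherman'.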
CodePudding user response:
- Use Python's map function to go through the values.
- The dataframe contains 'The plant', and the original code compares it with 'the plant' without converting it to lower case.
for old, new in my_dict.items():
    # Check whether the key appears in the lowercased first two words of any row.
    if pd.Series([' '.join(map(str, l)) for l in df['data2'].str.lower().str.split().map(lambda x: x[0:2])]).str.contains(old, regex=False).any():
        df['data2'] = list(map(lambda x: x.lower().replace(old, new.capitalize()), df['data2']))
    else:
        print('does not exist')
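One more detail in the question's condition is worth checking: '|'.join(...) applied to a single string puts a | between every character, so the resulting regex matches almost any text. The sketch below illustrates this on the lowercased first two words of the sample data; the variable names are only for illustration.

```python
import pandas as pd

old = 'the plant'

# str.join iterates over its argument; for a string that means one character at a time.
pattern = '|'.join(old)
print(pattern)  # t|h|e| |p|l|a|n|t

# Lowercased first two words of each data2 value from the question.
first_two = pd.Series(['it is', 'the plant', 'the weather'])

# As a regex, the pattern matches any value containing one of those characters.
loose = first_two.str.contains(pattern)
# Comparing against the whole key is much stricter.
exact = first_two == old

print(loose.tolist())  # [True, True, True]
print(exact.tolist())  # [False, True, False]
```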
CodePudding user response:
You can try pandas.Series.str.replace:
for key, val in my_dict.items():
    df['data2'] = df['data2'].str.replace(f'^{key}', val, case=False, regex=True)
print(df)
                       data1           data2
0  the weather is nice today   It is raining
1        This is interesting   tree is green
2        the weather is good  today is sunny
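A possible variant of the same regex idea, sketched under a few assumptions: it handles all keys in one pass, escapes them with re.escape in case a key contains regex metacharacters, and capitalizes the replacement to match the expected output in the question. The helper name capitalized_value is hypothetical, and the dictionary keys are assumed to be lower case.

```python
import re
import pandas as pd

my_dict = {'the weather': 'today', 'the plant': 'tree'}
df = pd.DataFrame({'data2': ['It is raining', 'The plant is green', 'the weather is sunny']})

# Anchor at the start of the string, escape each key so it is matched literally,
# and require a word boundary after the key.
pattern = re.compile(
    '^(?:' + '|'.join(re.escape(k) for k in my_dict) + r')\b',
    flags=re.IGNORECASE,
)

def capitalized_value(match):
    # Look up the matched prefix in lower case (keys are assumed to be lower case).
    return my_dict[match.group(0).lower()].capitalize()

df['data2'] = df['data2'].str.replace(pattern, capitalized_value, regex=True)
print(df['data2'].tolist())
# ['It is raining', 'Tree is green', 'Today is sunny']
```

Because the pattern is precompiled, the case-insensitivity is set through re.IGNORECASE on the pattern rather than through the case argument of str.replace.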