I have a dataframe and a list as follows:
import pandas as pd

df = pd.DataFrame({'data1': ['the weather is nice today', 'This is interesting', 'the weather is good'],
                   'data2': ['It is raining', 'The plant is greenery', 'the weather is sunnyday']})
my_list = ['sunny', 'green']
I would like to replace the last word of each value in the data2 column with a word from my_list, if that last word starts with the list word. This is what I tried:
for k in my_list:
    for val in df.data2:
        if val.split()[-1].startswith(k):
            print(val.replace(val.split()[-1], k))
But when I print the results, their order follows the order of the list rather than the order of the rows, and I do not know how to assign them back to the same column. My desired output is:
                       data1                 data2
0  the weather is nice today         It is raining
1        This is interesting    The plant is green
2        the weather is good  the weather is sunny
CodePudding user response:
One possible way is to build a regex that matches any last word starting with one of the words in your list. This is typically more efficient than looping over every word in your list for each row.
import re

pat = re.compile(f"\\b({'|'.join(my_list)})\\S*$")
dfnew = df.assign(data2=df['data2'].str.replace(pat, r'\1', regex=True))
>>> dfnew
                       data1                 data2
0  the weather is nice today         It is raining
1        This is interesting    The plant is green
2        the weather is good  the weather is sunny
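If the words in the list could contain regex metacharacters (a dot or a plus sign, for example), it may be safer to escape them before joining. The following is a minimal self-contained sketch, assuming the same df and my_list as in the question; the re.escape step and the name alternatives are additions for illustration and not part of the answer above.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'data1': ['the weather is nice today', 'This is interesting', 'the weather is good'],
    'data2': ['It is raining', 'The plant is greenery', 'the weather is sunnyday'],
})
my_list = ['sunny', 'green']

# Escape each word so characters such as '.' or '+' are matched literally.
alternatives = '|'.join(re.escape(word) for word in my_list)
# \b anchors the prefix at a word boundary; \S*$ consumes the rest of the last word.
pat = re.compile(rf"\b({alternatives})\S*$")

# Keep only the captured prefix (group 1) in place of the whole last word.
dfnew = df.assign(data2=df['data2'].str.replace(pat, r'\1', regex=True))
print(dfnew['data2'].tolist())
```

Rows whose last word does not start with any list word are left unchanged, because the pattern simply does not match them.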
CodePudding user response:
The answer of Pierre D is excellent. Regex and .str.replace seem like the right tools here.
If you want to replace a column of a df with new values, you can use assign or simply = (plain assignment).
You can apply any function to each value of a column using apply.
Here is a more wordy example that is close to your original solution:
def replace_last_word(val, prefixes=('sunny', 'green')):
    # Split the value into everything before the last word and the last word.
    rest, last_word = val.rsplit(' ', 1)
    for prefix in prefixes:
        if last_word.startswith(prefix):
            return f"{rest} {prefix}"
    # No prefix matched: leave the value unchanged.
    return val

df['data2'] = df['data2'].apply(replace_last_word)
# or df = df.assign(data2=df['data2'].apply(replace_last_word))
Note that you have to decide how to handle prefixes that are contained in each other, like ["sun", "sunny"]. This solution will choose the first match.
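One possible policy for overlapping prefixes is to prefer the longest one. The sketch below is only an illustration of that choice; the function name replace_last_word_longest is hypothetical, and it also tolerates single-word values by collecting the part before the last word into a list.

```python
def replace_last_word_longest(val, prefixes):
    """Replace the last word of val with the longest prefix it starts with."""
    # rest is [] for a single-word value, otherwise a one-element list.
    *rest, last_word = val.rsplit(' ', 1)
    # Try longer prefixes first so 'sunny' wins over 'sun'.
    for prefix in sorted(prefixes, key=len, reverse=True):
        if last_word.startswith(prefix):
            return ' '.join(rest + [prefix])
    # No prefix matched: leave the value unchanged.
    return val


print(replace_last_word_longest('the weather is sunnyday', ['sun', 'sunny']))
```

It can be applied to the column in the same way, for example with df['data2'].apply(replace_last_word_longest, prefixes=['sun', 'sunny']).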
Note that you could also split off the last word from the complete column, and then process the new column:
df['data2'].str.rsplit(' ',n=1, expand=True)
will give you
                0         1
0           It is   raining
1    The plant is  greenery
2  the weather is  sunnyday
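From there, the last-word column can be processed and the two parts joined back together. This is a rough sketch of that idea, assuming every value contains at least two words (otherwise column 1 would hold a missing value); the helper name shorten is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'data2': ['It is raining', 'The plant is greenery', 'the weather is sunnyday']})
my_list = ['sunny', 'green']

# Column 0 holds everything before the last word, column 1 the last word.
parts = df['data2'].str.rsplit(' ', n=1, expand=True)


def shorten(word):
    # Return the first list word that the last word starts with, else the word itself.
    for prefix in my_list:
        if word.startswith(prefix):
            return prefix
    return word


# Reassemble the column from the untouched part and the processed last word.
df['data2'] = parts[0] + ' ' + parts[1].apply(shorten)
print(df['data2'].tolist())
```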