pandas: replacing the last word in string column with values from list

Time:04-09

I have a dataframe and a list as follows:

import pandas as pd

df = pd.DataFrame({'data1': ['the weather is nice today', 'This is interesting', 'the weather is good'],
                   'data2': ['It is raining', 'The plant is greenery', 'the weather is sunnyday']})

my_list = ['sunny','green']

I would like to replace the last word of each string in the data2 column with a word from my_list, if that last word starts with the word from the list. This is what I tried:

for k in my_list:
    for val in df.data2:
        if val.split()[-1].startswith(k):
            print(val.replace(val.split()[-1], k))

But when I print the results, their order follows the order of the list rather than the order of the rows, and I do not know how to assign them back to the same column. My desired output is:

     data1                      data2
0  the weather is nice today    It is raining
1  This is interesting          The plant is green
2  the weather is good          the weather is sunny

CodePudding user response:

One possible way is to build a regex that matches any last word starting with one of the words in your list. This is more efficient than looping over all the words of your list.

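import re

# \b anchors at a word start, the group captures the list word,
# and \S*$ consumes the rest of the last word up to the end of the string.
pat = re.compile(f"\\b({'|'.join(my_list)})\\S*$")
dfnew = df.assign(data2=df['data2'].str.replace(pat, r'\1', regex=True))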

>>> dfnew
                       data1                 data2
0  the weather is nice today         It is raining
1        This is interesting    The plant is green
2        the weather is good  the weather is sunny
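
If the words in your list could ever contain characters that are special in a regex (such as + or .), it is probably safer to escape them before joining. Here is a minimal sketch of that idea; the helper name build_last_word_pattern is made up for illustration and the rest follows the approach above:

```python
import re
import pandas as pd

df = pd.DataFrame({'data2': ['It is raining',
                             'The plant is greenery',
                             'the weather is sunnyday']})
my_list = ['sunny', 'green']

def build_last_word_pattern(words):
    # Hypothetical helper: re.escape keeps regex metacharacters literal.
    alternatives = '|'.join(re.escape(w) for w in words)
    # Capture the list word, then consume the rest of the last word.
    return re.compile(rf"\b({alternatives})\S*$")

pat = build_last_word_pattern(my_list)
result = df['data2'].str.replace(pat, r'\1', regex=True)
```

For plain alphabetic words like the ones in the question, this should behave the same as the unescaped pattern.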

CodePudding user response:

Pierre D's answer is excellent. Regex and .str.replace seem like the right tools here.

If you want to replace a column of a df with new values, you can use assign or simply =.

You can apply any function to each value of a column using apply.

Here is a more wordy example that is close to your original solution:

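def replace_last_word(val, prefixes=('sunny', 'green')):
    # Split off the last word; a single-word string yields one part only.
    parts = val.rsplit(' ', 1)
    last_word = parts[-1]
    for prefix in prefixes:
        if last_word.startswith(prefix):
            # Swap the last word for the first matching prefix.
            return ' '.join(parts[:-1] + [prefix])
    return val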

df['data2'] = df['data2'].apply(replace_last_word)

# or df = df.assign(data2=df['data2'].apply(replace_last_word))

Note that you have to decide how to handle prefixes that are contained in each other, like ["sun", "sunny"]. This solution will choose the first match.
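
If you would rather pick the longest matching prefix than the first one, one option is to sort the prefixes by length before checking them. This is only a sketch of that choice; the function name replace_last_word_longest is an invented one:

```python
def replace_last_word_longest(val, prefixes):
    # Split off the last word; a single-word string yields one part only.
    parts = val.rsplit(' ', 1)
    last_word = parts[-1]
    # Try longer prefixes first so 'sunny' wins over 'sun'.
    for prefix in sorted(prefixes, key=len, reverse=True):
        if last_word.startswith(prefix):
            return ' '.join(parts[:-1] + [prefix])
    return val

example = replace_last_word_longest('the weather is sunnyday', ['sun', 'sunny'])
```

You can pass it to apply with the list as an extra argument, for example df['data2'].apply(replace_last_word_longest, prefixes=['sun', 'sunny']).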

Note that you could also split off the last word from the complete column, and then process the new column:

df['data2'].str.rsplit(' ',n=1, expand=True) will give you

                0         1
0           It is   raining
1    The plant is  greenery
2  the weather is  sunnyday
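
From there you could replace the matching entries in the last-word column and glue the two columns back together. A rough sketch, assuming every row contains at least two words (otherwise the second column holds missing values); the column names rest and last are chosen here just for readability:

```python
import pandas as pd

df = pd.DataFrame({'data2': ['It is raining',
                             'The plant is greenery',
                             'the weather is sunnyday']})
my_list = ['sunny', 'green']

# Split each string once from the right: everything before, and the last word.
parts = df['data2'].str.rsplit(' ', n=1, expand=True)
parts.columns = ['rest', 'last']

for prefix in my_list:
    # Overwrite last words that start with this prefix.
    mask = parts['last'].str.startswith(prefix)
    parts.loc[mask, 'last'] = prefix

# Join the two pieces back into the original column.
df['data2'] = parts['rest'] + ' ' + parts['last']
```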