Home > Net >  Populate Adjacent Value in Pandas Column When String is Duplicated
Populate Adjacent Value in Pandas Column When String is Duplicated

Time:11-27

I'm attempting to overwrite the value named in a column called 'Group' when the value in a column called 'Keyword' is a duplicate with the adjacent value.

For example, because the string 'commercial office cleaning services' is duplicated, I'd like to overwrite the adjacent column to 'commercial cleaning services'.

Example Data

enter image description here

Desired Output

enter image description here

Minimum Reproducible Example

import pandas as pd

data = [
    ["commercial cleaning services", "commercial cleaning services"],
    ["commercial office cleaning services", "commercial cleaning services"],
    ["janitorial cleaning services", "commercial cleaning services"],
    ["commercial office services", "commercial cleaning"],
]
df = pd.DataFrame(data, columns=["Keyword", "Group"])
print(df)

I'm fairly new to pandas and not sure where to start, I've reached a dead end Googling and searching stackoverflow.

CodePudding user response:

IIUC, use duplicated with mask and ffill :

#is the keyword duplicated ?
m = df['Keyword'].duplicated()

df['Group'] = df['Group'].mask(m).ffill()

# Output:

print(df)

                               Keyword                         Group
0         commercial cleaning services  commercial cleaning services
1  commercial office cleaning services  commercial cleaning services
2         janitorial cleaning services  commercial cleaning services
3  commercial office cleaning services  commercial cleaning services

NB: The reproducible example does not match the image of the input (https://i.stack.imgur.com/fPWPa.png)

  • Related