Python pandas update row value from another row

I have a dataframe like the one given below.

C1 SIZE  M   COLOR
C1 PRIZE L   COLOR
C1 COLOR Nan BLUE
C2 SIZE  L   COLOR
C2 PRIZE S   COLOR
C2 COLOR Nan YELLOW

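I am looking for a way to transform it into the one given below, where the placeholder COLOR in the last column is replaced by the actual color from the COLOR row of the same group.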

C1 SIZE  M   BLUE
C1 PRIZE L   BLUE
C1 COLOR Nan BLUE
C2 SIZE  L   YELLOW
C2 PRIZE S   YELLOW
C2 COLOR Nan YELLOW

Can someone please help?

CodePudding user response：

This should work. It writes the result to a separate COLOR column and leaves Col4 unchanged.

import pandas as pd

df = pd.DataFrame({'Col1': ['C1', 'C1', 'C1', 'C2', 'C2', 'C2'],
                   'Col2': ['SIZE', 'PRIZE', 'COLOR', 'SIZE', 'PRIZE', 'COLOR'],
                   'Col3': ['M', 'L', 'Nan', 'L', 'S', 'Nan'],
                   'Col4': ['COLOR', 'COLOR', 'BLUE', 'COLOR', 'COLOR', 'YELLOW']})
# mask the rows whose Col4 is the placeholder 'COLOR' (they become NaN), then
# broadcast the first non-null Col4 value of each Col1 group to every row in it
df['COLOR'] = df['Col4'].mask(df['Col4'] == 'COLOR').groupby(df['Col1']).transform('first')
print(df)
  Col1   Col2 Col3    Col4   COLOR
0   C1   SIZE    M   COLOR    BLUE
1   C1  PRIZE    L   COLOR    BLUE
2   C1  COLOR  Nan    BLUE    BLUE
3   C2   SIZE    L   COLOR  YELLOW
4   C2  PRIZE    S   COLOR  YELLOW
5   C2  COLOR  Nan  YELLOW  YELLOW
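
If the goal is to overwrite Col4 itself, as in the question's expected output, one possible alternative is a lookup with `map` instead of mask plus transform. This is only a sketch: it assumes every Col1 group has exactly one row with Col2 == 'COLOR', and the name `color_by_group` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['C1', 'C1', 'C1', 'C2', 'C2', 'C2'],
                   'Col2': ['SIZE', 'PRIZE', 'COLOR', 'SIZE', 'PRIZE', 'COLOR'],
                   'Col3': ['M', 'L', 'Nan', 'L', 'S', 'Nan'],
                   'Col4': ['COLOR', 'COLOR', 'BLUE', 'COLOR', 'COLOR', 'YELLOW']})

# Build a lookup Series: group key (Col1) -> real color, taken from the COLOR rows.
# Assumption: exactly one COLOR row per group, so the lookup index is unique.
color_by_group = df.loc[df['Col2'] == 'COLOR'].set_index('Col1')['Col4']

# Replace Col4 in every row with the color looked up for that row's group
df['Col4'] = df['Col1'].map(color_by_group)
print(df)
```

If a group could contain more than one COLOR row, the lookup index would no longer be unique and the mask plus transform('first') approach above is probably the safer choice.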

CodePudding user response：

You can replace the placeholder COLOR with a missing value, then backward-fill and forward-fill the missing values within each group.

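# df is the question's dataframe with its columns named a, b, c, d
df['d'] = df['d'].replace('COLOR', pd.NA)
# fill inside each group of a only, so a color never leaks into another group
df['d'] = df.groupby('a')['d'].transform(lambda s: s.bfill().ffill())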

print(df)

    a      b    c       d
0  C1   SIZE    M    BLUE
1  C1  PRIZE    L    BLUE
2  C1  COLOR  Nan    BLUE
3  C2   SIZE    L  YELLOW
4  C2  PRIZE    S  YELLOW
5  C2  COLOR  Nan  YELLOW

If the valid color is always in the last row of each group, a backward fill alone is enough:

df['d'] = df['d'].replace('COLOR', pd.NA)

# select column d before filling so the grouping column a stays in df
df['d'] = df.groupby('a')['d'].bfill()
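
To see why the backward fill alone depends on the color being in the last row, here is a small sketch on made-up data where the real color sits in the middle of the group. The dataframe and the names `without_placeholder`, `bfill_only` and `bfill_then_ffill` are hypothetical, used only for illustration.

```python
import pandas as pd

# Hypothetical group where the row holding the real color is not the last one
df = pd.DataFrame({'a': ['C1', 'C1', 'C1'],
                   'b': ['SIZE', 'COLOR', 'PRIZE'],
                   'd': ['COLOR', 'BLUE', 'COLOR']})

# Turn the placeholder into a missing value
without_placeholder = df['d'].mask(df['d'] == 'COLOR')

# Backward fill only reaches rows that come before the real color
bfill_only = without_placeholder.groupby(df['a']).bfill()

# Backward fill followed by forward fill reaches every row of the group
bfill_then_ffill = without_placeholder.groupby(df['a']).transform(
    lambda s: s.bfill().ffill())

print(bfill_only.isna().tolist())
print(bfill_then_ffill.tolist())
```

Here the backward fill leaves the last row missing, while the combined fill gives BLUE in all three rows.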