I have a dataframe like the one given below.
C1 SIZE M COLOR
C1 PRIZE L COLOR
C1 COLOR Nan BLUE
C2 SIZE L COLOR
C2 PRIZE S COLOR
C2 COLOR Nan YELLOW
I am looking for ways to transform it to the one given below.
C1 SIZE M BLUE
C1 PRIZE L BLUE
C1 COLOR Nan BLUE
C2 SIZE L YELLOW
C2 PRIZE S YELLOW
C2 COLOR Nan YELLOW
Can someone please help?
CodePudding user response:
This should work:
import pandas as pd
df = pd.DataFrame({'Col1': ['C1', 'C1', 'C1', 'C2', 'C2', 'C2'],
'Col2': ['SIZE', 'PRIZE', 'COLOR', 'SIZE', 'PRIZE', 'COLOR'],
'Col3': ['M', 'L', 'Nan', 'L', 'S', 'Nan'],
'Col4': ['COLOR', 'COLOR', 'BLUE', 'COLOR', 'COLOR', 'YELLOW']})
# hide the 'COLOR' placeholders in Col4 (they become NaN), then take the first non-null value within each Col1 group
df['COLOR'] = df['Col4'].mask(lambda x: x=='COLOR').groupby(df['Col1']).transform('first')
print(df)
Col1 Col2 Col3 Col4 COLOR
0 C1 SIZE M COLOR BLUE
1 C1 PRIZE L COLOR BLUE
2 C1 COLOR Nan BLUE BLUE
3 C2 SIZE L COLOR YELLOW
4 C2 PRIZE S COLOR YELLOW
5 C2 COLOR Nan YELLOW YELLOW
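Another way to read the one-liner above is as a lookup: find each group's real color once, then map it back onto every row of that group. The following is only a minimal sketch, assuming each Col1 group has exactly one row where Col2 is 'COLOR' and that the placeholder is the literal string 'COLOR'. The name color_by_group is made up for illustration, and Col4 is overwritten here to match the output asked for in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Col1': ['C1', 'C1', 'C1', 'C2', 'C2', 'C2'],
    'Col2': ['SIZE', 'PRIZE', 'COLOR', 'SIZE', 'PRIZE', 'COLOR'],
    'Col3': ['M', 'L', 'Nan', 'L', 'S', 'Nan'],
    'Col4': ['COLOR', 'COLOR', 'BLUE', 'COLOR', 'COLOR', 'YELLOW'],
})

# Rows whose Col2 is 'COLOR' carry the real color in Col4
# (assumption: exactly one such row per Col1 group).
color_rows = df[df['Col2'] == 'COLOR']

# Series indexed by group label, holding that group's color
# (color_by_group is a hypothetical helper name).
color_by_group = color_rows.set_index('Col1')['Col4']

# Look up each row's group label and overwrite Col4 with the group's color.
df['Col4'] = df['Col1'].map(color_by_group)
print(df)
```

If a group could have several 'COLOR' rows, the lookup index would no longer be unique and the map step would fail, so the groupby/transform version is the safer choice in that case.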
CodePudding user response:
You can try replacing the 'COLOR' placeholder with a missing value, then back-filling and forward-filling it within each group:
df['d'] = df['d'].replace('COLOR', pd.NA)
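# fill only column d, group by group, so the other columns and the index stay untouched
df['d'] = df.groupby('a')['d'].transform(lambda s: s.bfill().ffill())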
print(df)
a b c d
0 C1 SIZE M BLUE
1 C1 PRIZE L BLUE
2 C1 COLOR Nan BLUE
3 C2 SIZE L YELLOW
4 C2 PRIZE S YELLOW
5 C2 COLOR Nan YELLOW
If the valid color is always in the last row of each group, a backward fill alone is enough:
df['d'] = df['d'].replace('COLOR', pd.NA)
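# assign back to column d only; a bare df.groupby('a').bfill() would drop column a from the result
df['d'] = df.groupby('a')['d'].bfill()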
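Here is a self-contained sketch of the back-fill variant. It assumes the column names a, b, c, d used in this answer, and it assumes that the missing entries in c are genuine NaN values rather than the string 'Nan'. That second assumption is there to show that restricting the fill to column d leaves the gaps in c alone. The variable name placeholder is illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': ['C1', 'C1', 'C1', 'C2', 'C2', 'C2'],
    'b': ['SIZE', 'PRIZE', 'COLOR', 'SIZE', 'PRIZE', 'COLOR'],
    'c': ['M', 'L', np.nan, 'L', 'S', np.nan],  # assumption: real NaN here
    'd': ['COLOR', 'COLOR', 'BLUE', 'COLOR', 'COLOR', 'YELLOW'],
})

# Blank out the 'COLOR' placeholder so only real colors remain in d.
placeholder = df['d'].eq('COLOR')
df['d'] = df['d'].mask(placeholder)

# Back-fill within each group of a; only column d is touched,
# so the NaN values in c are preserved.
df['d'] = df.groupby('a')['d'].bfill()
print(df)
```

Filling the whole frame per group instead would also fill the missing values in c, which is not what the expected output in the question shows.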