Let's say I have a data frame
A | B | C |
---|---|---|
A1 | R3 | P5 |
A2 | R2 | P4 |
A2 | NaN | P4 |
A1 | NaN | NaN |
I want to fill in information for the 3rd and 4th rows from the 2nd and 1st rows since column A values match those two rows. The expected data frame is as follows:
A | B | C |
---|---|---|
A1 | R3 | P5 |
A2 | R2 | P4 |
A2 | R2 | P4 |
A1 | R3 | P5 |
Can you please help me with it?
CodePudding user response:
Assuming a single non-NA value per A
You can use a self merge
:
out = df[['A']].merge(df.dropna(), how='left')
Or, if there is a chance of duplicated rows:
out = df[['A']].merge(df.dropna().drop_duplicates(), how='left')
output:
A B C
0 A1 R3 P5
1 A2 R2 P4
2 A2 R2 P4
3 A1 R3 P5
CodePudding user response:
here is one way to do it.
sorting by column 'A', forward filling the null values and then re-sorting by index
df.sort_values(['A'] ).ffill().sort_index()
A B C
0 A1 R3 P5
1 A2 R2 P4
2 A2 R2 P4
3 A1 R3 P5