Below is my data frame.
df = pd.DataFrame({'vin':['aaa','aaa','aaa','bbb','bbb','bbb','ddd','eee','eee','fff'],
'c1':[35,'NA','NA','NA',4,'NA','NA','NA',56,406],
'c2':[35,'NA','NA','NA',43,'NA','NA','NA',67,77],
'c3':[36,'NA','NA','NA',104,'NA','NA','NA',44,306]})
I want to drag integer values of 'c1' 'c2', and 'c3' with respect to column 'vin'.
Below is the expected output:-
df = pd.DataFrame({'vin':['aaa','aaa','aaa','bbb','bbb','bbb','ddd','eee','eee','fff'],
'c1':[35,35,35,4,4,4,'NA',56,56,406],
'c2':[35,35,35,43,43,43,'NA',67,67,77],
'c3':[36,36,36,104,104,104,'NA',44,44,306]})
Note: The value of 'vin' column 'ddd' will remain as 'NA' as there is no integer value is available for 'ddd'.
I tried ffill() or groupbyffill() but I don't get the expected output. Below are lines of code, I tried.
#(1)
df['c1'] = df.groupby(['vin'])['c1'].ffill()
#(2)
df[["c1","c2" "c3"]] = df[["c1", "c2","c3"]].fillna(df.groupby(['vin'])[["c1", "c2","c3"]].ffill())
#(3)
df["c1"] = df.groupby('vin')['c1'].transform(lambda x: x.ffill())
CodePudding user response:
If there is one non missing value per group with ffill
first sorting clumns:
c = ["c1","c2", "c3"]
df[c] = df.replace('NA',np.nan).sort_values(['vin', 'c1']).groupby('vin')[c].ffill()
print (df)
vin c1 c2 c3
0 aaa 35.0 35.0 36.0
1 aaa 35.0 35.0 36.0
2 aaa 35.0 35.0 36.0
3 bbb 4.0 43.0 104.0
4 bbb 4.0 43.0 104.0
5 bbb 4.0 43.0 104.0
6 ddd NaN NaN NaN
7 eee 56.0 67.0 44.0
8 eee 56.0 67.0 44.0
9 fff 406.0 77.0 306.0
Or aggregate first non missing values for new columns in GroupBy.transform
:
c = ["c1","c2", "c3"]
df[c] = df.replace('NA',np.nan).groupby('vin')[c].transform('first')
print (df)
vin c1 c2 c3
0 aaa 35.0 35.0 36.0
1 aaa 35.0 35.0 36.0
2 aaa 35.0 35.0 36.0
3 bbb 4.0 43.0 104.0
4 bbb 4.0 43.0 104.0
5 bbb 4.0 43.0 104.0
6 ddd NaN NaN NaN
7 eee 56.0 67.0 44.0
8 eee 56.0 67.0 44.0
9 fff 406.0 77.0 306.0