I have dataframe with column d1 and now i am trying calculate 'out' column after ranking that column when there in 'nan' value with in a column.
data_input = {'Name':['Renault', 'Renault', 'Renault', 'Renault','Renault','Renault','Renault','Renault','Renault','Renault','Renault','Renault','Renault','Renault'],
'type':['Duster', 'Duster', 'Duster','Duster','Duster','Duster','Duster','Triber','Triber','Triber','Triber','Triber','Triber','Triber'],
'd1':['nan','10','10','10','nan','nan','20','20','nan','nan','30','30','30','nan']}
df_input = pd.DataFrame(data_input)
data_out = {'Name':['Renault', 'Renault', 'Renault', 'Renault','Renault','Renault','Renault','Renault','Renault','Renault','Renault','Renault','Renault','Renault'],
'type':['Duster', 'Duster', 'Duster','Duster','Duster','Duster','Duster','Triber','Triber','Triber','Triber','Triber','Triber','Triber'],
'd1':['nan','10','10','10','nan','nan','20','20','nan','nan','30','30','30','nan'],
'out':[1,np.NaN,np.NaN,np.NaN,2,2,np.NaN,np.NaN,1,1,np.NaN,np.NaN,np.NaN,2]}
df_out = pd.DataFrame(data_out)
If in that particular group if nan appers before and after some values then rank should be in asscending. ex: rank for index-0 will be 1 and index-4&5 will be 2(because there is no after values in that group)
df_out["out"] = df_out.groupby(["Name","type"])['d1'].rank(method="first")
Thanks in advance!
CodePudding user response:
Use GroupBy.cumsum
by consecutive mising values per groups:
df_out['d1'] = pd.to_numeric(df_out['d1'], errors='coerce')
m = df_out['d1'].isna()
df_out["out1"] = (df_out.assign(a = (m & ~m.shift(fill_value=False)))
.groupby(["Name","type"])['a']
.cumsum()
.where(m))
Alternative solution with boolean indexing
:
df_out["out1"] = (df_out.assign(a = (m & ~m.shift(fill_value=False)))[m]
.groupby(["Name","type"])['a']
.cumsum())