I have a input data as below
Case ID | Name |
---|---|
1 | |
1 | rohit |
1 | Sakshi |
2 | |
2 | |
2 |
So basically the input data has two(2) type of Case IDs, one where there is both blank and non-blank values(rows) for a Case ID and another where there is just blank value(rows) for the case.
I am trying to get below output :-
Case ID | Name |
---|---|
1 | rohit |
1 | Sakshi |
2 |
i.e., if a case has both blank and non-blank values then for that Case ID just show the non-blank values and for the case where all values are blank then just have a single row/record with blank value in the column 'Name'
CodePudding user response:
one way (not efficient but flexible) way is to use the split-apply-combine approach with a custom function:
def drop_empty(df0):
df0 = df0.copy() # lose a value is trying to be set on a slice warning
if df0['Name'].count()!=0:
df0.dropna(thresh=2, inplace=True)
else:
df0.drop_duplicates(inplace=True)
return df0[['Name']]
df.groupby('Case ID').apply(drop_empty).reset_index()[['Case ID', 'Name']]
CodePudding user response:
you can also try something like this:
indx = df.groupby('Case ID')['Name'].apply(lambda x: x.dropna() if x.count() else x.head(1))
df = df.loc[indx.index.get_level_values(1)]
>>> df
'''
Case ID Name
1 1 rohit
2 1 Sakshi
3 2 NaN
suppose your input dataframe looks like:
Case ID Name
0 1 NaN
1 1 rohit
2 1 Sakshi
3 2 NaN
4 2 NaN
5 2 NaN