Removing blank rows-CodePudding

I have a input data as below

Case ID	Name
1
1	rohit
1	Sakshi
2
2
2

So basically the input data has two(2) type of Case IDs, one where there is both blank and non-blank values(rows) for a Case ID and another where there is just blank value(rows) for the case.

I am trying to get below output :-

Case ID	Name
1	rohit
1	Sakshi
2

i.e., if a case has both blank and non-blank values then for that Case ID just show the non-blank values and for the case where all values are blank then just have a single row/record with blank value in the column 'Name'

CodePudding user response：

one way (not efficient but flexible) way is to use the split-apply-combine approach with a custom function:

def drop_empty(df0):
    df0 = df0.copy() # lose a value is trying to be set on a slice warning
    if df0['Name'].count()!=0:
        df0.dropna(thresh=2, inplace=True)
    else:
        df0.drop_duplicates(inplace=True)
    return df0[['Name']]

df.groupby('Case ID').apply(drop_empty).reset_index()[['Case ID', 'Name']]

CodePudding user response：

you can also try something like this:

indx = df.groupby('Case ID')['Name'].apply(lambda x: x.dropna() if x.count() else x.head(1))
df = df.loc[indx.index.get_level_values(1)]

>>> df
'''
   Case ID    Name
1        1   rohit
2        1  Sakshi
3        2     NaN

suppose your input dataframe looks like:

   Case ID    Name
0        1     NaN
1        1   rohit
2        1  Sakshi
3        2     NaN
4        2     NaN
5        2     NaN