How to merge dataframe rows based on empty cells in a column

Time:11-05

If I have the following dataframe:

Index Col1 Col2 Col3
1     10   x    40
2          y    50
3          z    60
4     20   a    30

I would like to merge rows that have a blank Col1 with the previous row that is not blank in Col1.

Expected output:

Index Col1 Col2  Col3
1     10   x,y,z 40,50,60
4     20   a     30

Is this possible?

Thanks

CodePudding user response:

Quite possible. What you need is a group key that increments at every row where Col1 is not null.

In one go:

df.drop('Col1', axis=1).groupby(df['Col1'].notna().cumsum()).agg(list)
# P.S. if you really want strings, cast to str before grouping
# (a GroupBy object has no .astype):
# df.drop('Col1', axis=1).astype(str).groupby(
#     df['Col1'].notna().cumsum()).agg(','.join)


           Col2          Col3
Col1
1     [x, y, z]  [40, 50, 60]
2           [a]          [30]

The key here is the condition that marks the rows where Col1 has a value:

df[['Col1']].assign(con=df['Col1'].notna())

   Col1    con #for condition
0  10.0   True <-- first group
1   NaN  False
2   NaN  False
3  20.0   True <-- second group

Now, taking the cumulative sum of that condition gives every row a group number, which you can use as the grouper:

df[['Col1']].assign(con=df['Col1'].notna().cumsum())


   Col1  con
0  10.0    1
1   NaN    1
2   NaN    1
3  20.0    2
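
Putting the steps above together, here is a minimal end-to-end sketch that rebuilds the sample frame and produces the layout the question asks for, keeping Col1 in the result. It assumes the blank cells in Col1 are NaN; the names group_id and merged are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank Col1 cells are assumed to be NaN.
df = pd.DataFrame({
    "Col1": [10, np.nan, np.nan, 20],
    "Col2": ["x", "y", "z", "a"],
    "Col3": [40, 50, 60, 30],
})

# A new group starts at every row where Col1 has a value.
group_id = df["Col1"].notna().cumsum().rename("group")

merged = (
    # Cast the columns to be joined to str so ','.join accepts them.
    df.assign(Col2=df["Col2"].astype(str), Col3=df["Col3"].astype(str))
      .groupby(group_id)
      .agg(
          Col1=("Col1", "first"),   # first non-null Col1 value of the group
          Col2=("Col2", ",".join),
          Col3=("Col3", ",".join),
      )
      .reset_index(drop=True)
)

print(merged)
```

Because Col1 contains NaN it is stored as float, so the values print as 10.0 and 20.0.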

CodePudding user response:

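We can forward-fill Col1 and group on the filled values (the mask step turns empty strings into NaN, in case the blanks are '' rather than NaN):

out = df.drop(labels='Col1', axis=1).astype(str).groupby(df['Col1'].mask(df['Col1'] == '').ffill()).agg(','.join).reset_index()
out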
   Col1   Col2      Col3
0  10.0  x,y,z  40,50,60
1  20.0      a        30
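
For the case where the blanks really are empty strings, the same idea can be sketched as follows. This assumes Col1 was read in as text; the names key and out are only illustrative.

```python
import pandas as pd

# Here Col1 is assumed to hold strings, with '' marking the blank cells.
df = pd.DataFrame({
    "Col1": ["10", "", "", "20"],
    "Col2": ["x", "y", "z", "a"],
    "Col3": [40, 50, 60, 30],
})

# Turn empty strings into NaN, then carry the last seen value forward.
key = df["Col1"].mask(df["Col1"] == "").ffill()

out = (
    df.drop(columns="Col1")
      .astype(str)            # ','.join needs strings
      .groupby(key)
      .agg(",".join)
      .reset_index()
)

print(out)
```

One design difference worth noting: grouping on the forward-filled value also merges non-adjacent blocks that happen to share the same Col1 value, whereas the cumulative-sum key keeps every block separate.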