Home > database >  Merge arbitrary number of text columns based on null condition of consecutive rows
Merge arbitrary number of text columns based on null condition of consecutive rows

Time:10-14

I have a dataframe like this:

dummy_df = pd.DataFrame([{'Name': 'First', 'Value': 1}, {'Name': 'Start of', 'Value': None}, {'Name': 'cut off', 'Value': None}, {'Name': 'Last', 'Value': 10}, {'Name': 'First of', 'Value': None}, {'Name': 'three lines', 'Value': None}, {'Name': 'cut off', 'Value': None}, {'Name': 'Actually last', 'Value': 100}])

            Name  Value
0          First    1.0
1       Start of    NaN
2        cut off    NaN
3           Last   10.0
4       First of    NaN
5    three lines    NaN
6        cut off    NaN
7  Actually last  100.0

I have arbitrary numbers of rows where the Value column is NaN and I would like to combine consecutive None rows such that Name is concatenated to produce an end result such as:


                           Name  Value
0                         First    1.0
1              Start of cut off    NaN
2                          Last   10.0
3  First of three lines cut off    NaN
4                 Actually last  100.0

I've tried using some other answers using df.shift and cumsum to no avail.

CodePudding user response:

Try cumsum on notna for the blocks, then groupby:

s = dummy_df['Value'].notna()
dummy_df.groupby([s,s.cumsum()], as_index=False, sort=False).agg({
    'Name': ' '.join,
    'Value': 'first'
})
  • Related