How to use groupby in Python to merge text while keeping the other rows fixed?

Time:02-17

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'Date':['2022-01-01', '2022-01-01','2022-01-01','2022-02-01','2022-02-01',
                      '2022-03-01','2022-03-01','2022-03-01'],
              'Type': ['R','R','R','P','P','G','G','G'],
              'Class':[1,1,1,0,0,2,2,2],
              'Text':['Hello-','I would like.','to be merged.','with all other.',
                      'sentences that.','belong to my same.','group.','thanks a lot.']})

df.index =[1,1,1,2,2,3,3,3]

What I would like to do is group by the index and join the Text column, while keeping only the first row's values for the other columns.

I tried the following two approaches without success. I should probably combine them, but I don't know how.

# Approach 1
df.groupby([df.index],as_index=False).agg(lambda x : x.sum() if x.dtype=='float64' else ' '.join(x))

# Approach 2
df.groupby([df.index], as_index=False).agg({'Date': 'first',
                    'Type': 'first', 'Class': 'first', 'Test': 'join'})

The outcome should be:


Date          Type   Class   Text
2022-01-01     R      1      Hello. I would like to be merged.
2022-02-01     P      0      with all other sentences that.
2022-03-01     G      2      belong to my same. group. thanks a lot.

Can anyone help me do it?

Thanks!

User response:

My idea would be to take the second approach, aggregate the text into a list, and then join the individual strings:

new_df = df.groupby([df.index], as_index=False).agg({'Date': 'first',
                    'Type': 'first', 'Class': 'first', 'Text': list})
new_df['Text'] = new_df['Text'].str.join('')
print(new_df)

Output:


    Date         Type   Class   Text
0   2022-01-01   R      1       Hello-I would like.to be merged.
1   2022-02-01   P      0       with all other.sentences that.
2   2022-03-01   G      2       belong to my same.group.thanks a lot.

It turns out you can also do it in a single statement (same approach):

new_df = df.groupby([df.index], as_index=False).agg({'Date': 'first',
                    'Type': 'first', 'Class': 'first', 'Text': ''.join})
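
Note that the snippets above join with an empty string, so the sentences run together, while the desired output in the question has spaces between them. A minimal sketch of a variant that joins with a space, assuming that is what is wanted. The name agg_spec and the use of groupby(level=0) plus reset_index(drop=True) are my own choices for illustration, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2022-01-01', '2022-01-01', '2022-01-01', '2022-02-01', '2022-02-01',
             '2022-03-01', '2022-03-01', '2022-03-01'],
    'Type': ['R', 'R', 'R', 'P', 'P', 'G', 'G', 'G'],
    'Class': [1, 1, 1, 0, 0, 2, 2, 2],
    'Text': ['Hello-', 'I would like.', 'to be merged.', 'with all other.',
             'sentences that.', 'belong to my same.', 'group.', 'thanks a lot.'],
}, index=[1, 1, 1, 2, 2, 3, 3, 3])

# agg_spec is a hypothetical helper name: keep the first value of every
# column except 'Text', which is joined with a single space instead of ''.
agg_spec = {col: 'first' for col in df.columns if col != 'Text'}
agg_spec['Text'] = ' '.join

# level=0 groups on the (repeated) index; reset_index(drop=True) then
# replaces the group keys 1, 2, 3 with a fresh 0-based index.
new_df = df.groupby(level=0).agg(agg_spec).reset_index(drop=True)
print(new_df)
```

Building the dictionary from df.columns avoids listing every column by hand, which may help if the real dataframe has more columns than this example.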