I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'Date': ['2022-01-01', '2022-01-01', '2022-01-01', '2022-02-01', '2022-02-01',
                            '2022-03-01', '2022-03-01', '2022-03-01'],
                   'Type': ['R', 'R', 'R', 'P', 'P', 'G', 'G', 'G'],
                   'Class': [1, 1, 1, 0, 0, 2, 2, 2],
                   'Text': ['Hello-', 'I would like.', 'to be merged.', 'with all other.',
                            'sentences that.', 'belong to my same.', 'group.', 'thanks a lot.']})
df.index = [1, 1, 1, 2, 2, 3, 3, 3]
I would like to group by the index and join the values of the Text column, while keeping only the first row's value for each of the other columns.
I tried the following two approaches without success. I probably need to combine them, but I don't know how.
# Approach 1
df.groupby([df.index],as_index=False).agg(lambda x : x.sum() if x.dtype=='float64' else ' '.join(x))
# Approach 2
df.groupby([df.index], as_index=False).agg({'Date': 'first',
'Type': 'first', 'Class': 'first', 'Test': 'join'})
The outcome should be:
Date        Type  Class  Text
2022-01-01  R     1      Hello. I would like to be merged.
2022-02-01  P     0      with all other sentences that.
2022-03-01  G     2      belong to my same. group. thanks a lot.
Can anyone help me do it?
Thanks!
CodePudding user response:
My idea would be to take your second approach (with the column key spelled 'Text' rather than 'Test'), aggregate the text of each group into a list, and then join the individual strings, like this:
new_df = df.groupby([df.index], as_index=False).agg({'Date': 'first',
                                                     'Type': 'first',
                                                     'Class': 'first',
                                                     'Text': list})
new_df['Text'] = new_df['Text'].str.join('')
print(new_df)
Output:
         Date Type  Class                                   Text
0  2022-01-01    R      1       Hello-I would like.to be merged.
1  2022-02-01    P      0         with all other.sentences that.
2  2022-03-01    G      2  belong to my same.group.thanks a lot.
It turns out you can also do it in a single statement (same approach):
new_df = df.groupby([df.index], as_index=False).agg({'Date': 'first',
                                                     'Type': 'first',
                                                     'Class': 'first',
                                                     'Text': ''.join})
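Note that joining on an empty string runs the fragments together, whereas the expected output in the question separates them with spaces. A minimal sketch of that variant follows, assuming a single space is the separator you want. The name spaced_df is just an illustrative choice, and groupby(level=0) with reset_index(drop=True) is swapped in here as one way to group on the index and get a fresh 0-based index; it is not the only way to write it.

```python
import pandas as pd

# Same data as in the question, with the duplicated index set directly.
df = pd.DataFrame({
    'Date': ['2022-01-01', '2022-01-01', '2022-01-01', '2022-02-01', '2022-02-01',
             '2022-03-01', '2022-03-01', '2022-03-01'],
    'Type': ['R', 'R', 'R', 'P', 'P', 'G', 'G', 'G'],
    'Class': [1, 1, 1, 0, 0, 2, 2, 2],
    'Text': ['Hello-', 'I would like.', 'to be merged.', 'with all other.',
             'sentences that.', 'belong to my same.', 'group.', 'thanks a lot.'],
}, index=[1, 1, 1, 2, 2, 3, 3, 3])

# Group on the index (level 0), keep the first value of the other columns,
# and join the text fragments of each group with a single space (assumed separator).
spaced_df = (
    df.groupby(level=0)
      .agg({'Date': 'first', 'Type': 'first', 'Class': 'first', 'Text': ' '.join})
      .reset_index(drop=True)  # replace the group keys 1, 2, 3 with 0, 1, 2
)
print(spaced_df)
```

With this, the first row's Text becomes 'Hello- I would like. to be merged.'. The punctuation inside each fragment is left untouched, so matching the question's expected output exactly would still need some extra string cleanup.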