I would like to combine rows of a dataframe into groups of two within each ID, joining the text in the second column with a space. Using groupby() on its own
joins all of the text for an ID. I would like to set the group size per ID myself. If the last group of an ID comes up short, it should not be filled with rows from another ID.
import pandas as pd
d = {'ID': [0,0,0,1,1,1,1], 'col2': ['Car','Tree','House','Cat','Dog','Cloud','Bottle']}
df = pd.DataFrame(data=d)
#Expected Output
ID col2
0 0 'Car Tree'
1 0 'House'
2 1 'Cat Dog'
3 1 'Cloud Bottle'
CodePudding user response:
Create a sequential counter within each ID using cumcount, then floor-divide it by 2 (the desired group size) to create partitions. Then group the dataframe by ID together with the partitions and aggregate col2 with join:
i = df.groupby('ID').cumcount() // 2
df.groupby(['ID', i], as_index=False)['col2'].agg(' '.join)
ID col2
0 0 Car Tree
1 0 House
2 1 Cat Dog
3 1 Cloud Bottle
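If the group size needs to vary, the same idea can be wrapped in a small helper. This is only a sketch: the function name join_in_chunks and its parameters (key, text, size, sep) are made up for illustration and are not part of pandas. It assumes the rows are already in the order in which they should be joined.

```python
import pandas as pd

d = {'ID': [0, 0, 0, 1, 1, 1, 1],
     'col2': ['Car', 'Tree', 'House', 'Cat', 'Dog', 'Cloud', 'Bottle']}
df = pd.DataFrame(data=d)


def join_in_chunks(df, key='ID', text='col2', size=2, sep=' '):
    """Hypothetical helper: join `text` in chunks of `size` rows per `key`."""
    # Position of each row inside its own ID group: 0, 1, 2, ...
    pos = df.groupby(key).cumcount()
    # Floor division turns positions into chunk numbers: 0, 0, 1, 1, ...
    chunk = (pos // size).rename('chunk')
    # Group by ID and chunk number, so a chunk never crosses an ID boundary
    joined = df.groupby([df[key], chunk], sort=False)[text].agg(sep.join)
    # Drop the helper level and turn the ID level back into a column
    return joined.reset_index(level='chunk', drop=True).reset_index()


pairs = join_in_chunks(df, size=2)
triples = join_in_chunks(df, size=3)
```

Here pairs should match the output shown above, and triples should contain 'Car Tree House' for ID 0, followed by 'Cat Dog Cloud' and 'Bottle' for ID 1. The counter restarts for every ID, so a short last chunk is left as it is.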