Home > Net >  Pandas data manipulation from column to row elements
Pandas data manipulation from column to row elements

Time:05-16

I have dataset with millions of rows, here is an example of what it looks like and what I intend to output:

data = [[1, 100, 8], [1, 100, 4], 
       [1, 100,6], [2, 100, 0],
      [2, 200, 1], [3, 300, 7], 
      [4, 400, 2], [5, 100, 6], 
      [5, 100, 3], [5, 600, 1]] 
      
df= pd.DataFrame(data, columns =['user', 'time', 'item'])
print(df)
   user  time   item
     1   100     8
     1   100     4
     1   100     6
     2   100     0
     2   200     1
     3   300     7
     4   400     2
     5   100     6
     5   100     3
     5   600     1

The desired output should have all items consumed by a user within the same time to appear together in the items column as follows

user time   item
1    100    8,4,6
2    100    0
5    100    6,3
2    200    1
3    300    7
4    400    2
5    500    6

For example, user: 1 consumed products 8,4,6 within time: 100

How could this be achieved?

CodePudding user response:

Use df.astype with Groupby.agg and df.sort_values:

In [489]: out = df.astype(str).groupby(['user', 'time'])['item'].agg(','.join).reset_index().sort_values('time')

In [490]: out
Out[490]: 
  user time   item
0    1  100  8,4,6
1    2  100      0
5    5  100    6,3
2    2  200      1
3    3  300      7
4    4  400      2
6    5  600      1
  • Related