Home > Blockchain >  In Python Pandas, How do I concatenate rows of a df based on two columns? and in the order of a thir
In Python Pandas, How do I concatenate rows of a df based on two columns? and in the order of a thir

Time:07-21

Python Pandas

I am trying to merge/concatenate rows together on a pandas data frame. I have a table with columns ['Id','Date','Hour','Sales'].

If I groupby the df by 'Hour' count(), it will be like this:

'ID' 'Date' 'Hour' 01 01/01/22 24 02 01/01/22 24 03 01/02/22 24 05 03/02/22 24

There are several Id's, that sometimes have the same date, but each unique ID/Date has 24 unique rows of 'Hour'. (There is 24 rows per date, and repeated dates by ID.)

I need help for some code that will take unique pair of ID's and date, and concatenate the 24 rows of information, in order of the 'hour' column.

Therefore, the result would be every unique ID and Date, will have a vector, of the concatenated hour information.

The catch is that each concatenation has to be done based on two columns, 'Id' and 'Date', and in order of the 'hour' column.

Appreciate the help ppl.

CodePudding user response:

Here is a short example and some code that moves the 'Sales' data into separate columns for each hour. You can change the value in the range from 3 to 24 for your case.

The key here is using the pandas.merge function with the on argument to choose which columns to use as an index for merging.

import pandas as pd
df = pd.DataFrame([['Dave', 1, 0, 10],['Dave', 1, 1, 20],['Dave', 1, 2, 30],
                   ['Dave', 2, 0, 40],['Dave', 2, 1, 50],['Dave', 2, 2, 60],
                   ['Carl', 1, 0, 15],['Carl', 1, 1, 25],['Carl', 1, 2, 35],
                   ['Carl', 2, 0, 45],['Carl', 2, 1, 55],['Carl', 2, 2, 65]],
                  columns=['ID', 'Date', 'Hour', 'Sales'])
    
new = pd.DataFrame(columns=['ID','Date'])
for hour in range(3):
    tmp = df.where(df.Hour == hour).dropna(axis=0, how='all')
    tmp[hour] = tmp['Sales']
    tmp.drop(['Hour','Sales'], axis=1, inplace=True)
    new = new.merge(tmp, how='outer', on=['ID','Date'])
new.head()

Input:

      ID  Date  Hour  Sales
0   Dave     1     0     10
1   Dave     1     1     20
2   Dave     1     2     30
3   Dave     2     0     40
4   Dave     2     1     50
5   Dave     2     2     60
6   Carl     1     0     15
7   Carl     1     1     25
8   Carl     1     2     35
9   Carl     2     0     45
10  Carl     2     1     55
11  Carl     2     2     65

Output:

     ID  Date     0     1     2
0  Dave   1.0  10.0  20.0  30.0
1  Dave   2.0  40.0  50.0  60.0
2  Carl   1.0  15.0  25.0  35.0
3  Carl   2.0  45.0  55.0  65.0
  • Related