Home > Net >  Combine rows having same date different time into a single row having the same date in python
Combine rows having same date different time into a single row having the same date in python

Time:09-24

I have a sample dataframe as given below.

import pandas as pd

data = {'ID':['A', 'A', 'A', 'B','B','B'],
    'Date':['2021-09-20 04:34:57', '2021-09-20 04:37:25', '2021-09-20 04:38:26', '2021-09-01 
    00:12:29','2021-09-01 11:20:58','2021-09-02 09:20:58'],
    'Name':['xx','','','yy','',''],
    'Height':['174cm','','','160cm','',''],
    'Weight':['74kg','','','58kg','',''],
    'Gender':['','Male','','','Female',''],
    'Interests':['','','Hiking,Sports','','','Singing']}

 df1 = pd.DataFrame(data)
 df1 

enter image description here

I want to combine the data present on the same date into a single row. The 'Date' column is in timestamp format. The final output should look like the image shown below.

enter image description here

Any help is greatly appreciated. Thanks.

CodePudding user response:

Start first by converting to datetime and flooring:

In [3]: df["Date"] = pd.to_datetime(df["Date"]).dt.floor('D')

In [4]: df
Out[4]:
  ID       Date Name Height Weight  Gender      Interests
0  A 2021-09-20   xx  174cm   74kg
1  A 2021-09-20                       Male
2  A 2021-09-20                             Hiking,Sports
3  B 2021-09-01   yy  160cm   58kg
4  B 2021-09-01                     Female
5  B 2021-09-02                                   Singing

Now using groupby and sum:

In [5]: df.groupby(["ID", "Date"]).sum().reset_index()
Out[5]:
  ID       Date Name Height Weight  Gender      Interests
0  A 2021-09-20   xx  174cm   74kg    Male  Hiking,Sports
1  B 2021-09-01   yy  160cm   58kg  Female
2  B 2021-09-02                                   Singing

CodePudding user response:

If your data are correctly ordered as your sample, you can merge your data as below:

>>> df1.groupby(['ID', pd.Grouper(key='Date', freq='D')]) \
       .sum().reset_index()

  ID       Date Name Height Weight  Gender      Interests
0  A 2021-09-20   xx  174cm   74kg    Male  Hiking,Sports
1  B 2021-09-01   yy  160cm   58kg  Female               
2  B 2021-09-02                                   Singing

CodePudding user response:

You can use .groupby() .agg() to group by ID and date and then aggregate the non-blank and blank texts, as follows:

# Convert `Date` to datetime format
df1['Date'] = pd.to_datetime(df1['Date'])

(df1.groupby(['ID', df1['Date'].dt.date])[['Name', 'Height', 'Weight', 'Gender', 'Interests']]
    .agg(''.join)
    .reset_index()
)

or use pd.Grouper so that we don't need to quote all column names:

# Convert `Date` to datetime format
df1['Date'] = pd.to_datetime(df1['Date'])

(df1.groupby(['ID', pd.Grouper(key='Date', freq='D')])
    .agg(''.join)
    .reset_index()
)

Alternatively, you can also use .groupby() GroupBy.sum(), as follows:

# Convert `Date` to datetime format
df1['Date'] = pd.to_datetime(df1['Date'])

(df1.groupby(['ID', df1['Date'].dt.date])[['Name', 'Height', 'Weight', 'Gender', 'Interests']]
    .sum(numeric_only=False)
    .reset_index()
)

similarly, use pd.Grouper so that we don't need to quote all column names:

# Convert `Date` to datetime format
df1['Date'] = pd.to_datetime(df1['Date'])

(df1.groupby(['ID', pd.Grouper(key='Date', freq='D')])
    .sum(numeric_only=False)
    .reset_index()
)

Result:

  ID        Date Name Height Weight  Gender      Interests
0  A  2021-09-20   xx  174cm   74kg    Male  Hiking,Sports
1  B  2021-09-01   yy  160cm   58kg  Female               
2  B  2021-09-02                                   Singing
  • Related