I have a sample dataframe as given below.
import pandas as pd
data = {'ID':['A', 'A', 'A', 'B','B','B'],
'Date':['2021-09-20 04:34:57', '2021-09-20 04:37:25', '2021-09-20 04:38:26', '2021-09-01
00:12:29','2021-09-01 11:20:58','2021-09-02 09:20:58'],
'Name':['xx','','','yy','',''],
'Height':['174cm','','','160cm','',''],
'Weight':['74kg','','','58kg','',''],
'Gender':['','Male','','','Female',''],
'Interests':['','','Hiking,Sports','','','Singing']}
df1 = pd.DataFrame(data)
df1
I want to combine the data present on the same date into a single row. The 'Date' column is in timestamp format. The final output should look like the image shown below.
Any help is greatly appreciated. Thanks.
CodePudding user response:
Start first by converting to datetime and flooring:
In [3]: df["Date"] = pd.to_datetime(df["Date"]).dt.floor('D')
In [4]: df
Out[4]:
ID Date Name Height Weight Gender Interests
0 A 2021-09-20 xx 174cm 74kg
1 A 2021-09-20 Male
2 A 2021-09-20 Hiking,Sports
3 B 2021-09-01 yy 160cm 58kg
4 B 2021-09-01 Female
5 B 2021-09-02 Singing
Now using groupby
and sum
:
In [5]: df.groupby(["ID", "Date"]).sum().reset_index()
Out[5]:
ID Date Name Height Weight Gender Interests
0 A 2021-09-20 xx 174cm 74kg Male Hiking,Sports
1 B 2021-09-01 yy 160cm 58kg Female
2 B 2021-09-02 Singing
CodePudding user response:
If your data are correctly ordered as your sample, you can merge your data as below:
>>> df1.groupby(['ID', pd.Grouper(key='Date', freq='D')]) \
.sum().reset_index()
ID Date Name Height Weight Gender Interests
0 A 2021-09-20 xx 174cm 74kg Male Hiking,Sports
1 B 2021-09-01 yy 160cm 58kg Female
2 B 2021-09-02 Singing
CodePudding user response:
You can use .groupby()
.agg()
to group by ID
and date and then aggregate the non-blank and blank texts, as follows:
# Convert `Date` to datetime format
df1['Date'] = pd.to_datetime(df1['Date'])
(df1.groupby(['ID', df1['Date'].dt.date])[['Name', 'Height', 'Weight', 'Gender', 'Interests']]
.agg(''.join)
.reset_index()
)
or use pd.Grouper
so that we don't need to quote all column names:
# Convert `Date` to datetime format
df1['Date'] = pd.to_datetime(df1['Date'])
(df1.groupby(['ID', pd.Grouper(key='Date', freq='D')])
.agg(''.join)
.reset_index()
)
Alternatively, you can also use .groupby()
GroupBy.sum()
, as follows:
# Convert `Date` to datetime format
df1['Date'] = pd.to_datetime(df1['Date'])
(df1.groupby(['ID', df1['Date'].dt.date])[['Name', 'Height', 'Weight', 'Gender', 'Interests']]
.sum(numeric_only=False)
.reset_index()
)
similarly, use pd.Grouper
so that we don't need to quote all column names:
# Convert `Date` to datetime format
df1['Date'] = pd.to_datetime(df1['Date'])
(df1.groupby(['ID', pd.Grouper(key='Date', freq='D')])
.sum(numeric_only=False)
.reset_index()
)
Result:
ID Date Name Height Weight Gender Interests
0 A 2021-09-20 xx 174cm 74kg Male Hiking,Sports
1 B 2021-09-01 yy 160cm 58kg Female
2 B 2021-09-02 Singing