I'm having this data frame:
Name Date author Apple 2022-03-15 sahil_1 Orange 2022-03-16 sahil_2 Apple 2022-03-17 sahil_3 Orange 2022-03-18 sahil_1 Apple 2022-03-19 sahil_2 Banana 2022-03-20 sahil_3 Apple 2019-12-19 sahil_3 Orange 2004-01-07 sahil_1
I want to aggregate this by Name and Date(weekly) to get count of record.
Date: Group, the result should be at the beginning of the week (or just on Monday)
Count: ADD, if two or more records have same Name and fall in a same week Date time (if falls on same interval 7 days week)
The desired output is given below:
Name Date count Apple 2019-12-16 1 Apple 2022-03-14 3 Banana 2022-03-14 1 Orange 2004-01-05 1 Orange 2022-03-14 2
Note - All date in result are monday OR say first day of week.
If possible result should be sorted in asc(increasing) order and date also for respective of every name.
Thanks in advance.
Not getting how to proceed further.
CodePudding user response:
import pandas as pd
Name = ["Apple", "Orange", "Apple", "Orange", "Apple", "Banana", "Apple","Orange"]
Date = ["2022-03-15","2022-03-16","2022-03-17","2022-03-18","2022-03-19","2022-03-20","2019-12-19","2004-01-07"]
author = ["sahil_1","sahil_2","sahil_3","sahil_1","sahil_2","sahil_3","sahil_3","sahil_1"]
df = pd.DataFrame(zip(Name,Date,author), columns=["Name", "Date", "Author"])
df['Date'] = pd.to_datetime(df['Date']) - pd.to_timedelta(7, unit='d')
x = df.groupby(['Name', pd.Grouper(key='Date', freq='W-MON')])['Name'].count()
print(x)
CodePudding user response:
Thanks @Pedrinho for Quick help. I modified code little bit and got my desired result. But not sure if its right approach to do it.
Solution code -
import pandas as pd
Name = ["Apple", "Orange", "Apple", "Orange", "Apple", "Banana", "Apple","Orange"]
Date = ["2022-03-15","2022-03-16","2022-03-17","2022-03-18","2022-03-19","2022-03-20","2019-12-19","2004-01-07"]
author = ["sahil_1","sahil_2","sahil_3","sahil_1","sahil_2","sahil_3","sahil_3","sahil_1"]
df = pd.DataFrame(zip(Name,Date,author), columns=["Name", "Date", "Author"])
df['Date'] = pd.to_datetime(df['Date']) - pd.to_timedelta(7, unit='d')
df = df.groupby(['Name', pd.Grouper(key='Date', freq='W-MON')])#['Name'].count()
result = []
for group_id, group_df in df:
res = {}
res['Name'] = group_id[0]
res['Week'] = str(group_id[1])[:-9]
res['count'] = group_df['Name'].count()
result.append(res)
print(f"Result df is: {result}")
Console o/p -
Result df is: [{'Name': 'Apple', 'Week': '2019-12-16', 'count': 1}, {'Name': 'Apple', 'Week': '2022-03-14', 'count': 3}, {'Name': 'Banana', 'Week': '2022-03-14', 'count': 1}, {'Name': 'Orange', 'Week': '2004-01-05', 'count': 1}, {'Name': 'Orange', 'Week': '2022-03-14', 'count': 2}]