Home > OS >  Group data by week in Pandas
Group data by week in Pandas

Time:03-24

I'm having this data frame:

Name        Date      author
Apple   2022-03-15    sahil_1
Orange  2022-03-16    sahil_2
Apple   2022-03-17    sahil_3
Orange  2022-03-18    sahil_1
Apple   2022-03-19    sahil_2
Banana  2022-03-20    sahil_3
Apple   2019-12-19    sahil_3
Orange  2004-01-07    sahil_1

I want to aggregate this by Name and Date(weekly) to get count of record.

Date: Group, the result should be at the beginning of the week (or just on Monday)

Count: ADD, if two or more records have same Name and fall in a same week Date time (if falls on same interval 7 days week)

The desired output is given below:

Name        Date      count
Apple    2019-12-16    1
Apple    2022-03-14    3

Banana   2022-03-14    1

Orange   2004-01-05    1
Orange   2022-03-14    2

Note - All date in result are monday OR say first day of week.
If possible result should be sorted in asc(increasing) order and date also for respective of every name.

Thanks in advance.

Not getting how to proceed further.

CodePudding user response:

import pandas as pd 

Name = ["Apple", "Orange", "Apple", "Orange", "Apple", "Banana", "Apple","Orange"]
Date = ["2022-03-15","2022-03-16","2022-03-17","2022-03-18","2022-03-19","2022-03-20","2019-12-19","2004-01-07"]
author = ["sahil_1","sahil_2","sahil_3","sahil_1","sahil_2","sahil_3","sahil_3","sahil_1"]

df = pd.DataFrame(zip(Name,Date,author), columns=["Name", "Date", "Author"])
df['Date'] = pd.to_datetime(df['Date']) - pd.to_timedelta(7, unit='d')
x = df.groupby(['Name', pd.Grouper(key='Date', freq='W-MON')])['Name'].count()
print(x)

CodePudding user response:

Thanks @Pedrinho for Quick help. I modified code little bit and got my desired result. But not sure if its right approach to do it.

Solution code -

import pandas as pd 

Name = ["Apple", "Orange", "Apple", "Orange", "Apple", "Banana", "Apple","Orange"]
Date = ["2022-03-15","2022-03-16","2022-03-17","2022-03-18","2022-03-19","2022-03-20","2019-12-19","2004-01-07"]
author = ["sahil_1","sahil_2","sahil_3","sahil_1","sahil_2","sahil_3","sahil_3","sahil_1"]

df = pd.DataFrame(zip(Name,Date,author), columns=["Name", "Date", "Author"])

df['Date'] = pd.to_datetime(df['Date']) - pd.to_timedelta(7, unit='d')

df = df.groupby(['Name', pd.Grouper(key='Date', freq='W-MON')])#['Name'].count()

result = []

for group_id, group_df in df:
    res = {}
    res['Name'] = group_id[0]
    res['Week'] = str(group_id[1])[:-9]  
    res['count'] =  group_df['Name'].count()
    result.append(res)

print(f"Result df is: {result}")

Console o/p -

Result df is: [{'Name': 'Apple', 'Week': '2019-12-16', 'count': 1}, {'Name': 'Apple', 'Week': '2022-03-14', 'count': 3}, {'Name': 'Banana', 'Week': '2022-03-14', 'count': 1}, {'Name': 'Orange', 'Week': '2004-01-05', 'count': 1}, {'Name': 'Orange', 'Week': '2022-03-14', 'count': 2}]
  • Related