Home > Mobile >  Grouping data month-wise with Categorical data in pandas
Grouping data month-wise with Categorical data in pandas

Time:08-30

How can I group data into months from dates where a data frame has both categorical and numerical data in pandas. I tried the groupby function but I think it won't work with categorical data. There are multiple values in the categorical column. Sample data:

Date Campaign_Name No_of_Male_Viewers No_of_Female_Viewers
2021-06-12 Dove_birds 1268 7656
2021-02-05 Pantene_winner 657 8964
2021-09-15 Budweiser_wazap 7642 76
2021-05-13 Pantene_winner 425 6578
2021-12-12 Budweiser_wazap 9867 111
2021-09-09 Dove_birds 1578 11456
2021-05-24 Pantene_winner 678 7475
2021-09-27 Budweiser_wazap 8742 96
2021-09-09 Dove_soft 1175 15486

Now I need to group the data months wise and show for example that Budweiser_wazap in September gained a total audience of xxxx and in December gained xxxx audience and so on for the other campaigns as well.

Expected output sample:

Month Campaign_Name No_of_Male_Viewers No_of_Female_Viewers
February Pantene_winner 657 8964
September Budweiser_wazap 16384 172

Since Budweiser_wazap campaign ran twice in September, the resulting output for No_of_Male_Viewers is: 7642 8742 = 16384, and for No_of_Female_Viewers is: 76 96 = 172.

CodePudding user response:

USE-

##Get Month Name for each date
df['Month'] = df['Date'].dt.month_name()

#Groupby `Month` & `Campaign_Name`
df.groupby(['Month', 'Campaign_Name'])[['No_of_Male_viewers', 'No_of_Female_viewers']].sum().reset_index()
df

Sample Reproducible code-

import pandas as pd
import numpy as np
from pandas import DataFrame

df = pd.DataFrame({
    'Date' : ['2015-06-08', '2015-08-05', '2015-05-06', '2015-05-05', '2015-07-08', '2015-05-07', '2015-06-05', '2015-07-05'], 
    'Sym'  : ['aapl', 'aapl', 'aapl', 'aapl', 'aaww', 'aaww', 'aaww', 'aaww'], 
    'Data2': [11, 8, 10, 15, 110, 60, 100, 40],
    'Data3': [5, 8, 6, 1, 50, 100, 60, 120]
})
df['Date'] = pd.to_datetime(df['Date'])
df['Month'] = df['Date'].dt.month_name()
df

df output-

Date           Sym  Data2   Data3  Month
0   2015-06-08  aapl    11  5   June
1   2015-08-05  aapl    8   8   August
2   2015-05-06  aapl    10  6   May
3   2015-05-05  aapl    15  1   May
4   2015-07-08  aaww    110 50  July
5   2015-05-07  aaww    60  100 May
6   2015-06-05  aaww    100 60  June
7   2015-07-05  aaww    40  120 July

Groupby Condition-

df.groupby(['Month', 'Sym'])[['Data2', 'Data3']].sum().reset_index()

Output-

    Month   Sym   Data2 Data3
0   August  aapl    8   8
1   July    aaww    150 170
2   June    aapl    11  5
3   June    aaww    100 60
4   May     aapl    25  7
5   May     aaww    60  100

Ref link- enter image description here

  • Related