How to get the mean value for same date of week in python-CodePudding

I have a dataset as follows,(there are many data rows than mentioned below)

        Date                 Calls
0   2022-01-02, Sunday      482920
1   2022-01-01, Saturday    482920
2   2021-12-31, Friday      482920
3   2021-12-30, Thursday    482920
4   2021-12-29, Wednesday   519995
5   2021-12-28, Tuesday     482920
6   2021-12-27, Monday      519995
7   2021-12-26, Sunday      522273
8   2021-12-25, Saturday    508439
9   2021-12-24, Friday      456587
10  2021-12-23, Thursday    482920
11  2021-12-22, Wednesday   519995
12  2021-12-21, Tuesday     522273
13  2021-12-20, Monday      508439
14  2021-12-19, Sunday      456587

I need to get the mean value for each same date mentioned in the dataset.As an example I need to get the mean of calls of all 'Sunday'. So I need another column of average calls like below.

Date                    Calls     Avgerage_Calls
0   2022-01-02, Sunday      482920         487260.0
1   2022-01-01, Saturday    482920         495679.5
2   2021-12-31, Friday      482920         469753.5
3   2021-12-30, Thursday    482920         482920.0
4   2021-12-29, Wednesday   519995         519995.0
5   2021-12-28, Tuesday     482920         469753.5
6   2021-12-27, Monday      519995         469753.5
7   2021-12-26, Sunday      522273         487260.0
8   2021-12-25, Saturday    508439         495679.5
9   2021-12-24, Friday      456587         469753.5
10  2021-12-23, Thursday    482920         482920.0
11  2021-12-22, Wednesday   519995         519995.0
12  2021-12-21, Tuesday     522273         469753.5
13  2021-12-20, Monday      508439         469753.5
14  2021-12-19, Sunday      456587         487260.0

So up to now I have used these steps to achieve this.

df_new = df[df['Date'].str.contains('Sunday', regex=False, case=False, na=False)]
x=df_new["Calls"].mean()
x

It provides the values of the mean of same date. But there might be some straight method to get all these average without having different data frames. Can some one help me to solve this?

CodePudding user response：

Here is a solution :

df = pd.DataFrame({'Date': ['2022-01-02, Sunday', '2022-01-01, Saturday', '2021-12-31, Friday', '2021-12-30, Thursday', '2021-12-29, Wednesday', 
                        '2021-12-28, Tuesday', '2021-12-27, Monday', '2021-12-26, Sunday','2021-12-25, Saturday','2021-12-24, Friday','2021-12-23, Thursday',
                        '2021-12-22, Wednesday','2021-12-21, Tuesday','2021-12-20, Monday','2021-12-19, Sunday'], 
               'Calls': [482920,482920,482920,482920,519995,482920,519995,522273,508439,456587,482920,519995,522273,508439,456587]})

df['day'] = df['Date'].apply(lambda x : x.split(',')[1].strip())
df['Avgerage_Calls'] = df.groupby(df['day'])['Calls'].transform('mean')
df.drop(columns=['day'],inplace = True)

Out put:

           Date           Calls    Avgerage_Calls
0   2022-01-02, Sunday    482920    487260.0
1   2022-01-01, Saturday  482920    495679.5
2   2021-12-31, Friday    482920    469753.5
3   2021-12-30, Thursday  482920    482920.0
4   2021-12-29, Wednesday 519995    519995.0
5   2021-12-28, Tuesday   482920    502596.5
6   2021-12-27, Monday    519995    514217.0
7   2021-12-26, Sunday    522273    487260.0
8   2021-12-25, Saturday  508439    495679.5
9   2021-12-24, Friday    456587    469753.5
10  2021-12-23, Thursday  482920    482920.0
11  2021-12-22, Wednesday 519995    519995.0
12  2021-12-21, Tuesday   522273    502596.5
13  2021-12-20, Monday    508439    514217.0
14  2021-12-19, Sunday    456587    487260.0

I create a new columns day which contain the day and then I calculated the mean by day (using groupby). The str.split(',') is used to split the date. For example if s= '2021-12-20, Monday' then s.split(',') gives ['2021-12-20', ' Monday']. str.strip() is used to drop whitespace.