I have a dataset as follows,(there are many data rows than mentioned below)
Date Calls
0 2022-01-02, Sunday 482920
1 2022-01-01, Saturday 482920
2 2021-12-31, Friday 482920
3 2021-12-30, Thursday 482920
4 2021-12-29, Wednesday 519995
5 2021-12-28, Tuesday 482920
6 2021-12-27, Monday 519995
7 2021-12-26, Sunday 522273
8 2021-12-25, Saturday 508439
9 2021-12-24, Friday 456587
10 2021-12-23, Thursday 482920
11 2021-12-22, Wednesday 519995
12 2021-12-21, Tuesday 522273
13 2021-12-20, Monday 508439
14 2021-12-19, Sunday 456587
I need to get the mean value for each same date mentioned in the dataset.As an example I need to get the mean of calls of all 'Sunday'. So I need another column of average calls like below.
Date Calls Avgerage_Calls
0 2022-01-02, Sunday 482920 487260.0
1 2022-01-01, Saturday 482920 495679.5
2 2021-12-31, Friday 482920 469753.5
3 2021-12-30, Thursday 482920 482920.0
4 2021-12-29, Wednesday 519995 519995.0
5 2021-12-28, Tuesday 482920 469753.5
6 2021-12-27, Monday 519995 469753.5
7 2021-12-26, Sunday 522273 487260.0
8 2021-12-25, Saturday 508439 495679.5
9 2021-12-24, Friday 456587 469753.5
10 2021-12-23, Thursday 482920 482920.0
11 2021-12-22, Wednesday 519995 519995.0
12 2021-12-21, Tuesday 522273 469753.5
13 2021-12-20, Monday 508439 469753.5
14 2021-12-19, Sunday 456587 487260.0
So up to now I have used these steps to achieve this.
df_new = df[df['Date'].str.contains('Sunday', regex=False, case=False, na=False)]
x=df_new["Calls"].mean()
x
It provides the values of the mean of same date. But there might be some straight method to get all these average without having different data frames. Can some one help me to solve this?
CodePudding user response:
Here is a solution :
df = pd.DataFrame({'Date': ['2022-01-02, Sunday', '2022-01-01, Saturday', '2021-12-31, Friday', '2021-12-30, Thursday', '2021-12-29, Wednesday',
'2021-12-28, Tuesday', '2021-12-27, Monday', '2021-12-26, Sunday','2021-12-25, Saturday','2021-12-24, Friday','2021-12-23, Thursday',
'2021-12-22, Wednesday','2021-12-21, Tuesday','2021-12-20, Monday','2021-12-19, Sunday'],
'Calls': [482920,482920,482920,482920,519995,482920,519995,522273,508439,456587,482920,519995,522273,508439,456587]})
df['day'] = df['Date'].apply(lambda x : x.split(',')[1].strip())
df['Avgerage_Calls'] = df.groupby(df['day'])['Calls'].transform('mean')
df.drop(columns=['day'],inplace = True)
Out put:
Date Calls Avgerage_Calls
0 2022-01-02, Sunday 482920 487260.0
1 2022-01-01, Saturday 482920 495679.5
2 2021-12-31, Friday 482920 469753.5
3 2021-12-30, Thursday 482920 482920.0
4 2021-12-29, Wednesday 519995 519995.0
5 2021-12-28, Tuesday 482920 502596.5
6 2021-12-27, Monday 519995 514217.0
7 2021-12-26, Sunday 522273 487260.0
8 2021-12-25, Saturday 508439 495679.5
9 2021-12-24, Friday 456587 469753.5
10 2021-12-23, Thursday 482920 482920.0
11 2021-12-22, Wednesday 519995 519995.0
12 2021-12-21, Tuesday 522273 502596.5
13 2021-12-20, Monday 508439 514217.0
14 2021-12-19, Sunday 456587 487260.0
I create a new columns day
which contain the day and then I calculated the mean by day (using groupby).
The str.split(',') is used to split the date. For example if s= '2021-12-20, Monday'
then s.split(',')
gives ['2021-12-20', ' Monday']
. str.strip() is used to drop whitespace.