Python: Add column with average value for each row

I have a dataframe which looks like this:

import pandas as pd

data1 = [['2020-10-01', '07-08', 3.0], ['2020-10-01', '08-09', 2.0], ['2020-10-01', '07-08', 3.0], ['2020-10-01', '07-08', 3.0], ['2020-10-02', '07-08', 3.0], ['2020-10-02', '08-09', 3.0], ['2020-10-02', '07-08', 3.0], ['2020-10-02', '08-09', 3.0], ['2020-10-03', '09-10', 9.0], ['2020-10-03', '09-10', 9.0]]
  
df1 = pd.DataFrame(data1, columns = ['Date', 'TimeCategory', 'Value_TimeCategory_total'])

Date	TimeCategory	Value_TimeCategory_total
2020-10-01	07-08	3.0
2020-10-01	08-09	2.0
2020-10-01	07-08	3.0
2020-10-01	07-08	3.0
2020-10-02	07-08	3.0
2020-10-02	08-09	3.0
2020-10-02	07-08	3.0
2020-10-02	08-09	3.0
2020-10-03	09-10	9.0
2020-10-03	09-10	9.0

The Dataframe contains the total values for each TimeCategory during one day.

Now I would like to add a column to this dataframe which displays the average value for each TimeCategory during each day.

For example, if there are 3 rows with the date 2020-10-01 and the TimeCategory 07-08, and the total value is 3.0, the average value for each of those rows should be 1.0.

The result should look like this.

data2 = [['2020-10-01', '07-08', 3.0 , 1.0], ['2020-10-01', '08-09', 2.0, 2.0], ['2020-10-01', '07-08', 3.0, 1.0], ['2020-10-01', '07-08', 3.0, 1.0],['2020-10-02', '07-08', 3.0, 1.5 ], ['2020-10-02', '08-09', 3.0, 1.5], ['2020-10-02', '07-08', 3.0, 1.5], ['2020-10-02', '08-09', 3.0, 1.5], ['2020-10-03', '09-10', 9.0, 4.5], ['2020-10-03', '09-10', 9.0, 4.5]]
  
df2 = pd.DataFrame(data2, columns = ['Date', 'TimeCategory', 'Value_TimeCategory_total' , 'Value_TimeCategory_Row_Average'])
  
df2

Date	TimeCategory	Value_TimeCategory_total	Value_TimeCategory_Row_Average
2020-10-01	07-08	3.0	1.0
2020-10-01	08-09	2.0	2.0
2020-10-01	07-08	3.0	1.0
2020-10-01	07-08	3.0	1.0
2020-10-02	07-08	3.0	1.5
2020-10-02	08-09	3.0	1.5
2020-10-02	07-08	3.0	1.5
2020-10-02	08-09	3.0	1.5
2020-10-03	09-10	9.0	4.5
2020-10-03	09-10	9.0	4.5

I do not want to use group by, because I need all rows (including duplicates) of my dataframe.

Many thanks in advance for your help.

CodePudding user response:

The idea is to divide the column Value_TimeCategory_total by the number of rows in each group. GroupBy.transform returns those counts as a Series of the same size as the original DataFrame:

# Divide each total by the row count of its (Date, TimeCategory) group.
df1['Value_TimeCategory_Row_Average'] = (
    df1['Value_TimeCategory_total']
    .div(df1.groupby(['Date', 'TimeCategory'])['Value_TimeCategory_total'].transform('size'))
)
print(df1)

         Date TimeCategory  Value_TimeCategory_total  \
0  2020-10-01        07-08                       3.0   
1  2020-10-01        08-09                       2.0   
2  2020-10-01        07-08                       3.0   
3  2020-10-01        07-08                       3.0   
4  2020-10-02        07-08                       3.0   
5  2020-10-02        08-09                       3.0   
6  2020-10-02        07-08                       3.0   
7  2020-10-02        08-09                       3.0   
8  2020-10-03        09-10                       9.0   
9  2020-10-03        09-10                       9.0   

   Value_TimeCategory_Row_Average  
0                             1.0  
1                             2.0  
2                             1.0  
3                             1.0  
4                             1.5  
5                             1.5  
6                             1.5  
7                             1.5  
8                             4.5  
9                             4.5
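
On the concern that grouping would drop duplicate rows: a plain aggregation returns one entry per group, whereas transform repeats the group result for every original row. A minimal sketch on a shortened copy of the question's data (the names small, per_group and per_row are made up for illustration):

```python
import pandas as pd

# First four rows of the question's data, all on the same date.
small = pd.DataFrame(
    [['2020-10-01', '07-08', 3.0],
     ['2020-10-01', '08-09', 2.0],
     ['2020-10-01', '07-08', 3.0],
     ['2020-10-01', '07-08', 3.0]],
    columns=['Date', 'TimeCategory', 'Value_TimeCategory_total'])

grouped = small.groupby(['Date', 'TimeCategory'])['Value_TimeCategory_total']

# Aggregation: one entry per (Date, TimeCategory) group, duplicates collapsed.
per_group = grouped.size()

# Transform: the group size repeated for each original row, on the same index as `small`.
per_row = grouped.transform('size')

print(len(per_group), len(per_row))
```

Because per_row shares its index with small, it can be used directly in column arithmetic without losing or reordering rows.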

Alternative solution:

df1['Value_TimeCategory_Row_Average'] = (df1.groupby(['Date','TimeCategory'])['Value_TimeCategory_total']
                                            .transform(lambda x: x / len(x)))
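
As a quick sanity check, both variants can be compared against the expected column from the question. This is only a sketch; the names keys, col, via_size, via_lambda and expected are illustrative and not part of the original answer:

```python
import pandas as pd

data1 = [['2020-10-01', '07-08', 3.0], ['2020-10-01', '08-09', 2.0],
         ['2020-10-01', '07-08', 3.0], ['2020-10-01', '07-08', 3.0],
         ['2020-10-02', '07-08', 3.0], ['2020-10-02', '08-09', 3.0],
         ['2020-10-02', '07-08', 3.0], ['2020-10-02', '08-09', 3.0],
         ['2020-10-03', '09-10', 9.0], ['2020-10-03', '09-10', 9.0]]
df1 = pd.DataFrame(data1, columns=['Date', 'TimeCategory', 'Value_TimeCategory_total'])

keys = ['Date', 'TimeCategory']
col = 'Value_TimeCategory_total'

# Variant 1: divide by the per-row group size.
via_size = df1[col].div(df1.groupby(keys)[col].transform('size'))

# Variant 2: do the division inside each group with a lambda.
via_lambda = df1.groupby(keys)[col].transform(lambda x: x / len(x))

# Expected averages, copied from the question's df2.
expected = [1.0, 2.0, 1.0, 1.0, 1.5, 1.5, 1.5, 1.5, 4.5, 4.5]

print(via_size.tolist() == expected)
print(via_lambda.tolist() == expected)
```

Both variants keep all ten rows. The first one avoids calling a Python function once per group, which usually matters only on large frames.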

CodePudding user response:

So, within each group of Date and TimeCategory, the remaining cells all hold the same value. I think groupby can still help you achieve what you need; you just have to combine it with assign:

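# Move the grouping keys into the index so the per-group result aligns on them.
df1.set_index(["Date", "TimeCategory"], inplace=True)

# Per group: the (constant) total divided by the number of rows in that group.
df1 = df1.assign(Value_TimeCategory_Row_Average = df1.groupby(["Date", "TimeCategory"]).apply(lambda x: x["Value_TimeCategory_total"].mean() / len(x)))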
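
If moving Date and TimeCategory into the index is not wanted, the same alignment can be done with a merge instead. A minimal sketch, assuming the question's data; the names df, counts, out and the helper column n_rows are hypothetical and not from the answer above:

```python
import pandas as pd

data1 = [['2020-10-01', '07-08', 3.0], ['2020-10-01', '08-09', 2.0],
         ['2020-10-01', '07-08', 3.0], ['2020-10-01', '07-08', 3.0],
         ['2020-10-02', '07-08', 3.0], ['2020-10-02', '08-09', 3.0],
         ['2020-10-02', '07-08', 3.0], ['2020-10-02', '08-09', 3.0],
         ['2020-10-03', '09-10', 9.0], ['2020-10-03', '09-10', 9.0]]
df = pd.DataFrame(data1, columns=['Date', 'TimeCategory', 'Value_TimeCategory_total'])

keys = ['Date', 'TimeCategory']

# One row per group with its row count, as ordinary columns.
counts = df.groupby(keys).size().rename('n_rows').reset_index()

# A left merge keeps every row of df (duplicates included) in its original order.
out = df.merge(counts, on=keys, how='left')
out['Value_TimeCategory_Row_Average'] = out['Value_TimeCategory_total'] / out['n_rows']
out = out.drop(columns='n_rows')

print(out)
```

This keeps the original three columns in place and only appends the average column.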