How to replace NULL values with Mean value of a category in SQL?-CodePudding

I have a dataset with null values in the column 'revenues_from_appointment'

Dataset

appointment_date	patient_id	practitioner_id	appointment_duration_min	revenues_from_appointment
2021-06-28	42734	748	30	90.0
2021-06-29	42737	747	60	150.0
2021-07-01	42737	747	60	NaN
2021-07-03	42736	748	30	60.0
2021-07-03	42735	747	15	42.62
2021-07-04	42734	748	30	NaN
2021-07-05	42734	748	30	100.0
2021-07-10	42738	747	15	50.72
2021-08-12	42739	748	30	73.43

I wish to replace NULL values by the mean value of rows where "patient_id, practitioner_id, appointment_duration_min" is the same.

I did it using pandas dataframe,

df['revenues_from_appointment'].fillna(df.groupby(['patient_id','practitioner_id','appointment_duration_min'])['revenues_from_appointment'].transform('mean'), inplace = True)

How can we obtain the same result by using SQL?

Final Output

appointment_date	patient_id	practitioner_id	appointment_duration_min	revenues_from_appointment
2021-06-28	42734	748	30	90.0
2021-06-29	42737	747	60	150.0
2021-07-01	42737	747	60	150.0
2021-07-03	42736	748	30	60.0
2021-07-03	42735	747	15	42.62
2021-07-04	42734	748	30	95.0
2021-07-05	42734	748	30	100.0
2021-07-10	42738	747	15	50.72
2021-08-12	42739	748	30	73.43

CodePudding user response：

You can use the AVG window function, that will partition on the three column of interest and replace null values using the COALESCE function:

SELECT appointment_date,
       patient_id,
       practitioner_id,
       appointment_duration_min,
       COALESCE(revenues_from_appointment, 
                AVG(revenues_from_appointment) OVER(PARTITION BY patient_id, 
                                                                 practitioner_id, 
                                                                 appointment_duration_min))
FROM tab

Try it here.