Home > Net >  How to replace NULL values with Mean value of a category in SQL?
How to replace NULL values with Mean value of a category in SQL?

Time:06-01

I have a dataset with null values in the column 'revenues_from_appointment'

Dataset

appointment_date patient_id practitioner_id appointment_duration_min revenues_from_appointment
2021-06-28 42734 748 30 90.0
2021-06-29 42737 747 60 150.0
2021-07-01 42737 747 60 NaN
2021-07-03 42736 748 30 60.0
2021-07-03 42735 747 15 42.62
2021-07-04 42734 748 30 NaN
2021-07-05 42734 748 30 100.0
2021-07-10 42738 747 15 50.72
2021-08-12 42739 748 30 73.43

I wish to replace NULL values by the mean value of rows where "patient_id, practitioner_id, appointment_duration_min" is the same.

I did it using pandas dataframe,

df['revenues_from_appointment'].fillna(df.groupby(['patient_id','practitioner_id','appointment_duration_min'])['revenues_from_appointment'].transform('mean'), inplace = True)

How can we obtain the same result by using SQL?

Final Output

appointment_date patient_id practitioner_id appointment_duration_min revenues_from_appointment
2021-06-28 42734 748 30 90.0
2021-06-29 42737 747 60 150.0
2021-07-01 42737 747 60 150.0
2021-07-03 42736 748 30 60.0
2021-07-03 42735 747 15 42.62
2021-07-04 42734 748 30 95.0
2021-07-05 42734 748 30 100.0
2021-07-10 42738 747 15 50.72
2021-08-12 42739 748 30 73.43

CodePudding user response:

You can use the AVG window function, that will partition on the three column of interest and replace null values using the COALESCE function:

SELECT appointment_date,
       patient_id,
       practitioner_id,
       appointment_duration_min,
       COALESCE(revenues_from_appointment, 
                AVG(revenues_from_appointment) OVER(PARTITION BY patient_id, 
                                                                 practitioner_id, 
                                                                 appointment_duration_min))
FROM tab

Try it here.

  • Related