Home > Enterprise >  replace specific value in pandas dataframes using the mean between 10 previous and next values
replace specific value in pandas dataframes using the mean between 10 previous and next values

Time:10-25

Let's say I have the following dataframe

df.Consumption

0        16.208
1        11.193
2         9.845
3         9.348
4         9.091
          ...  
19611     0.000
19612     0.000
19613     0.000
19614     0.000
19615     0.000
Name: Consumption, Length: 19616, dtype: float64

I want to replace the 0 values with the mean of the 10 previous and next values that are not 0.00

What is a good way to do it? I was thinking about using the replace and interpolate methods but I can't see how to write it efficiently

CodePudding user response:

This should get you pretty close. It takes advantage of the null values not being counted in the mean, so you can replace zero with nan and then just loop through.

I'm not sure of a better way to do with without a row-wise apply.

Something tells me that doing an actual loop, where you update the df each iteration will give you slightly different results, as you will be imputing nulls as you go, which will make the prior 10 results always have a value.

import pandas as pd
df = pd.DataFrame({'Consumption':[1,1,1,1,1,1,1,1,1,0,2,2,2,2,2,2,2,2,2,2]})
df.replace(0,np.nan, inplace=True)
df.update(df.apply(lambda x:np.mean(df.Consumption.iloc[max(x.name-10,0):]), axis=1).to_frame('Consumption'),overwrite=False)

Output

Consumption
0   1.000000
1   1.000000
2   1.000000
3   1.000000
4   1.000000
5   1.000000
6   1.000000
7   1.000000
8   1.000000
9   1.526316
10  2.000000
11  2.000000
12  2.000000
13  2.000000
14  2.000000
15  2.000000
16  2.000000
17  2.000000
18  2.000000
19  2.000000

CodePudding user response:

You can use Series.rolling() with center=True together with Rolling.mean() to get the mean of previous and next values.

Replace 0 by NaN if you want to exclude 0 from the mean calculation.

Set center=True so that the rolling windows look for both previous and next entries.

Finally, set those entries with value 0 with the mean values by using .loc, as follows:

n = 10     # check previous and next 10 entries

# rolling window size is (2n   1)
Consumption_mean = (df['Consumption'].replace(0, np.nan)
                                     .rolling(n * 2   1, min_periods=1, center=True)
                                     .mean())

df.loc[df['Consumption'] == 0, 'Consumption'] = Consumption_mean

Demo

Using smaller window size n = 3 to demonstrate:

df


    Consumption
0        16.208
1        11.193
2         9.845
3         9.348
4         9.091
5         8.010
6         0.000              <====   target entry
7         7.100
8         0.000              <====   target entry
9         6.800
10        6.500
11        6.300
12        5.900
13        5.800
14        5.600

#n = 10     # check previous and next 10 entries
n = 3     # smaller window size for demo

# rolling window size is (2n   1)
Consumption_mean = (df['Consumption'].replace(0, np.nan)
                                     .rolling(n * 2   1, min_periods=1, center=True)
                                     .mean())

# Update into a new column `Consumption_New` for demo purpose
df['Consumption_New'] = df['Consumption']    
df.loc[df['Consumption'] == 0, 'Consumption_New'] = Consumption_mean

Demo Result:

print(df)

    Consumption  Consumption_New
0        16.208          16.2080
1        11.193          11.1930
2         9.845           9.8450
3         9.348           9.3480
4         9.091           9.0910
5         8.010           8.0100
6         0.000           8.0698   # 8.0698 = (9.348   9.091   8.01   7.1   6.8) / 5 with skipping 0.000 between 7.100 and 6.800
7         7.100           7.1000
8         0.000           6.9420   # 6.942 = (8.01   7.1   6.8   6.5   6.3) / 5 with skipping 0.000 between 8.010 and 7.100
9         6.800           6.8000
10        6.500           6.5000
11        6.300           6.3000
12        5.900           5.9000
13        5.800           5.8000
14        5.600           5.6000
  • Related