Data column contains values of new customer additions. However it contains negative values, it means that no. of customers left in that perticular week. What is the best way to replace negative values with positive values?
For instance, mean of previous values
Here is the sample
Week New customer added
Week 01 1,881
Week 02 7
Week 03 10
Week 04 10
Week 05 (22)
Week 06 (18)
Week 07 22
Week 08 25
Week 09 3
Week 10 (36)
Week 11 25
Week 12 20
Week 13 (7)
Week 14 (24)
Week 15 10
Week 16 29
Week 17 4
Week 18 (55)
Week 19 (3)
Week 20 17
Week 21 (1,875)
Week 22 1,867
Week 23 (13)
Week 24 47
Week 25 40
Week 26 (1,916)
Week 27 1,891
Week 28 0
Week 29 22
Current strategy is
df['New customer added'].mask(df['New customer added'].lt(0)).bfill().fillna(0)
I'm thinking of second strategy i.e. replacing negative number with mean of previous 2-3 positive values.
Expected output of second strategy is
Week New customer added
Week 01 1,881
Week 02 7
Week 03 10
Week 04 10
Week 05 9 # Negative no. replaced by mean of prev 3(i.e. 7,10,10) positive no. i.e. 9
Week 06 9 # Negative no. replaced by mean of prev 3(i.e. 7,10,10) positive no. i.e. 9
CodePudding user response:
Use:
#is sample data are ,, so replace to emoty string
df['New customer added'] = df['New customer added'].astype(str).str.replace(',','')
#negatives are in sample with (
m = df['New customer added'].astype(str).str.contains('\(')
Solution filter only positive rows and use rolling with mean
, last use DataFrame.reindex
by original index with forward filling missing values (for negatives rows)
new = (df.loc[~m, 'New customer added'].astype(float)
.rolling(3, min_periods=1).mean()
.reindex(df.index, method='ffill'))
And last replace only negatives:
df['New customer added'] = df['New customer added'].mask(m, new)
print (df)
Week New customer added
0 Week 01 1881
1 Week 02 7
2 Week 03 10
3 Week 04 10
4 Week 05 9.0
5 Week 06 9.0
6 Week 07 22
7 Week 08 25
8 Week 09 3
9 Week 10 16.666667
10 Week 11 25
11 Week 12 20
12 Week 13 16.0
13 Week 14 16.0
14 Week 15 10
15 Week 16 29
16 Week 17 4
17 Week 18 14.333333
18 Week 19 14.333333
19 Week 20 17
20 Week 21 16.666667
21 Week 22 1867
22 Week 23 629.333333
23 Week 24 47
24 Week 25 40
25 Week 26 651.333333
26 Week 27 1891
27 Week 28 0
28 Week 29 22