I have a column in my data frame with a category. I know how to do a rolling mean on a column with float values:
df['rolling_mean'] = df.categorycolumn.rolling(10).mean()
But I do not have numbers in this column but letters, for example:
A
A
A
A
D
D
D
D
D
S
D
D
D
D
D
And I want to get rid of the S, so I want to try to make a rolling average so it will be less sensitive for anomalies. Does anyone know how to do this?
Greetings and enjoy the sun (if there is any)!
p.s. Actually I am not looking for the mean but for the mode, the most common category value over a certain window.
So my data frame looks like this:
id category
1 A
2 A
3 A
4 A
5 A
6 A
7 S
8 S
9 A
10 A
11 A
12 A
13 A
14 A
15 A
17 A
18 A
19 A
20 A
And I wont it to find the mode of a rolling window like 10, so the full column would become only A (and no S)
CodePudding user response:
Maybe you can use pd.factorize
:
# Detection
s = pd.Series(pd.factorize(df['col1'])[0])
m = s.sub(s.rolling(10, center=True, min_periods=1).median()).abs().ge(1)
print(df[m])
# Output
col1
9 S
Update
If you are looking for mode
, try:
value, letter = pd.factorize(df['category'])
df['newcat'] = (pd.Series(value).rolling(10, center=True, min_periods=1)
.apply(lambda x: x.mode()).map(pd.Series(letter)))
print(df)
# Output
id category newcat
0 1 A A
1 2 A A
2 3 A A
3 4 A A
4 5 A A
5 6 A A
6 7 S A # HERE
7 8 S A # HERE
8 9 A A
9 10 A A
10 11 A A
11 12 A A
12 13 A A
13 14 A A
14 15 A A
15 17 A A
16 18 A A
17 19 A A
18 20 A A