there are several answers around rolling count in pandas Rolling unique value count in pandas How to efficiently compute a rolling unique count in a pandas time series?
How do I count unique values across multiple columns? For one column, I can do:
df[my_col]=df[my_col].rolling(300).apply(lambda x: len(np.unique(x)))
How to extend to multipe columns, counting unique values overall across all values in the rolling window?
CodePudding user response:
Inside a list comprehension iterate over the rolling windows and for each window flatten the values in required columns then use set to get the distinct elements
cols = [...] # define your cols here
df['count'] = [len(set(w[cols].values.ravel())) for w in df.rolling(300)]
CodePudding user response:
I took a dataframe as a example (3-rows rolling window taking into account all the columns at the same time)
Dataframe for visualization
col1 col2 col3
0 1 1 1
1 1 1 4
2 2 5 2
3 3 3 3
4 3 7 3
5 5 3 9
6 8 8 2
Proposed script for checkings
import pandas as pd
df = pd.DataFrame({'col1':[1, 1, 2, 3, 3, 5, 8],
'col2':[1, 1, 5, 3, 7, 3, 8],
'col3':[1, 4, 2, 3, 3, 9, 2],})
df['count'] = df.rolling(3).apply(lambda w: len(set(df.iloc[w.index].to_numpy().flatten())))['col1']
print(df)
Output
col1 col2 col3 count
0 1 1 1 NaN
1 1 1 4 NaN
2 2 5 2 4.0
3 3 3 3 5.0
4 3 7 3 4.0
5 5 3 9 4.0
6 8 8 2 6.0