Home > database >  rolling unique value count in pandas across multiple columns
rolling unique value count in pandas across multiple columns

Time:02-05

there are several answers around rolling count in pandas Rolling unique value count in pandas How to efficiently compute a rolling unique count in a pandas time series?

How do I count unique values across multiple columns? For one column, I can do:

df[my_col]=df[my_col].rolling(300).apply(lambda x: len(np.unique(x)))

How to extend to multipe columns, counting unique values overall across all values in the rolling window?

CodePudding user response:

Inside a list comprehension iterate over the rolling windows and for each window flatten the values in required columns then use set to get the distinct elements

cols = [...] # define your cols here
df['count'] = [len(set(w[cols].values.ravel())) for w in df.rolling(300)]

CodePudding user response:

I took a dataframe as a example (3-rows rolling window taking into account all the columns at the same time)

Dataframe for visualization

   col1  col2  col3
0     1     1     1
1     1     1     4
2     2     5     2
3     3     3     3
4     3     7     3
5     5     3     9
6     8     8     2

Proposed script for checkings

import pandas as pd

df = pd.DataFrame({'col1':[1, 1, 2, 3, 3, 5, 8],
                   'col2':[1, 1, 5, 3, 7, 3, 8],
                   'col3':[1, 4, 2, 3, 3, 9, 2],})

df['count'] = df.rolling(3).apply(lambda w: len(set(df.iloc[w.index].to_numpy().flatten())))['col1']

print(df)

Output

   col1  col2  col3  count
0     1     1     1    NaN
1     1     1     4    NaN
2     2     5     2    4.0
3     3     3     3    5.0
4     3     7     3    4.0
5     5     3     9    4.0
6     8     8     2    6.0
  • Related