unique count for accumulative values in 2 or more rows pandas-CodePudding

Is it possible to do a unique count for the values of 2(or multiple) rows in a dataframe? I was able to do unique count with df['count'] = df.iloc[:, 0:6].nunique(axis=1) of the first 6 columns of individual. However, i can't figure out (or find) how to get the unique count of the 6 columns in both(or multiple) rows.

original df: there are 3 unique values in each row: 7,4,2 and 8,5,6

╔═════╦══════╦══════╦══════╦═════╦═════════╦════════╦═══════╗
║ hf0 ║ hf1  ║ hf2  ║ hf3  ║ hf4 ║ hf5     ║ sample ║ count ║
╠═════╬══════╬══════╬══════╬═════╬═════════╬════════╬═══════╣
║   7 ║    4 ║    2 ║    2 ║   7 ║       2 ║ 7yr    ║     3 ║
║   8 ║    5 ║    5 ║    6 ║   5 ║       6 ║ 7yr    ║     3 ║
╚═════╩══════╩══════╩══════╩═════╩═════════╩════════╩═══════╝

df trying to get: there are 6 unique values for both rows: 7,4,2,8,5,6

╔═════╦══════╦══════╦══════╦═════╦═════════╦════════╦════════╦════════════╗
║ hf0 ║ hf1  ║ hf2  ║ hf3  ║ hf4 ║ hf5     ║ sample ║ count  ║ count2rows ║
╠═════╬══════╬══════╬══════╬═════╬═════════╬════════╬════════╬════════════╣
║   7 ║    4 ║    2 ║    2 ║   7 ║       2 ║ 7yr    ║      3 ║          6 ║
║   8 ║    5 ║    5 ║    6 ║   5 ║       6 ║ 7yr    ║      3 ║          6 ║
╚═════╩══════╩══════╩══════╩═════╩═════════╩════════╩════════╩════════════╝

code for sample df:

import pandas as pd
import numpy as np
data = {'hf0':[7,8],'hf1':[4,5], 'hf2':[2,5],'hf3':[2,6],'hf4':[7,5],'hf5':[2,6],'sample':['7yr','7yr']}
df = pd.DataFrame(data)

df['count'] = df.iloc[:, 0:6].nunique(axis=1)
df

Thanks in advance

CodePudding user response：

df
###
   hf0  hf1  hf2  hf3  hf4  hf5 sample
0    7    4    2    2    7    2    7yr
1    8    5    5    6    5    6    7yr
2    9    6    8   10    7   10    7yr
3   10    7   11   14    5   14    7yr
4   11    8   14   18    7   18    7yr

rolling window rolling=2

ar = df.loc[:,'hf0':'hf5'].to_numpy()
rolling = 2
length = ar.shape[1]

cubic = np.lib.stride_tricks.sliding_window_view(ar, (rolling,length))
plan = cubic.reshape(-1,rolling*length).astype(float)

head_arr = plan[0,:length]
head_arr = np.pad(head_arr.astype(float), (0,length), 'constant', constant_values=np.nan)

plan = np.insert(plan, 0, head_arr, axis=0)
df['count2rows'] = pd.DataFrame(plan).nunique(axis=1)
df
###
   hf0  hf1  hf2  hf3  hf4  hf5 sample  count2rows
0    7    4    2    2    7    2    7yr           3
1    8    5    5    6    5    6    7yr           6
2    9    6    8   10    7   10    7yr           6
3   10    7   11   14    5   14    7yr           8
4   11    8   14   18    7   18    7yr           7

CodePudding user response：

You could use Numpy to get that done.

import numpy as np

df['count2rows'] = len(np.unique(df.filter(like='hf').values))
print(df)

Result

   hf0  hf1  hf2  hf3  hf4  hf5 sample  count  count3rows
0    7    4    2    2    7    2    7yr      3           6
1    8    5    5    6    5    6    7yr      3           6

CodePudding user response：

You could use stack()

df['count_2 rows'] = df.iloc[:, 0:6].stack().nunique()