Is it possible to do a unique count for the values of 2(or multiple) rows in a dataframe? I was able to do unique count with df['count'] = df.iloc[:, 0:6].nunique(axis=1)
of the first 6 columns of individual. However, i can't figure out (or find) how to get the unique count of the 6 columns in both(or multiple) rows.
original df: there are 3 unique values in each row: 7,4,2 and 8,5,6
╔═════╦══════╦══════╦══════╦═════╦═════════╦════════╦═══════╗ ║ hf0 ║ hf1 ║ hf2 ║ hf3 ║ hf4 ║ hf5 ║ sample ║ count ║ ╠═════╬══════╬══════╬══════╬═════╬═════════╬════════╬═══════╣ ║ 7 ║ 4 ║ 2 ║ 2 ║ 7 ║ 2 ║ 7yr ║ 3 ║ ║ 8 ║ 5 ║ 5 ║ 6 ║ 5 ║ 6 ║ 7yr ║ 3 ║ ╚═════╩══════╩══════╩══════╩═════╩═════════╩════════╩═══════╝
df trying to get: there are 6 unique values for both rows: 7,4,2,8,5,6
╔═════╦══════╦══════╦══════╦═════╦═════════╦════════╦════════╦════════════╗ ║ hf0 ║ hf1 ║ hf2 ║ hf3 ║ hf4 ║ hf5 ║ sample ║ count ║ count2rows ║ ╠═════╬══════╬══════╬══════╬═════╬═════════╬════════╬════════╬════════════╣ ║ 7 ║ 4 ║ 2 ║ 2 ║ 7 ║ 2 ║ 7yr ║ 3 ║ 6 ║ ║ 8 ║ 5 ║ 5 ║ 6 ║ 5 ║ 6 ║ 7yr ║ 3 ║ 6 ║ ╚═════╩══════╩══════╩══════╩═════╩═════════╩════════╩════════╩════════════╝
code for sample df:
import pandas as pd
import numpy as np
data = {'hf0':[7,8],'hf1':[4,5], 'hf2':[2,5],'hf3':[2,6],'hf4':[7,5],'hf5':[2,6],'sample':['7yr','7yr']}
df = pd.DataFrame(data)
df['count'] = df.iloc[:, 0:6].nunique(axis=1)
df
Thanks in advance
CodePudding user response:
df
###
hf0 hf1 hf2 hf3 hf4 hf5 sample
0 7 4 2 2 7 2 7yr
1 8 5 5 6 5 6 7yr
2 9 6 8 10 7 10 7yr
3 10 7 11 14 5 14 7yr
4 11 8 14 18 7 18 7yr
rolling window rolling=2
ar = df.loc[:,'hf0':'hf5'].to_numpy()
rolling = 2
length = ar.shape[1]
cubic = np.lib.stride_tricks.sliding_window_view(ar, (rolling,length))
plan = cubic.reshape(-1,rolling*length).astype(float)
head_arr = plan[0,:length]
head_arr = np.pad(head_arr.astype(float), (0,length), 'constant', constant_values=np.nan)
plan = np.insert(plan, 0, head_arr, axis=0)
df['count2rows'] = pd.DataFrame(plan).nunique(axis=1)
df
###
hf0 hf1 hf2 hf3 hf4 hf5 sample count2rows
0 7 4 2 2 7 2 7yr 3
1 8 5 5 6 5 6 7yr 6
2 9 6 8 10 7 10 7yr 6
3 10 7 11 14 5 14 7yr 8
4 11 8 14 18 7 18 7yr 7
CodePudding user response:
You could use Numpy
to get that done.
import numpy as np
df['count2rows'] = len(np.unique(df.filter(like='hf').values))
print(df)
Result
hf0 hf1 hf2 hf3 hf4 hf5 sample count count3rows
0 7 4 2 2 7 2 7yr 3 6
1 8 5 5 6 5 6 7yr 3 6
CodePudding user response:
You could use stack()
df['count_2 rows'] = df.iloc[:, 0:6].stack().nunique()