How to get results without duplicates? I have the CSV file shown below. I take the values of all the columns together and draw random pairs from them, but some pairs appear more than once in the results.
Thank you
test.csv:
   d   c  b   a
-----------------
0  Q   A  A   Q
1  K   A  K   8
2  8  10  8  10
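import pandas as pd
import numpy as np

# Read the four columns a-d from the file
df = pd.read_csv('test.csv', usecols=['a', 'b', 'c', 'd'])
# Take rows 0-2 as a 3x4 NumPy array
df = df.iloc[0:3].to_numpy()

np.random.seed(2)
# Draw 20 pairs from the 12 flattened values; np.random.choice samples
# with replacement by default, so the same pair can be drawn more than once
print(np.random.choice(df.flatten(), size=(20, 2)))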
Results:
[['8' '8'] # duplicate
['K' '10']
['A' '10']
['8' '8'] # duplicate
['A' 'A']
['10' 'A'] # duplicate
['8' 'K'] # duplicate
['K' 'A']
['8' 'Q']
['K' 'K']
['8' '10']
['Q' '8']
['K' '8']
['A' '8']
['Q' 'A']
['8' 'K'] # duplicate
['K' 'Q']
['10' 'A'] # duplicate
['Q' 'K']
['A' 'K']]
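The repeated pairs come from np.random.choice drawing with replacement. Passing replace=False is not a fix on its own: it only forbids reusing the same position in the flattened array, so 20 pairs (40 draws) cannot be taken from 12 cells at all, and even a legal sample can repeat a pair of values because 'A' and '8' each sit in several cells. A minimal sketch, with the question's values typed in by hand rather than read from test.csv:

```python
import numpy as np

# The 12 cells of test.csv (rows 0-2, columns d c b a), flattened row by row
flat = np.array(['Q', 'A', 'A', 'Q', 'K', 'A', 'K', '8', '8', '10', '8', '10'])

# replace=False forbids reusing a *position*, and 20 pairs need 40 positions
raised = False
try:
    np.random.choice(flat, size=(20, 2), replace=False)
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Different positions can still hold the same values:
# positions 1 and 7 give ('A', '8'), and so do positions 5 and 8
print(flat[[1, 7]], flat[[5, 8]])
```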
Answer:
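Use DataFrame.duplicated with keep=False to flag every row (pair) that occurs more than once, then invert the mask with ~ to keep only the pairs that occur exactly once: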
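# Same sample as in the question (run right after np.random.seed(2))
a = np.random.choice(df.flatten(), size=(20, 2))
# duplicated(keep=False) is True for every row that has an identical row elsewhere;
# ~ inverts it so only the rows that occur once are kept
a2 = a[~pd.DataFrame(a).duplicated(keep=False).to_numpy()]
print(a2)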
[['K' '10']
['A' '10']
['A' 'A']
['K' 'A']
['8' 'Q']
['K' 'K']
['8' '10']
['Q' '8']
['K' '8']
['A' '8']
['Q' 'A']
['K' 'Q']
['Q' 'K']
['A' 'K']]
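Note that keep=False flags every copy of a repeated pair, so both copies of ['8' '8'], ['10' 'A'] and ['8' 'K'] are removed rather than one of each. If one copy of each pair should survive, the default keep='first' does that. The comparison is on ordered rows, so ['8' 'K'] and ['K' '8'] count as different pairs. A small comparison on a hand-made array (not the question's data):

```python
import numpy as np
import pandas as pd

# A tiny sample where ['8' 'K'] appears twice; ['K' '8'] is a different ordered pair
a = np.array([['8', 'K'], ['Q', 'A'], ['8', 'K'], ['K', '8']])

# keep=False marks every copy, so the repeated pair disappears completely
drop_all = a[~pd.DataFrame(a).duplicated(keep=False).to_numpy()]

# keep='first' (the default) marks only the later copies, so one copy survives
keep_one = a[~pd.DataFrame(a).duplicated(keep='first').to_numpy()]

print(drop_all)
print(keep_one)
```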
Details:
print(pd.DataFrame(a).duplicated(keep=False))
0 True
1 False
2 False
3 True
4 False
5 True
6 True
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 True
16 False
17 True
18 False
19 False
dtype: bool
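Filtering after sampling leaves fewer than 20 rows. If exactly 20 distinct pairs are needed, one option (a different technique from the answer above, not part of it) is to keep drawing and skip any pair already seen. The helper name sample_unique_pairs is hypothetical, and it uses NumPy's Generator API (np.random.default_rng) instead of the legacy np.random.seed, so its pairs will not match the question's output. It can only succeed when n is at most k*k, where k is the number of distinct values (5 here, so 25 ordered pairs). A minimal sketch, assuming the flattened values are already in memory:

```python
import numpy as np


def sample_unique_pairs(values, n, seed=None):
    """Hypothetical helper: draw n distinct ordered pairs from `values`.

    Each draw picks positions uniformly, like np.random.choice in the
    question; any pair that was already drawn is rejected and redrawn.
    """
    values = np.asarray(values)
    distinct = np.unique(values)
    # With k distinct values there are only k * k ordered pairs available
    if n > len(distinct) ** 2:
        raise ValueError(f"only {len(distinct) ** 2} distinct pairs exist, asked for {n}")

    rng = np.random.default_rng(seed)
    seen = set()
    pairs = []
    while len(pairs) < n:
        pair = tuple(rng.choice(values, size=2))  # one random pair of values
        if pair not in seen:                      # skip pairs already drawn
            seen.add(pair)
            pairs.append(pair)
    return np.array(pairs)


# Flattened values of test.csv (rows 0-2, columns d c b a), typed in by hand
flat = np.array(['Q', 'A', 'A', 'Q', 'K', 'A', 'K', '8', '8', '10', '8', '10'])
result = sample_unique_pairs(flat, 20, seed=2)
print(result)
```

Rejection keeps the original weighting of each draw, so pairs built from frequent values such as 'A' and '8' still tend to appear first. If every distinct pair should be equally likely instead, an alternative is to list all k*k pairs with itertools.product and pick n of them with rng.choice(..., replace=False).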