How to get random results from an array without duplicates?

Time:04-30

How can I get results without duplicates? I have the CSV file shown below. I pool the values of all columns together and draw random pairs from the pooled values, but the results contain duplicate rows.

Thank you

test.csv
    d   c   b   a
-----------------
0   Q   A   A   Q
1   K   A   K   8
2   8   10  8   10
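import pandas as pd
import numpy as np

# Load the four columns and convert the first three rows to a NumPy array
df = pd.read_csv('test.csv', usecols=['a','b','c','d'])
df = np.array(df.iloc[0:3])

# Fix the seed so the sample is reproducible
np.random.seed(2)

# Draw 20 pairs from the pooled values (with replacement by default)
print(np.random.choice(df.flatten(), size=(20, 2)))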

results

[['8' '8'] # duplicate
 ['K' '10']
 ['A' '10']
 ['8' '8']  # duplicate
 ['A' 'A']
 ['10' 'A'] # duplicate 
 ['8' 'K']  # duplicate
 ['K' 'A']
 ['8' 'Q']
 ['K' 'K']
 ['8' '10']
 ['Q' '8']
 ['K' '8']
 ['A' '8']
 ['Q' 'A']
 ['8' 'K']  # duplicate
 ['K' 'Q']
 ['10' 'A'] # duplicate
 ['Q' 'K']
 ['A' 'K']] 

CodePudding user response:

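Use DataFrame.duplicated to flag the repeated rows, then invert the mask with ~ to keep only the rows that are not flagged. With keep=False, every copy of a repeated row is flagged, including the first: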

a = np.random.choice(df.flatten(), size=(20, 2))

# Keep only the rows that occur exactly once in a
a2 = a[~pd.DataFrame(a).duplicated(keep=False)]
print(a2)
[['K' '10']
 ['A' '10']
 ['A' 'A']
 ['K' 'A']
 ['8' 'Q']
 ['K' 'K']
 ['8' '10']
 ['Q' '8']
 ['K' '8']
 ['A' '8']
 ['Q' 'A']
 ['K' 'Q']
 ['Q' 'K']
 ['A' 'K']]
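
Note that keep=False removes both copies of a repeated pair. If one copy of each repeated row should stay, keep='first' is an option. The following is a minimal sketch on a small made-up array (a hypothetical stand-in for the sampled array, not the actual sample):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the sampled array `a` (not the real sample)
a = np.array([['8', '8'], ['K', '10'], ['8', '8'], ['10', 'A'], ['10', 'A']])
frame = pd.DataFrame(a)

# keep=False flags every copy of a repeated row, so none of them survive
dropped_all = a[~frame.duplicated(keep=False).to_numpy()]

# keep='first' flags only the later copies, so one copy of each row survives
kept_first = a[~frame.duplicated(keep='first').to_numpy()]

print(dropped_all)
print(kept_first)
```

Either way the result has fewer than 20 rows, because filtering happens after sampling.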

Details:

print(pd.DataFrame(a).duplicated(keep=False))
0      True
1     False
2     False
3      True
4     False
5      True
6      True
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15     True
16    False
17     True
18    False
19    False
dtype: bool
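
If exactly 20 distinct pairs are needed, another possible approach is to avoid duplicates at sampling time instead of filtering afterwards: build every ordered pair of the distinct values and draw from those without replacement. This is only a sketch under assumptions: the `values` array is typed in by hand from the table in the question in place of df.flatten(), and np.random.default_rng is used instead of np.random.seed. It also changes the distribution, since each distinct pair becomes equally likely rather than weighted by how often a value appears in the table.

```python
import itertools
import numpy as np

# Assumed stand-in for df.flatten(): the 12 values from the table in the question
values = np.array(['Q', 'A', 'A', 'Q', 'K', 'A', 'K', '8', '8', '10', '8', '10'])

rng = np.random.default_rng(2)

# The table holds 5 distinct values
unique_values = np.unique(values)

# All ordered pairs, including same-value pairs such as ('A', 'A'): 5 * 5 = 25
pairs = np.array(list(itertools.product(unique_values, repeat=2)))

# Pick 20 of the 25 pair indices without replacement, so no pair repeats
picked = rng.choice(len(pairs), size=20, replace=False)
result = pairs[picked]

print(result)
```

With 5 distinct values there are only 25 possible ordered pairs, so size cannot exceed 25 with this approach.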