Converting a pandas dataframe into bool values

Time:03-15

I am trying to write code that checks whether there are any duplicates in the Unix column of a pandas DataFrame. It should return True for DataSet, since 130 appears twice, and False for DataSet2. How can I get the expected output below?

import pandas as pd

DataSet = pd.DataFrame({'Unix':[130, 140, 150, 130],
                  'Value':[11,2,3,4]})
DataSet2 =  pd.DataFrame({'Unix':[130, 140, 150, 130],
                  'Value':[11,2,3,4]})
 
print(DataSet.duplicated(subset=['Unix']).bool())
print(DataSet2.duplicated(subset=['Unix']).bool())

Expected Output:

True 
False

CodePudding user response:

Use .any() instead of .bool():

print(DataSet.duplicated(subset=['Unix']).any())
print(DataSet2.duplicated(subset=['Unix']).any())

Output:

True
True

(Note that DataSet and DataSet2 as given in the question are identical, hence True for both. I assume this is just a typo and does not reflect your actual data.)
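To make the difference visible, here is a minimal sketch that swaps in a DataSet2 with distinct Unix values. That replacement data is an assumption, since the question does not show what DataSet2 was meant to contain. The bool() wrapper is only there to turn the NumPy boolean into a plain Python bool.

```python
import pandas as pd

DataSet = pd.DataFrame({'Unix': [130, 140, 150, 130],
                        'Value': [11, 2, 3, 4]})
# Assumed data: the question's DataSet2 was identical to DataSet, so these
# distinct Unix values are only a guess at what was intended.
DataSet2 = pd.DataFrame({'Unix': [130, 140, 150, 160],
                         'Value': [11, 2, 3, 4]})

# duplicated() yields one boolean per row; any() collapses them to one value.
has_dupes = bool(DataSet.duplicated(subset=['Unix']).any())
has_dupes2 = bool(DataSet2.duplicated(subset=['Unix']).any())

print(has_dupes)   # True
print(has_dupes2)  # False
```

The original .bool() call fails because it only accepts a Series holding exactly one boolean element, whereas duplicated() returns one element per row.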

CodePudding user response:

You can get True for DataSet with the following code:

print(DataSet["Unix"].duplicated().sum() > 0)

I don't know why DataSet2 should be False, though.

CodePudding user response:

You just need to call .any(), which returns True if at least one value in the Series it is called on is truthy (conversely, .all() returns True only if every value is truthy):

In [5]: DataSet.duplicated(subset=['Unix']).any()
Out[5]: True

In [6]: DataSet2.duplicated(subset=['Unix']).any()
Out[6]: True

However, I think you may have accidentally copied the wrong data for DataSet2, as it is identical to DataSet and a False answer wouldn't make sense.

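Note that you can also use .is_unique, which is more appropriate for your use case. It returns True when the column has no duplicates, so its result is the inverse of the .any() check: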

In [7]: DataSet.Unix.is_unique
Out[7]: False

In [8]: DataSet2.Unix.is_unique
Out[8]: False
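Since .is_unique answers the opposite question, negating it gives the "has duplicates" flag the question asks for. The sketch below uses hypothetical frames and a hypothetical helper named has_duplicates; neither comes from the question.

```python
import pandas as pd

# Hypothetical frames for illustration only.
with_dupes = pd.DataFrame({'Unix': [130, 140, 150, 130]})
no_dupes = pd.DataFrame({'Unix': [130, 140, 150, 160]})


def has_duplicates(df, column):
    """Hypothetical helper: True if `column` contains any repeated value."""
    return not df[column].is_unique


print(has_duplicates(with_dupes, 'Unix'))  # True
print(has_duplicates(no_dupes, 'Unix'))    # False
```

For a single column, this should agree with df.duplicated(subset=[column]).any().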

For reference, .any() and .all() are convenience methods that are logically equivalent to chaining the "or" and "and" operators, respectively, across all the items in a collection.
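As a rough illustration of that equivalence, the sketch below folds a small boolean Series with or/and by hand and compares the result with .any() and .all(). The flags Series is made-up example data.

```python
from functools import reduce
from operator import and_, or_

import pandas as pd

# Made-up example data: one boolean per row, as duplicated() would return.
flags = pd.Series([False, True, False])

# Fold the values with "or" (starting from False) and "and" (starting from True).
any_by_hand = bool(reduce(or_, flags, False))
all_by_hand = bool(reduce(and_, flags, True))

print(any_by_hand, bool(flags.any()))  # True True
print(all_by_hand, bool(flags.all()))  # False False
```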
