This is a simplified example, but I have a large categorical dataset like this:
Name Age Gender
John 12 Male
Ana 24 Female
Dave 16 Female
Cynthia 17 Non-Binary
Wayne 26 Male
Hebrew 29 Non-Binary
Suppose it is assigned to df
and I want to get back a list of the values with no duplicates:
'Male','Female','Non-Binary'
I tried this code, but it returns the genders with duplicates:
list(df['Gender'])
How can I write this in pandas so that it returns the values without duplicates?
CodePudding user response:
In these cases, remember that df["Gender"] is a pandas Series, so you can use .drop_duplicates()
to get another Series with the duplicated values removed, or use .unique()
to get the unique values as an array. Both keep the values in order of first appearance.
Since you want a plain Python list, call .tolist() on either result.
>> df["Gender"].drop_duplicates()
0          Male
1        Female
3    Non-Binary
Name: Gender, dtype: object
>> df["Gender"].unique()
['Male' 'Female' 'Non-Binary']
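To get the actual list the question asks for, here is a minimal sketch. It assumes the sample table above has been loaded into df exactly as shown (the DataFrame is rebuilt by hand here just so the snippet runs on its own), and the variable names genders and genders_alt are only for illustration.

```python
import pandas as pd

# Rebuild the sample data from the question (assumption: this mirrors the real df).
df = pd.DataFrame(
    {
        "Name": ["John", "Ana", "Dave", "Cynthia", "Wayne", "Hebrew"],
        "Age": [12, 24, 16, 17, 26, 29],
        "Gender": ["Male", "Female", "Female", "Non-Binary", "Male", "Non-Binary"],
    }
)

# .unique() keeps values in order of first appearance; .tolist() turns them into a list.
genders = df["Gender"].unique().tolist()

# Same result via a Series with the duplicates dropped.
genders_alt = df["Gender"].drop_duplicates().tolist()

print(genders)  # ['Male', 'Female', 'Non-Binary']
```

Either route should give the same list. If your real data has stray whitespace or mixed capitalization, values like 'Male ' and 'Male' would count as different, so you may need to clean the column first.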