I have an array of tuples that represents people's names and genders.
from enum import Enum

import numpy as np

class Gender(Enum):
    MALE = 1
    FEMALE = 2

people = np.array(
    [
        ('John Smith', Gender.MALE),
        ('Samantha Wheeler', Gender.FEMALE),
    ]
)
I'm trying to filter them by gender like so:
guys = np.where(people[1] == Gender.MALE)
girls = np.where(people[1] == Gender.FEMALE)
This doesn't seem to work, even though the condition looks fine. What am I doing wrong?
CodePudding user response:
You want to check column 1 in every row, so you need to index with [:,1]. people[1] selects the second row, not the second column. For example:
>>> people = np.array([('John Smith', Gender.MALE),('Samantha Wheeler', Gender.FEMALE),
... ('John Smith', Gender.MALE),('Samantha Wheeler', Gender.FEMALE)])
>>> guys = np.where(people[:,1] == Gender.MALE)
>>> girls = np.where(people[:,1] == Gender.FEMALE)
>>> girls
(array([1, 3]),)
>>> people[girls][:,0]
array(['Samantha Wheeler', 'Samantha Wheeler'], dtype=object)
# second approach: compare the whole array and keep the row indices of the matches
>>> row_guys, columns_guys = np.where(people == Gender.MALE)
>>> people[row_guys][:,0]
array(['John Smith', 'John Smith'], dtype=object)
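To make the row-versus-column difference concrete, here is a minimal sketch using the two-person array from the question. The explicit dtype=object and the boolean-mask indexing are my own additions for illustration (an alternative to np.where), not something the question used:

```python
from enum import Enum

import numpy as np


class Gender(Enum):
    MALE = 1
    FEMALE = 2


people = np.array(
    [
        ('John Smith', Gender.MALE),
        ('Samantha Wheeler', Gender.FEMALE),
    ],
    dtype=object,  # assumption: made explicit here; NumPy infers it anyway
)

# people[1] is the second ROW: one person's name and gender.
second_row = people[1]

# people[:, 1] is the second COLUMN: every person's gender.
genders = people[:, 1]

# Comparing the column gives a boolean mask that can index rows directly.
is_male = genders == Gender.MALE
guys = people[is_male, 0]
girls = people[~is_male, 0]

print(guys)   # ['John Smith']
print(girls)  # ['Samantha Wheeler']
```

The mask keeps the filtering in one step; np.where is only needed if you want the integer row indices themselves.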
CodePudding user response:
NumPy isn't ideal for mixed-dtype data like yours (strings and enum members together end up in an object array). You should use pandas instead:
import pandas as pd
df = pd.DataFrame({
    'name': ['John Smith', 'Samantha Wheeler'],
    'gender': ['male', 'female'],
})
# this step is optional and is roughly analogous to using an Enum
df['gender'] = df['gender'].astype('category')
print(df[df['gender'] == 'male'])
# name gender
# 0 John Smith male
print(df[df['gender'] == 'female'])
# name gender
# 1 Samantha Wheeler female
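If you would rather keep the Gender enum from the question than switch to strings, the same filtering pattern should still work. This is a sketch under the assumption that the gender column holds the enum members as plain Python objects (an object-dtype column), so the comparison is done element by element:

```python
from enum import Enum

import pandas as pd


class Gender(Enum):
    MALE = 1
    FEMALE = 2


df = pd.DataFrame({
    'name': ['John Smith', 'Samantha Wheeler'],
    'gender': [Gender.MALE, Gender.FEMALE],  # enum members stored as objects
})

# Boolean masks built by comparing the column against an enum member.
guys = df[df['gender'] == Gender.MALE]
girls = df[df['gender'] == Gender.FEMALE]

print(guys['name'].tolist())
print(girls['name'].tolist())
```

An object column loses the memory and speed benefits of the category dtype, so this trades efficiency for keeping the enum type in your code.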