I am working on a dataset with a format similar to this:
Name    Sex     Survived  random_cols ...
Akshit  Male    1         rand_val    ...
Hema    Female  0         ...
Rekha   Female  1         ...
...
I want to count the number of Male and Female passengers who survived, i.e. who have the value 1 in the Survived column. I can do this easily with a naive approach using counters, but I was wondering whether there is a more efficient way, with fewer lines of code, using pandas:
m = 0  # male survivors
f = 0  # female survivors
for i in range(len(train_data['Sex'])):
    if train_data['Sex'][i] == 'male' and train_data['Survived'][i] == 1:
        m = m + 1
    if train_data['Sex'][i] == 'female' and train_data['Survived'][i] == 1:
        f = f + 1
print(m)
print(f)
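For comparison, the same logic as the loop can be expressed without Python-level iteration by combining boolean masks and summing them, since True counts as 1. This is a minimal sketch on a small hypothetical train_data frame that uses the lowercase 'male'/'female' labels the loop compares against:

```python
import pandas as pd

# Hypothetical sample data; the real dataset has more columns.
train_data = pd.DataFrame({
    "Name": ["Akshit", "Hema", "Rekha"],
    "Sex": ["male", "female", "female"],
    "Survived": [1, 0, 1],
})

survived = train_data["Survived"] == 1
# Summing a boolean Series counts its True values.
m = int(((train_data["Sex"] == "male") & survived).sum())
f = int(((train_data["Sex"] == "female") & survived).sum())
print(m)
print(f)
```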
Answer:
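Use pandas.DataFrame.value_counts (available since pandas 1.1) to count every combination of Sex and Survived; the entries where Survived is 1 are the survivor counts: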
train_data.value_counts(subset=['Sex', 'Survived'])
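Because value_counts with subset=['Sex', 'Survived'] returns a Series indexed by every (Sex, Survived) pair, non-survivors are included too. A sketch, assuming a small hypothetical frame that mirrors the sample table, that keeps only the Survived == 1 entries with Series.xs:

```python
import pandas as pd

# Hypothetical frame mirroring the sample rows in the question.
train_data = pd.DataFrame({
    "Name": ["Akshit", "Hema", "Rekha"],
    "Sex": ["Male", "Female", "Female"],
    "Survived": [1, 0, 1],
})

# Counts for every (Sex, Survived) combination present in the data.
counts = train_data.value_counts(subset=["Sex", "Survived"])
# Select the Survived == 1 cross-section; the result is indexed by Sex alone.
survivors = counts.xs(1, level="Survived")
print(survivors)
```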
Answer:
You can use boolean indexing on the Survived column to keep only the rows of passengers who survived, then call value_counts on the Sex column:
s = train_data[train_data['Survived'].eq(1)].value_counts(subset=['Sex'])
print(s)
Sex
Female 1
Male 1
dtype: int64
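A closely related variant applies the mask with .loc, selecting only the Sex column, and then calls Series.value_counts, so the result is indexed by the plain Sex labels. This is a sketch on hypothetical rows mirroring the sample table:

```python
import pandas as pd

# Hypothetical frame mirroring the sample rows in the question.
train_data = pd.DataFrame({
    "Name": ["Akshit", "Hema", "Rekha"],
    "Sex": ["Male", "Female", "Female"],
    "Survived": [1, 0, 1],
})

# The boolean mask keeps survivor rows; the column label keeps only Sex.
s = train_data.loc[train_data["Survived"].eq(1), "Sex"].value_counts()
print(s)
```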
The result is a pandas Series, so you can access the individual counts with:
s['Male']
s['Female']
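One caveat: value_counts only reports labels that actually occur, so s['Male'] raises a KeyError if no male passenger survived. A hedged sketch using Series.get with a default of 0, on hypothetical data in which no male survived:

```python
import pandas as pd

# Hypothetical data in which no male passenger survived.
train_data = pd.DataFrame({
    "Sex": ["Male", "Female", "Female"],
    "Survived": [0, 0, 1],
})

s = train_data.loc[train_data["Survived"].eq(1), "Sex"].value_counts()
# Series.get returns the default instead of raising when the label is absent.
male = s.get("Male", 0)
female = s.get("Female", 0)
print(male, female)
```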