Pandas counting number of rows based on data of two columns

Time:04-15

I am working on a dataset with a format similar to this:

Name      Sex       Survived    random_cols  ...
Akshit    Male      1           rand_val     ...
Hema      Female    0           ...
Rekha     Female    1           ...
.
.
.

I want to count the number of males and females who survived, i.e. who have the value 1 in the Survived column. I can do this easily with a naive approach using counters, but I was wondering whether there is a more efficient way to do it in fewer lines of code using pandas.

m = 0
f = 0
for i in range(len(train_data['Sex'])):
    if train_data['Sex'][i] == 'male' and train_data['Survived'][i] == 1:
        m = m + 1

    if train_data['Sex'][i] == 'female' and train_data['Survived'][i] == 1:
        f = f + 1

print(m)
print(f)

CodePudding user response:

Use pandas.DataFrame.value_counts, which counts the rows for each unique (Sex, Survived) combination:

train_data.value_counts(subset=['Sex', 'Survived'])
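Since that call also counts the rows where Survived is 0, the survivors still have to be picked out of the result. A minimal sketch of one way to do that, assuming a toy train_data built from the three rows shown in the question (the real dataset has more rows and columns) and using Series.xs to keep the Survived == 1 slice:

```python
import pandas as pd

# Toy data modelled on the three rows shown in the question (an assumption;
# the real train_data has more rows and columns).
train_data = pd.DataFrame({
    "Name": ["Akshit", "Hema", "Rekha"],
    "Sex": ["Male", "Female", "Female"],
    "Survived": [1, 0, 1],
})

# Count every (Sex, Survived) combination; the result is a Series
# indexed by a two-level MultiIndex.
counts = train_data.value_counts(subset=["Sex", "Survived"])

# Keep only the Survived == 1 slice, leaving a Series indexed by Sex.
survived_counts = counts.xs(1, level="Survived")
print(survived_counts)
```

Here survived_counts["Male"] and survived_counts["Female"] give the two numbers the loop in the question computes, provided the labels match the spelling used in the Sex column.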

CodePudding user response:

You can use boolean indexing on the Survived column to keep only the rows of survivors, then call value_counts on the Sex column:

s = df[df['Survived'].eq(1)].value_counts(subset=['Sex'])
print(s)

Sex
Female    1
Male      1
dtype: int64

The return value is a pandas Series; you can access its values with:

s['Male']
s['Female']
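Plain label lookup raises a KeyError when a label has no surviving rows at all. A hedged sketch of a more defensive variant, assuming the same three sample rows from the question and using Series.value_counts on the filtered Sex column together with Series.get (the "Other" label is hypothetical and only there to show the fallback):

```python
import pandas as pd

# Toy frame based on the rows in the question (assumed, for illustration only).
df = pd.DataFrame({
    "Name": ["Akshit", "Hema", "Rekha"],
    "Sex": ["Male", "Female", "Female"],
    "Survived": [1, 0, 1],
})

# Select the Sex column for surviving rows, then count each label.
s = df.loc[df["Survived"].eq(1), "Sex"].value_counts()

# Series.get returns the default instead of raising KeyError
# when a label has no surviving rows.
male_survivors = s.get("Male", 0)
female_survivors = s.get("Female", 0)
other_survivors = s.get("Other", 0)  # "Other" is a hypothetical label
print(male_survivors, female_survivors, other_survivors)
```

Selecting the single column before counting gives a Series with a plain index of Sex labels, so the lookups stay simple.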