how to get the count of 1's and 0's (which are the values of col2) for each type of value


col1 contains field names such as NAME, CITY and STATE. col2 contains the value 1 or 0 for each row, and col3 contains values ranging from 50 to 100.

The following is the representation:

col1    col2    col3
---------------------
NAME    1        96
NAME    1        99
CITY    0        69
CITY    1        97
STATE   1        98
STATE   0        59

The desired output is a count of the 1s and 0s for each value of col1, restricted to rows where col3 is greater than or equal to 95. The following gives the counts over all values of col1 combined (NAME, CITY and STATE together), but the requirement is a count per field:

df['col2'].value_counts()[0]

The following piece of code builds a filter for the condition that the value should be above 95:

filt = (df['col1'] == 'NAME') | (df['col3'] > 95)

This is what I've tried. Please let me know if anyone can help me with the desired output. Thank you.

CodePudding user response:

Firstly, I would use .loc[] to filter on the rows where col3 is greater than 95:

df.loc[df['col3'] > 95]

    col1  col2  col3
0   NAME     1    96
1   NAME     1    99
3   CITY     1    97
4  STATE     1    98

Then call groupby() on the result, grouping by col1 and col2, and aggregate with count():

df.loc[df['col3'] > 95].groupby(['col1', 'col2']).count()

            col3
col1  col2      
CITY  1        1
NAME  1        2
STATE 1        1
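For reference, the MultiIndex result above can be flattened back into ordinary columns with reset_index() and the remaining col3 column renamed to something clearer. A minimal sketch, with the sample frame reconstructed from the question:

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'col1': ['NAME', 'NAME', 'CITY', 'CITY', 'STATE', 'STATE'],
    'col2': [1, 1, 0, 1, 1, 0],
    'col3': [96, 99, 69, 97, 98, 59],
})

# Filter, group, count, then flatten the (col1, col2) MultiIndex
out = (df.loc[df['col3'] > 95]
         .groupby(['col1', 'col2'])
         .count()
         .reset_index()
         .rename(columns={'col3': 'count'}))
print(out)
```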

CodePudding user response:

If you need a count of 0 when there is no match, create a helper boolean column and aggregate with sum to count the True values:

df1 = (df.assign(new = df['col3'] > 95)
         .groupby(['col1','col2'])['new']
         .sum()
         .reset_index(name='count'))
print (df1)
    col1  col2  count
0   CITY     0      0
1   CITY     1      1
2   NAME     1      2
3  STATE     0      0
4  STATE     1      1
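Note that this keeps only the (col1, col2) pairs that actually occur in the data; a pair that never appears at all (e.g. NAME with 0) would still be missing. If every combination should show up with a zero count, one option is to reindex against the full cross product. A sketch of that idea (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['NAME', 'NAME', 'CITY', 'CITY', 'STATE', 'STATE'],
    'col2': [1, 1, 0, 1, 1, 0],
    'col3': [96, 99, 69, 97, 98, 59],
})

# Count matches per (col1, col2), then reindex against every possible
# combination so pairs absent from the data (e.g. NAME/0) appear as 0
counts = (df.assign(new=df['col3'] > 95)
            .groupby(['col1', 'col2'])['new']
            .sum())
full = pd.MultiIndex.from_product([df['col1'].unique(), [0, 1]],
                                  names=['col1', 'col2'])
out = counts.reindex(full, fill_value=0).reset_index(name='count')
print(out)
```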

But if you don't need the 0 counts, first filter and then aggregate with GroupBy.size:

df2 = df[df['col3'] > 95].groupby(['col1','col2']).size().reset_index(name='count')
print (df2)
    col1  col2  count
0   CITY     1      1
1   NAME     1      2
2  STATE     1      1

CodePudding user response:

Another possible solution:

df.where(df['col3'].gt(95)).groupby('col1', as_index=False)['col2'].sum()

Output:

    col1  col2
0   CITY   1.0
1   NAME   2.0
2  STATE   1.0
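The counts come out as floats here because where() replaces non-matching rows with NaN, which forces col2 to a float dtype before the sum. If integer counts are wanted, one way is to cast the result back, sketched below (the `count` column name is just for clarity):

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['NAME', 'NAME', 'CITY', 'CITY', 'STATE', 'STATE'],
    'col2': [1, 1, 0, 1, 1, 0],
    'col3': [96, 99, 69, 97, 98, 59],
})

# where() masks non-matching rows to NaN; NaN group keys are dropped
# by groupby, and the float sums are cast back to int afterwards
out = (df.where(df['col3'].gt(95))
         .groupby('col1', as_index=False)['col2']
         .sum())
out['col2'] = out['col2'].astype(int)
out = out.rename(columns={'col2': 'count'})
print(out)
```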