Home > database >  How to combine sum and conditional count in pandas
How to combine sum and conditional count in pandas

Time:08-03

I have a dataset which has Id , ActivityDate and TotalSteps as column header. Please find dataframe details below:

df = pd.DataFrame([[1844505072, 6847, '4/1/2016'], [1844505072, 5367, '4/2/2016'], [1844505072, 0, '4/3/2016'],[1844505072, 0, '4/4/2016']], columns=['Id', 'TotalSteps','ActivityDate'])


Id          TotalSteps  ActivityDate
1844505072  6847        4/1/2016
1844505072  5367        4/2/2016
1844505072  0           4/3/2016
1844505072  0           4/4/2016

Need to find the total steps taken by each id and also the count when total_steps = 0. In the above set , expected result is:

Id          TotalSteps    Zero_steps
1844505072  12214         3

Below code is written using pandas which gives two separate results. I want all results to be in a single dataframe.

df[df['Id']==1844505072].groupby('Id').agg(Sum_Tot_Steps=('TotalSteps','sum'))
df[df['Id']==1844505072].groupby('Id')['TotalSteps'].apply(lambda x:x[x==0].count())

Please provide any solution using python pandas. Thanks for your help.

CodePudding user response:

Maybe here are the code:

import pandas as pd

data = {
    "ID": ["1844505070","1844505070","1844505070","1844505071","1844505071","1844505071","1844505072","1844505072","1844505072"],
    "DATE": ["4/1/2016","4/2/2016","4/3/2016","4/1/2016","4/2/2016","4/3/2016","4/1/2016","4/2/2016","4/3/2016"],
    "STEPS": [6832,0,1329,0,0,539,1432,3198,7233],
}
df = pd.DataFrame(data)

Create sample data which is df:

enter image description here

Statistics on the data

id_list = df["ID"].unique().tolist()
total_steps_list = []
zero_steps_list = []

for i in id_list:
    tmp_df = df[df["ID"] == i]
    total_steps_list.append(sum(tmp_df["STEPS"]))
    zero_steps_list.append(len(tmp_df[tmp_df["STEPS"] == 0]))

result = {
    "ID": id_list,
    "TOTAL_STEPS": total_steps_list,
    "ZERO_STEPS": zero_steps_list
}
df2 = pd.DataFrame(result)

And the result is df2:

enter image description here

CodePudding user response:

use this code:

df = df.groupby(by=['id']).agg({'step': ['sum', lambda x: x.count(0)]}).reset_index()

also you can use from this:

import numpy as np
import pandas as pd
df['x'] = np.select([df.step == 0], [1], 0)
df = df.groupby(by=['id']).agg({'step': 'sum' , 'x': 'sum'}).reset_index()

CodePudding user response:

df = pd.DataFrame([[1844505072, 6847, '4/1/2016'], [1844505072, 5367, '4/2/2016'], [1844505072, 0, '4/3/2016'],[1844505072, 0, '4/4/2016']], columns=['Id', 'TotalSteps','ActivityDate'])
df.assign(Zero_steps = df.TotalSteps == 0).groupby('Id').sum()

Output:

            TotalSteps  Zero_steps
Id                                
1844505072       12214           2

CodePudding user response:

simple solution

Total_steps = df.groupby('Id')['Steps'].sum().reset_index(drop=True)
Zero_steps =  df[df['Steps']==0].groupby('Id')['Steps'].count().reset_index(drop=True)
ID = df.groupby('Id')['Steps'].sum().index

data1 = {
    'Id':ID,
    'Total_steps':Total_steps,
    'Zero_steps':Zero_steps
}

df2 = pd.DataFrame(data1)
  • Related