Filtered Group-How to group one column with and add calculated columns-CodePudding

DATASET I CURRENTLY HAVE-

COUNTRY city    id  tag dummy
India   ackno   1   2   1
China   open    0   0   1
India   ackno   1   2   1
China   open    0   0   1
USA     open    0   0   1
USA     open    0   0   1
China   ackno   1   2   1
USA     ackno   1   2   1
USA     resol   1   0   1
Russia  open    0   0   1
Italy   open    0   0   1

country=df['country'].unique().tolist()
city=['resol','ackno']


#below are the preferred filters for calculating column percentage
df_looped=df[(df['city'].isin(city)) & (df['id']!=0) | (df['tag']!=0)]
percentage=(df_looped/df)*100

df_summed=df.groupby(['COUNTRY']).agg({'COUNTRY':'count'})
summed=df_summed['COUNTRY'].sum(axis=0)

THE DATASET I WANT-

COUNTRY percentage  summed
india   100%          2
China   66.66%        3
USA     25%           4
Russia  0%            1
Italy   0%            1

percentage should be derived from the above formula for every unique country and same for the sum. percentage variable and summed variable should populate the columns.

CodePudding user response：

You can create helper column a by your conditions and for percentages of Trues use mean, for count values used GroupBy.size (because GroupBy.count omit misisng values and here no missing values) and last divide columns:

city=['resol','ackno']

df = (df.assign(a = (df['city'].isin(city) & (df['id']!=0) | (df['tag']!=0)))
        .groupby('COUNTRY', sort=False)
        .agg(percentage= ('a','mean'),summed=('a', 'size'))
        .assign(percentage = lambda x: x['percentage'].mul(100).round(2))
        )
print (df)
         percentage  summed
COUNTRY                    
India        100.00       2
China         33.33       3
USA           50.00       4
Russia         0.00       1
Italy          0.00       1

CodePudding user response：

You can use pivot_table with a dict of functions to apply to your dataframe. You have to assign before a new column with your conditions (looped):

funcs = {
    'looped': [
        ('percentage', lambda x: f"{round(sum(x) / len(x) * 100, 2)}%"),
        ('summed', 'size')
  ]
}

# Your code without df[...]
looped = (df['city'].isin(city)) & (df['id'] != 0) | (df['tag'] != 0)

out = df.assign(looped=looped).pivot_table('looped', 'COUNTRY', aggfunc=funcs)

Output:

>>> out
        percentage  summed
COUNTRY                   
China       33.33%       3
India       100.0%       2
Italy         0.0%       1
Russia        0.0%       1
USA          50.0%       4