Home > Software engineering >  Filtered Group-How to group one column with and add calculated columns
Filtered Group-How to group one column with and add calculated columns

Time:02-10

DATASET I CURRENTLY HAVE-

COUNTRY city    id  tag dummy
India   ackno   1   2   1
China   open    0   0   1
India   ackno   1   2   1
China   open    0   0   1
USA     open    0   0   1
USA     open    0   0   1
China   ackno   1   2   1
USA     ackno   1   2   1
USA     resol   1   0   1
Russia  open    0   0   1
Italy   open    0   0   1
country=df['country'].unique().tolist()
city=['resol','ackno']


#below are the preferred filters for calculating column percentage
df_looped=df[(df['city'].isin(city)) & (df['id']!=0) | (df['tag']!=0)]
percentage=(df_looped/df)*100

df_summed=df.groupby(['COUNTRY']).agg({'COUNTRY':'count'})
summed=df_summed['COUNTRY'].sum(axis=0)

THE DATASET I WANT-

COUNTRY percentage  summed
india   100%          2
China   66.66%        3
USA     25%           4
Russia  0%            1
Italy   0%            1

percentage should be derived from the above formula for every unique country and same for the sum. percentage variable and summed variable should populate the columns.

CodePudding user response:

You can create helper column a by your conditions and for percentages of Trues use mean, for count values used GroupBy.size (because GroupBy.count omit misisng values and here no missing values) and last divide columns:

city=['resol','ackno']

df = (df.assign(a = (df['city'].isin(city) & (df['id']!=0) | (df['tag']!=0)))
        .groupby('COUNTRY', sort=False)
        .agg(percentage= ('a','mean'),summed=('a', 'size'))
        .assign(percentage = lambda x: x['percentage'].mul(100).round(2))
        )
print (df)
         percentage  summed
COUNTRY                    
India        100.00       2
China         33.33       3
USA           50.00       4
Russia         0.00       1
Italy          0.00       1

CodePudding user response:

You can use pivot_table with a dict of functions to apply to your dataframe. You have to assign before a new column with your conditions (looped):

funcs = {
    'looped': [
        ('percentage', lambda x: f"{round(sum(x) / len(x) * 100, 2)}%"),
        ('summed', 'size')
  ]
}

# Your code without df[...]
looped = (df['city'].isin(city)) & (df['id'] != 0) | (df['tag'] != 0)

out = df.assign(looped=looped).pivot_table('looped', 'COUNTRY', aggfunc=funcs)

Output:

>>> out
        percentage  summed
COUNTRY                   
China       33.33%       3
India       100.0%       2
Italy         0.0%       1
Russia        0.0%       1
USA          50.0%       4
  • Related