DATASET I CURRENTLY HAVE-
COUNTRY city id tag dummy
India ackno 1 2 1
China open 0 0 1
India ackno 1 2 1
China open 0 0 1
USA open 0 0 1
USA open 0 0 1
China ackno 1 2 1
USA ackno 1 2 1
USA resol 1 0 1
Russia open 0 0 1
Italy open 0 0 1
country=df['country'].unique().tolist()
city=['resol','ackno']
#below are the preferred filters for calculating column percentage
df_looped=df[(df['city'].isin(city)) & (df['id']!=0) | (df['tag']!=0)]
percentage=(df_looped/df)*100
df_summed=df.groupby(['COUNTRY']).agg({'COUNTRY':'count'})
summed=df_summed['COUNTRY'].sum(axis=0)
THE DATASET I WANT-
COUNTRY percentage summed
india 100% 2
China 66.66% 3
USA 25% 4
Russia 0% 1
Italy 0% 1
percentage should be derived from the above formula for every unique country and same for the sum. percentage variable and summed variable should populate the columns.
CodePudding user response:
You can create helper column a
by your conditions and for percentages of True
s use mean
, for count values used GroupBy.size
(because GroupBy.count
omit misisng values and here no missing values) and last divide columns:
city=['resol','ackno']
df = (df.assign(a = (df['city'].isin(city) & (df['id']!=0) | (df['tag']!=0)))
.groupby('COUNTRY', sort=False)
.agg(percentage= ('a','mean'),summed=('a', 'size'))
.assign(percentage = lambda x: x['percentage'].mul(100).round(2))
)
print (df)
percentage summed
COUNTRY
India 100.00 2
China 33.33 3
USA 50.00 4
Russia 0.00 1
Italy 0.00 1
CodePudding user response:
You can use pivot_table
with a dict of functions to apply to your dataframe. You have to assign before a new column with your conditions (looped
):
funcs = {
'looped': [
('percentage', lambda x: f"{round(sum(x) / len(x) * 100, 2)}%"),
('summed', 'size')
]
}
# Your code without df[...]
looped = (df['city'].isin(city)) & (df['id'] != 0) | (df['tag'] != 0)
out = df.assign(looped=looped).pivot_table('looped', 'COUNTRY', aggfunc=funcs)
Output:
>>> out
percentage summed
COUNTRY
China 33.33% 3
India 100.0% 2
Italy 0.0% 1
Russia 0.0% 1
USA 50.0% 4