Home > OS >  How to calculate percentage of certain value in column?
How to calculate percentage of certain value in column?

Time:10-01

I have a table:

country ID type
US a1 upload
US a1 delete
US a1 delete
US a2 upload
US a2 upload
US a2 delete
IT k2 upload
IT k2 upload
IT k2 delete

I need to get data frame with unique countries, count number of unique ID's in it and average percent of "delete" value in type among each ID in country. So desired table must look like:

country ID_cnt delete_perc
US 2 0.5
IT 1 0.33

Its 0.498 since for ID a1, its 0.66; and for a2, it's 0.33; s (0.66 0.33)/2 = 0.5

How to do that with Python?

CodePudding user response:

Use groupby with agg:

>>> df.assign(delete_perc=df['type'].eq('delete')).groupby('country', as_index=False).agg({'ID': 'nunique', 'delete_perc': 'mean'})
  country  ID  delete_perc
0      IT   1     0.333333
1      US   2     0.500000
>>> 

Or possible with value_counts and normalize=True:

>>> df.groupby('country', as_index=False).agg(ID=('ID', 'nunique'), delete_perc=('type', lambda x: x.value_counts(normalize=True)['delete']))
  country  ID  delete_perc
0      IT   1     0.333333
1      US   2     0.500000
>>> 

CodePudding user response:

For table manipulation in Python, we often use the pandas package. You first need to create a pandas.DataFrame object and then, for your use case, use the groupby and agg operations.

There is a complete example:

from io import StringIO
import pandas as pd

data = """country ID type
US a1 upload
US a1 delete
US a1 delete
US a2 upload
US a2 upload
US a2 delete
IT k2 upload
IT k2 upload
IT k2 delete
"""

df = pd.read_csv(StringIO(data), sep=" ")

country_stats = (
    df
    .groupby('country', as_index=False)
    .agg(
        ID=('ID', 'nunique'),
        delete_perc=('type', lambda x: x.value_counts(normalize=True)['delete']))
)

The country_stats DataFrame now contains the desired output:

    country ID  delete_perc
0   IT  1   0.333333
1   US  2   0.500000
  • Related