I have a DataFrame that looks like this:
DATE SCHEDULED ARTICLE_CODE
2021-10-01 2 001
2021-10-01 2 002
2021-10-01 2 003
2021-11-01 2 001
2021-11-01 2 002
2021-11-01 2 003
2021-11-01 2 004
...
For each date, I would like to get the percentage of SCHEDULED that each article code accounts for.
The result I'd like to get is:
DATE SCHEDULED ARTICLE_CODE PERCENTAGE
2021-10-01 2 001 33.3
2021-10-01 2 002 33.3
2021-10-01 2 003 33.3
2021-11-01 2 001 25
2021-11-01 2 002 25
2021-11-01 2 003 25
2021-11-01 2 004 25
...
I think it is doable with groupby, but I can't manage to do it.
CodePudding user response:
df['PERCENTAGE'] = df.groupby('DATE')['SCHEDULED'].transform(lambda x: x * 100 / x.sum()).round(1)
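For anyone who wants to run this end to end, here is a minimal sketch, assuming the goal is each row's share of its date's SCHEDULED total. The sample frame is rebuilt from the question, and `date_total` is just an illustrative name.

```python
import pandas as pd

# Sample data rebuilt from the question (ARTICLE_CODE kept as strings).
df = pd.DataFrame({
    "DATE": ["2021-10-01"] * 3 + ["2021-11-01"] * 4,
    "SCHEDULED": [2] * 7,
    "ARTICLE_CODE": ["001", "002", "003", "001", "002", "003", "004"],
})

# Sum of SCHEDULED per date, broadcast back to every row of that date.
date_total = df.groupby("DATE")["SCHEDULED"].transform("sum")

# Each row's share of its date's total, as a percentage.
df["PERCENTAGE"] = (df["SCHEDULED"] / date_total * 100).round(1)
print(df)
```

Using transform rather than apply keeps the result aligned with the original index, so it can be assigned straight back as a column.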
CodePudding user response:
Use Series.value_counts with Series.map to get the number of rows per date, then divide 100 by that count with Series.rdiv (rdiv(100) computes 100 / count):
df['PERCENTAGE'] = df['DATE'].map(df['DATE'].value_counts()).rdiv(100).round(1)
print(df)
DATE SCHEDULED ARTICLE_CODE PERCENTAGE
0 2021-10-01 2 1 33.3
1 2021-10-01 2 2 33.3
2 2021-10-01 2 3 33.3
3 2021-11-01 2 1 25.0
4 2021-11-01 2 2 25.0
5 2021-11-01 2 3 25.0
6 2021-11-01 2 4 25.0
Another idea:
df['PERCENTAGE'] = (100 / df.groupby('DATE')['ARTICLE_CODE'].transform('size')).round(1)
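One caveat worth sketching: dividing 100 by the group size matches the share of SCHEDULED only when every row of a date has the same SCHEDULED value, as in the question's sample. The uneven values below are made up purely for illustration, and `by_count` / `by_sum` are hypothetical names.

```python
import pandas as pd

# Hypothetical data with uneven SCHEDULED values within one date.
df = pd.DataFrame({
    "DATE": ["2021-10-01"] * 3,
    "SCHEDULED": [1, 1, 2],
    "ARTICLE_CODE": ["001", "002", "003"],
})

grouped = df.groupby("DATE")["SCHEDULED"]

# Count-based: every row of a date gets 100 / (rows in that date).
by_count = (100 / grouped.transform("size")).round(1)

# Sum-based: each row's SCHEDULED as a share of the date's total.
by_sum = (df["SCHEDULED"] / grouped.transform("sum") * 100).round(1)

print(by_count.tolist())  # [33.3, 33.3, 33.3]
print(by_sum.tolist())    # [25.0, 25.0, 50.0]
```

If SCHEDULED can differ between articles on the same date, the sum-based version is probably the one you want.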
CodePudding user response:
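Here's a way with pd.crosstab. Note that it gives fractions rather than percentages (multiply by 100 if you need them) and relies on df being sorted by DATE and ARTICLE_CODE with a default index: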
df['PERCENTAGE'] = pd.crosstab(df['DATE'], df['ARTICLE_CODE'], normalize='index').stack().where(lambda x: x>0).dropna().round(2).reset_index()[0]
Output
DATE SCHEDULED ARTICLE_CODE PERCENTAGE
0 2021-10-01 2 1 0.33
1 2021-10-01 2 2 0.33
2 2021-10-01 2 3 0.33
3 2021-11-01 2 1 0.25
4 2021-11-01 2 2 0.25
5 2021-11-01 2 3 0.25
6 2021-11-01 2 4 0.25
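If the rows are not guaranteed to be sorted, a variant of the crosstab idea can look each row up by its (DATE, ARTICLE_CODE) pair instead of relying on position. This is only a sketch: the sample frame is rebuilt from the question, `shares` and `keys` are illustrative names, and like the one-liner above it counts rows rather than summing SCHEDULED.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "DATE": ["2021-10-01"] * 3 + ["2021-11-01"] * 4,
    "SCHEDULED": [2] * 7,
    "ARTICLE_CODE": ["001", "002", "003", "001", "002", "003", "004"],
})

# Row-normalised counts: one row per date, one column per article code,
# then stacked into a Series indexed by (DATE, ARTICLE_CODE).
shares = pd.crosstab(df["DATE"], df["ARTICLE_CODE"], normalize="index").stack()

# Look up the share for each original row by its (DATE, ARTICLE_CODE) key.
keys = pd.MultiIndex.from_frame(df[["DATE", "ARTICLE_CODE"]])
df["PERCENTAGE"] = (shares.reindex(keys).to_numpy() * 100).round(1)
print(df)
```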