I have a df like this:
month | outcome | mom.ret |
---|---|---|
10/20 | winner | 0.2 |
10/20 | winner | 0.9 |
11/20 | winner | 0.6 |
11/20 | winner | 0.2 |
11/20 | winner | 0.9 |
10/20 | loser | 0.6 |
10/20 | loser | 0.2 |
10/20 | loser | 0.9 |
11/20 | loser | 0.6 |
I would like to add another column, which has 1 / by the counts of times the value "winner" or "loser" appears per each month on the column outcome. The expected output for the example df is:
month | outcome | mom.ret | q |
---|---|---|---|
10/20 | winner | 0.2 | 1/2 |
10/20 | winner | 0.9 | 1/2 |
11/20 | winner | 0.6 | 1/3 |
11/20 | winner | 0.2 | 1/3 |
11/20 | winner | 0.9 | 1/3 |
10/20 | loser | 0.6 | 1/3 |
10/20 | loser | 0.2 | 1/3 |
10/20 | loser | 0.9 | 1/3 |
11/20 | loser | 0.6 | 1/1 |
I thought of using the function count to count how many times the values are repeated, but then I need to specify that the "count" should be done per each date. Any ideas?
CodePudding user response:
You can use this code to achieve what you want, assuming your original DataFrame is called df
:
counts = df.groupby(['month', 'outcome'], as_index=False).count()
counts = counts.rename(columns={'mom.ret': 'q'})
# Use this line if you want the float value of the division 0.5
# counts['q'] = 1/counts['q']
# Use this line if you want the string '1/2'
counts['q'] = counts['q'].apply(lambda x: f'1/{x}')
result = pd.merge(df, counts)
The result looks like this:
month outcome mom.ret q
0 10/20 winner 0.2 1/2
1 10/20 winner 0.9 1/2
2 11/20 winner 0.6 1/3
3 11/20 winner 0.2 1/3
4 11/20 winner 0.9 1/3
5 10/20 loser 0.6 1/2
6 10/20 loser 0.2 1/2
7 11/20 loser 0.9 1/2
8 11/20 loser 0.6 1/2
CodePudding user response:
Use df['q'] = 1/df.groupby(['month', 'outcome']).transform('count')
.
CodePudding user response:
Both of of the two solutions offered so far were close, but neither worked for me out of the box.
@timgeb's required a groupby of just the month for me. To output a fraction instead of a decimal, I used the handy humanize library.
import humanize # pip install humanize # if needed
df['q'] = 1 / df.groupby(['month']).transform('count')
df['q'] = df['q'].apply(lambda x : humanize.fractional(x))
Note that you can't just use .count()
with the groupby - you need the transform method to return a Series of the same length as the original DataFrame.
@sunnytown's solution required a modification to the apply and the merge for me.
Here's my version:
counts = df.groupby('month', as_index=False).count()
counts['q'] = "1"
counts['q'] = counts['outcome'].apply(lambda x: f'1/{x}')
result = pd.merge(df, counts[['month', 'q']], on='month'
Using Python 3.9.7, pandas 1.4.1
Code to make the original df (I left out the mom.ret column that wasn't germane).
import pandas as pd
df = pd.DataFrame(
{
"month": [
"10/20",
"10/20",
"11/20",
"11/20",
"11/20",
"10/20",
"10/20",
"10/20",
"11/20",
],
"outcome": [
"winner",
"winner",
"winner",
"winner",
"winner",
"loser",
"loser",
"loser",
"loser",
],
}
)