Filling a column with the amount of duplicated values in another column-CodePudding

I have a df like this:

month	outcome	mom.ret
10/20	winner	0.2
10/20	winner	0.9
11/20	winner	0.6
11/20	winner	0.2
11/20	winner	0.9
10/20	loser	0.6
10/20	loser	0.2
10/20	loser	0.9
11/20	loser	0.6

I would like to add another column, which has 1 / by the counts of times the value "winner" or "loser" appears per each month on the column outcome. The expected output for the example df is:

month	outcome	mom.ret	q
10/20	winner	0.2	1/2
10/20	winner	0.9	1/2
11/20	winner	0.6	1/3
11/20	winner	0.2	1/3
11/20	winner	0.9	1/3
10/20	loser	0.6	1/3
10/20	loser	0.2	1/3
10/20	loser	0.9	1/3
11/20	loser	0.6	1/1

I thought of using the function count to count how many times the values are repeated, but then I need to specify that the "count" should be done per each date. Any ideas?

CodePudding user response：

You can use this code to achieve what you want, assuming your original DataFrame is called df:

counts = df.groupby(['month', 'outcome'], as_index=False).count()
counts = counts.rename(columns={'mom.ret': 'q'})
# Use this line if you want the float value of the division 0.5
# counts['q'] = 1/counts['q']
# Use this line if you want the string '1/2'
counts['q'] = counts['q'].apply(lambda x: f'1/{x}')
result = pd.merge(df, counts)

The result looks like this:

month   outcome mom.ret q
0   10/20   winner  0.2 1/2
1   10/20   winner  0.9 1/2
2   11/20   winner  0.6 1/3
3   11/20   winner  0.2 1/3
4   11/20   winner  0.9 1/3
5   10/20   loser   0.6 1/2
6   10/20   loser   0.2 1/2
7   11/20   loser   0.9 1/2
8   11/20   loser   0.6 1/2

CodePudding user response：

Use df['q'] = 1/df.groupby(['month', 'outcome']).transform('count').

CodePudding user response：

Both of of the two solutions offered so far were close, but neither worked for me out of the box.

@timgeb's required a groupby of just the month for me. To output a fraction instead of a decimal, I used the handy humanize library.

import humanize      # pip install humanize # if needed

df['q'] = 1 / df.groupby(['month']).transform('count')
df['q'] = df['q'].apply(lambda x : humanize.fractional(x))

Note that you can't just use .count() with the groupby - you need the transform method to return a Series of the same length as the original DataFrame.

@sunnytown's solution required a modification to the apply and the merge for me.

Here's my version:

counts = df.groupby('month', as_index=False).count()
counts['q'] = "1"
counts['q'] = counts['outcome'].apply(lambda x: f'1/{x}')
result = pd.merge(df, counts[['month', 'q']], on='month'

Using Python 3.9.7, pandas 1.4.1

Code to make the original df (I left out the mom.ret column that wasn't germane).

import pandas as pd

df = pd.DataFrame(
    {
        "month": [
            "10/20",
            "10/20",
            "11/20",
            "11/20",
            "11/20",
            "10/20",
            "10/20",
            "10/20",
            "11/20",
        ],
        "outcome": [
            "winner",
            "winner",
            "winner",
            "winner",
            "winner",
            "loser",
            "loser",
            "loser",
            "loser",
        ],
    }
)