Home > Enterprise >  Filling a column with the amount of duplicated values in another column
Filling a column with the amount of duplicated values in another column

Time:02-22

I have a df like this:

month outcome mom.ret
10/20 winner 0.2
10/20 winner 0.9
11/20 winner 0.6
11/20 winner 0.2
11/20 winner 0.9
10/20 loser 0.6
10/20 loser 0.2
10/20 loser 0.9
11/20 loser 0.6

I would like to add another column, which has 1 / by the counts of times the value "winner" or "loser" appears per each month on the column outcome. The expected output for the example df is:

month outcome mom.ret q
10/20 winner 0.2 1/2
10/20 winner 0.9 1/2
11/20 winner 0.6 1/3
11/20 winner 0.2 1/3
11/20 winner 0.9 1/3
10/20 loser 0.6 1/3
10/20 loser 0.2 1/3
10/20 loser 0.9 1/3
11/20 loser 0.6 1/1

I thought of using the function count to count how many times the values are repeated, but then I need to specify that the "count" should be done per each date. Any ideas?

CodePudding user response:

You can use this code to achieve what you want, assuming your original DataFrame is called df:

counts = df.groupby(['month', 'outcome'], as_index=False).count()
counts = counts.rename(columns={'mom.ret': 'q'})
# Use this line if you want the float value of the division 0.5
# counts['q'] = 1/counts['q']
# Use this line if you want the string '1/2'
counts['q'] = counts['q'].apply(lambda x: f'1/{x}')
result = pd.merge(df, counts)

The result looks like this:

month   outcome mom.ret q
0   10/20   winner  0.2 1/2
1   10/20   winner  0.9 1/2
2   11/20   winner  0.6 1/3
3   11/20   winner  0.2 1/3
4   11/20   winner  0.9 1/3
5   10/20   loser   0.6 1/2
6   10/20   loser   0.2 1/2
7   11/20   loser   0.9 1/2
8   11/20   loser   0.6 1/2

CodePudding user response:

Use df['q'] = 1/df.groupby(['month', 'outcome']).transform('count').

CodePudding user response:

Both of of the two solutions offered so far were close, but neither worked for me out of the box.

@timgeb's required a groupby of just the month for me. To output a fraction instead of a decimal, I used the handy humanize library.

import humanize      # pip install humanize # if needed

df['q'] = 1 / df.groupby(['month']).transform('count')
df['q'] = df['q'].apply(lambda x : humanize.fractional(x))

Note that you can't just use .count() with the groupby - you need the transform method to return a Series of the same length as the original DataFrame.

@sunnytown's solution required a modification to the apply and the merge for me.

Here's my version:

counts = df.groupby('month', as_index=False).count()
counts['q'] = "1"
counts['q'] = counts['outcome'].apply(lambda x: f'1/{x}')
result = pd.merge(df, counts[['month', 'q']], on='month'

Using Python 3.9.7, pandas 1.4.1

Code to make the original df (I left out the mom.ret column that wasn't germane).

import pandas as pd

df = pd.DataFrame(
    {
        "month": [
            "10/20",
            "10/20",
            "11/20",
            "11/20",
            "11/20",
            "10/20",
            "10/20",
            "10/20",
            "11/20",
        ],
        "outcome": [
            "winner",
            "winner",
            "winner",
            "winner",
            "winner",
            "loser",
            "loser",
            "loser",
            "loser",
        ],
    }
)
  • Related