Creating dummy variable based on value counts in a function

Time:05-19

I'm trying to write a function that assigns 0 to the more frequent string and 1 to the less frequent one. The idea is that it should take any column containing two distinct strings and assign 0 and 1 based on the value counts. How can I do that?


import pandas as pd

data = {'status':["Default", "Non-Default", "Non-Default", "Non-Default", "Default", "Non-Default"]}
df = pd.DataFrame(data)
df

        status
    0   Default
    1   Non-Default
    2   Non-Default
    3   Non-Default
    4   Default
    5   Non-Default


    df.value_counts()

    status     
    Non-Default    4
    Default        2
    dtype: int64

CodePudding user response:

You can use:

df['binary'] = df['status'].ne(df['status'].mode().iloc[0]).astype(int)

mode() returns the most frequent value(s), and iloc[0] picks the first one (in case of a tie). ne() then marks every value that is NOT this string as True, and astype(int) converts True to 1 and False to 0, so the most frequent string gets 0.

output:

        status  binary
0      Default       1
1  Non-Default       0
2  Non-Default       0
3  Non-Default       0
4      Default       1
5  Non-Default       0
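
Since the question asks for a function, the one-liner above could be wrapped roughly like this. This is only a sketch: the name encode_by_frequency is made up for illustration, and it assumes the column holds exactly two distinct values.

```python
import pandas as pd


def encode_by_frequency(series: pd.Series) -> pd.Series:
    """Return 0 for the most frequent value and 1 for every other value.

    Hypothetical helper (not part of pandas); assumes a binary column.
    """
    # mode() can return several values on a tie; take the first one
    most_frequent = series.mode().iloc[0]
    # True where the value differs from the most frequent one, then True -> 1
    return series.ne(most_frequent).astype(int)


df = pd.DataFrame({"status": ["Default", "Non-Default", "Non-Default",
                              "Non-Default", "Default", "Non-Default"]})
df["binary"] = encode_by_frequency(df["status"])
print(df)
```

Because the function takes a Series, it can be applied to any binary string column in the same way, e.g. df[col] for each column of interest.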