I'm trying to write a function that assigns 0 to the more frequent string and 1 to the less frequent string. My idea is that it should take any column containing two distinct strings and assign 0 and 1 based on the value counts. How can I do that?
import pandas as pd
data = {'status': ["Default", "Non-Default", "Non-Default", "Non-Default", "Default", "Non-Default"]}
df = pd.DataFrame(data)
df
        status
0      Default
1  Non-Default
2  Non-Default
3  Non-Default
4      Default
5  Non-Default
df.value_counts()
status
Non-Default    4
Default        2
dtype: int64
CodePudding user response:
You can use:
df['binary'] = df['status'].ne(df['status'].mode().iloc[0]).astype(int)
mode gets the most frequent value(s), and iloc[0] picks the first one (in case of a tie). Then we flag the values that are NOT this string (True) and convert the result to integer (1). The most frequent string will be 0.
output:
        status  binary
0      Default       1
1  Non-Default       0
2  Non-Default       0
3  Non-Default       0
4      Default       1
5  Non-Default       0
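Since the question asks for a function that works on any column, the one-liner above could be wrapped roughly as follows. This is only a sketch: the function name encode_by_frequency is made up for illustration, and it assumes the column is non-empty.

```python
import pandas as pd


def encode_by_frequency(column: pd.Series) -> pd.Series:
    """Return 0 for the most frequent value in `column` and 1 for anything else.

    Hypothetical helper name; assumes `column` has at least one non-missing value.
    """
    # mode() returns the most frequent value(s) in sorted order;
    # iloc[0] picks the first one if there is a tie.
    most_frequent = column.mode().iloc[0]
    # ne() marks every row that differs from the most frequent value as True,
    # and astype(int) turns True/False into 1/0.
    return column.ne(most_frequent).astype(int)


data = {'status': ["Default", "Non-Default", "Non-Default", "Non-Default", "Default", "Non-Default"]}
df = pd.DataFrame(data)
df['binary'] = encode_by_frequency(df['status'])
print(df)
```

Note that with this approach any value other than the most frequent one gets 1, so a column with more than two distinct strings, or with missing values, would still be encoded without an error. If that matters, you may want to check column.nunique() before calling it.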