Home > Software engineering >  Replace a particular column value with 1 and the rest with 0
Replace a particular column value with 1 and the rest with 0

Time:08-17

I have a DataFrame which has a column containing these values with % occurrence

enter image description here

I want to convert the value with highest occurrence as 1 and the rest as 0. How can I do the same using Pandas?

CodePudding user response:

Try this:


import pandas as pd
import numpy as np

df = pd.DataFrame({'availability': np.random.randint(0, 100, 10), 'some_col': np.random.randn(10)})

print(df)
"""
   availability  some_col
0             9 -0.332662
1            35  0.193257
2             1  2.042402
3            50 -0.298372
4            52 -0.669655
5             3 -1.031884
6            44 -0.763867
7            28  1.093086
8            67  0.723319
9            87 -1.439568
"""

df['availability'] = np.where(df['availability'] == df['availability'].max(), 1, 0)
print(df)
"""
   availability  some_col
0             0 -0.332662
1             0  0.193257
2             0  2.042402
3             0 -0.298372
4             0 -0.669655
5             0 -1.031884
6             0 -0.763867
7             0  1.093086
8             0  0.723319
9             1 -1.439568
"""

Edit

If you are trying to mask the rows with the values that occur most often instead, try this:


df = pd.DataFrame(
    {
        'availability': [10, 10, 20, 30, 40, 40, 50, 50, 50, 50],
        'some_col': np.random.randn(10)
    }
)
print(df)
"""
   availability  some_col
0            10  0.954199
1            10  0.779256
2            20 -0.438860
3            30 -2.547989
4            40  0.587108
5            40  0.398858
6            50  0.776177   # <--- Most Frequent is 50
7            50 -0.391724   # <--- Most Frequent is 50
8            50 -0.886805   # <--- Most Frequent is 50
9            50  1.989000   # <--- Most Frequent is 50
"""

df['availability'] = np.where(df['availability'].isin(df['availability'].mode()), 1, 0)

print(df)
"""
   availability  some_col
0             0  0.954199
1             0  0.779256
2             0 -0.438860
3             0 -2.547989
4             0  0.587108
5             0  0.398858
6             1  0.776177
7             1 -0.391724
8             1 -0.886805
9             1  1.989000
"""

CodePudding user response:

Try:

df.availability.apply(lambda x: 1 if x == df.availability.value_counts().idxmax() else 0)

CodePudding user response:

You can use Series.mode() to get the most often value and use isin to check if value in column in list

df['col'] = df['availability'].isin(df['availability'].mode()).astype(int)

CodePudding user response:

You can compare to the mode with isin, then convert the boolean to integer (True -> 1, False -> 0):

df['col2'] = df['col'].isin(df['col'].mode()).astype(int)

example (here, 2 and 4 are tied as most frequent value), as new column "col2" for clarity:

   col  col2
0    0     0
1    2     1
2    2     1
3    2     1
4    4     1
5    4     1
6    4     1
7    1     0
  • Related