Pandas Case Statement Not Showing Correct Output

Time:07-26

I am trying to write a case statement that fills a new column: if the column value is 50, the new value should be 50, otherwise 7. When I run the statement below, rows where the value is clearly 50 still come out as 7.

df['wattage'] = np.where(df['charge_type_2'] == 50, '50', '7')

CodePudding user response:

I think you need to compare your column to a string, not to an integer:

df['wattage'] = np.where(df['charge_type_2'] == '50', '50','7')
#                                        HERE --^--^

If the column holds mixed types, you can cast the values to strings first:

df['wattage'] = np.where(df['charge_type_2'].astype(str) == '50', '50','7')
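
To check whether the column really holds mixed types, it can help to count the Python types it contains. This is a minimal sketch, assuming a small made-up frame in place of the real data; the names type_counts, int_match and str_match are only illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the real column may look different.
df = pd.DataFrame({'charge_type_2': [1, '2', '50', 50]})

# Count how many values of each Python type the column holds.
type_counts = df['charge_type_2'].map(type).value_counts()
print(type_counts)

# Comparing to the integer 50 matches only the real integer, not the string '50'.
int_match = np.where(df['charge_type_2'] == 50, '50', '7')

# Casting to str first matches both representations.
str_match = np.where(df['charge_type_2'].astype(str) == '50', '50', '7')
```

If the count shows both int and str entries, the integer comparison from the question will miss every row stored as a string.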

CodePudding user response:

It looks like you have mixed types, so convert them to a common one. I'd encourage you to use numbers, not string representations of numbers:

df = pd.DataFrame({'charge_type_2': [1, '2', '50', 50]})

df['wattage'] = pd.to_numeric(df['charge_type_2'], errors='coerce').where(lambda x: x.eq(50), 7)

Output:

  charge_type_2  wattage
0             1        7
1             2        7
2            50       50
3            50       50

Note that charge_type_2 itself remains unconverted; you might want to replace it with the numeric version.
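
One possible way to replace the column, sketched on the same sample frame (an assumption, not the asker's real data), is to overwrite it with the result of pd.to_numeric before deriving wattage:

```python
import pandas as pd

# Same hypothetical sample data as in the example above.
df = pd.DataFrame({'charge_type_2': [1, '2', '50', 50]})

# Overwrite the column with its numeric version; unparseable values become NaN.
df['charge_type_2'] = pd.to_numeric(df['charge_type_2'], errors='coerce')

# Keep values equal to 50, replace everything else (including NaN) with 7.
df['wattage'] = df['charge_type_2'].where(df['charge_type_2'].eq(50), 7)
```

After this, later comparisons against the integer 50 behave consistently, because every value in the column is numeric.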

CodePudding user response:

Use:

df = pd.DataFrame({'test': [1,50,51,20,0,2**11]})

def check(val):
    if val == 50:
        return '50'
    else:
        return '7'
    
df['test'].apply(check)

Output:

0     7
1    50
2     7
3     7
4     7
5     7
Name: test, dtype: object

Maybe your problem is with your data type. If the values are strings, the comparison with the integer 50 never matches:

df = pd.DataFrame({'test': [1,50,51,20,0,2**11]})

def check(val):
    if val == 50:
        return 50
    else:
        return 7
    
df['test'].astype(str).apply(check)

In this case the output is:

0    7
1    7
2    7
3    7
4    7
5    7
Name: test, dtype: int64

More simply, use:

df.where(df==50, 7).astype(str)

Again, pay attention to the type of your data.

CodePudding user response:

In this case you can use np.select, which is much faster than the apply method:

df['wattage'] = np.select([df['charge_type_2'] ==50], ['50'], '7')
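
Like np.where, np.select still compares against the integer 50, so string values such as '50' would fall through to the default. A hedged sketch that normalizes the column first, using made-up sample data and illustrative names (values, conditions, choices):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with mixed int and str values.
df = pd.DataFrame({'charge_type_2': [1, '2', '50', 50]})

# Normalize to numbers first so '50' and 50 compare equal.
values = pd.to_numeric(df['charge_type_2'], errors='coerce')

# One condition per "WHEN" branch; default plays the role of "ELSE".
conditions = [values == 50]
choices = ['50']
df['wattage'] = np.select(conditions, choices, default='7')
```

Further branches can be added by appending a condition and its matching choice to the two lists.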