How do I put an else condition inside a DataFrame for assigning values?

I have written logic for changing data inside a pivoted table, and it works for a single condition, but I also need an else condition. I tried .apply(), but it doesn't produce the dataset I am looking for.

df_s = sorted_data.groupby(["GH","HA","Tipo"]).first().reset_index()
df_s22 = df_s[df_s['Tipo'].eq('1')].assign(Tipo='2').rename(lambda x: x + .5)

I need to add an else branch to the code above that assigns 1 otherwise.

CodePudding user response：

The column "Tipo" currently holds strings. You could convert it to integers, which are usually easier to work with. Either way, you have a column Tipo of strings whose values are '1' or '2' (it is hard to tell whether other values are allowed, which affects which approaches you can take). The examples below use this sample data:

import numpy as np
import pandas as pd
df_s = pd.DataFrame({'Tipo':[str(i) for i in np.random.randint(1,3, size=10)], 
                     'other_data':[chr(i) for i in np.random.randint(65,90, size=10)]})
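The conversion to integers suggested above could look roughly like this. This is a minimal sketch: df_demo and its values are made up for illustration and are not part of the original data.

```python
import pandas as pd

# Hypothetical small frame standing in for df_s; the values are made up.
df_demo = pd.DataFrame({"Tipo": ["1", "2", "1"]})

# Convert the string column to integers.
df_demo["Tipo"] = df_demo["Tipo"].astype(int)

# With integers, comparisons no longer need quotes around the value.
is_one = df_demo["Tipo"] == 1
```

The methods below keep Tipo as strings, as in the question.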

Method 1

The most direct solution to your problem would be to define a function and apply it row-wise, i.e. with axis=1 (probably inefficient, but it does the job):

def fn(row):
    row.loc['Tipo'] = '2' if row.Tipo=='1' else '1'
    return row
df_s22 = df_s.apply(fn, axis=1)

timings: 2.57 ms ± 153 µs per loop

Method 2

Or apply the function directly to the column of interest:

df_s22 = df_s.copy()
df_s22.loc[:,'Tipo'] = df_s22.loc[:,'Tipo'].apply(lambda x: '2' if x=='1' else '1')

timings: 862 µs ± 30.7 µs per loop

Method 3

You can also use the eval method:

df_s22 = df_s.copy()
df_s22.loc[:,'Tipo'] = df_s22.eval("Tipo=='1'").astype(int) + 1

timings: 2.45 ms ± 97.3 µs per loop

Here I use the eval method to check whether the Tipo column is '1'. Where it is, the check gives True, which Python also treats as the integer 1, so adding 1 gives 1+1=2. The other values evaluate to False (i.e. 0), and adding 1 to them gives 1. The output will have the Tipo column as integers, not strings any more.
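The boolean arithmetic described above can be checked on a tiny example. This is only a sketch: the Series named tipo and its values are made up for illustration.

```python
import pandas as pd

# Made-up sample values; '1' should become 2 and everything else 1.
tipo = pd.Series(["1", "2", "1"])

# The comparison yields a boolean mask: True where the value is '1'.
mask = tipo == "1"

# True counts as 1 and False as 0, so adding 1 gives 2 and 1 respectively.
flipped = mask.astype(int) + 1
```

Here flipped holds integers, which is why the result of Method 3 is no longer a string column.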

Method 4

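Using the assign method, we can use a similar check and cast the result back to strings element-wise with astype(str). Calling str() on the whole Series would instead produce a single string for every row:

df_s22 = df_s.assign(Tipo=((df_s.Tipo == '1') + 1).astype(str))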

timings: 783 µs ± 18.3 µs per loop

Tips and remarks:

In all cases you need to keep track of quotation marks, since Tipo is a string: in an expression wrapped in double quotes, the inner quotes need to be single quotes.
Also remember that you are creating a second DataFrame in memory (df_s22). If your dataset is large and you want to do complex operations on it, they might become slower if memory fills up. Consider just creating a new column, perhaps named Tipo22, in your original DataFrame df_s.
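Adding a Tipo22 column to the original DataFrame instead of building df_s22 might look like the sketch below. It uses numpy.where, which is not one of the methods timed above but is a common vectorized if/else. The sample values are made up, and it assumes that every value other than '1' should become '1'.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for df_s with the same two columns as the sample data.
df_s = pd.DataFrame({"Tipo": ["1", "2", "1"],
                     "other_data": ["A", "B", "C"]})

# np.where(condition, value_if_true, value_if_false) acts as a vectorized
# if/else: '2' where Tipo is '1', otherwise '1'.
df_s["Tipo22"] = np.where(df_s["Tipo"] == "1", "2", "1")
```

The original Tipo column is left untouched, so no second DataFrame is needed.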