I have written logic to change data inside a pivoted table. It works for a single condition, but I also need an else condition. I tried .apply(), but it doesn't produce the dataset I am looking for.
df_s = sorted_data.groupby(["GH","HA","Tipo"]).first().reset_index()
df_s22 = df_s[df_s['Tipo'].eq('1')].assign(Tipo='2').rename(lambda x: x + .5)
I need an else condition in the code above that assigns 1 when the condition is not met.
CodePudding user response:
The column "Tipo" holds strings the way you currently handle it. You could convert it to integers, which are usually easier to work with. Either way, you have a DataFrame with a column Tipo whose values are strings, either '1' or '2' (it is hard to tell whether other values are allowed, which affects the approaches you can take).
import numpy as np
import pandas as pd
df_s = pd.DataFrame({'Tipo': [str(i) for i in np.random.randint(1, 3, size=10)],
                     'other_data': [chr(i) for i in np.random.randint(65, 90, size=10)]})
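If converting Tipo to integers is an option, as suggested above, a minimal sketch could look like the following. The DataFrame df_demo and its values are made up for illustration, and the sketch assumes every value in the column is a numeric string.

```python
import pandas as pd

# Hypothetical sample data: Tipo stored as strings, as in the question
df_demo = pd.DataFrame({'Tipo': ['1', '2', '1', '2']})

# Convert the strings to integers; this raises a ValueError
# if any value cannot be parsed as an integer
df_demo['Tipo'] = df_demo['Tipo'].astype(int)

# Comparisons can now use plain numbers instead of quoted strings
is_one = df_demo['Tipo'].eq(1)
```

After the conversion, the quoting concerns around '1' and '2' no longer apply, since the checks compare against the numbers 1 and 2.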
Method 1
The most direct solution to your problem would be to define a function and apply it row-wise, i.e. with axis=1 (probably inefficient, but it does the job):
def fn(row):
    # Set Tipo to '2' where it was '1', otherwise to '1'
    row.loc['Tipo'] = '2' if row.Tipo == '1' else '1'
    return row
df_s22 = df_s.apply(fn, axis=1)
timings: 2.57 ms ± 153 µs per loop
Method 2
Or apply the check directly to the column of interest:
df_s22 = df_s.copy()
df_s22.loc[:,'Tipo'] = df_s22.loc[:,'Tipo'].apply(lambda x: '2' if x=='1' else '1')
timings: 862 µs ± 30.7 µs per loop
Method 3
You can also use the eval method:
df_s22 = df_s.copy()
df_s22.loc[:,'Tipo'] = df_s22.eval("Tipo=='1'").astype(int) + 1
timings: 2.45 ms ± 97.3 µs per loop
Here I use the eval method to check whether the Tipo column is '1'. Where it is, the check gives True, which Python can also interpret as the integer 1, so adding 1 to the check Tipo=='1' gives 1+1=2. All other values evaluate to False (i.e. 0), and adding 1 to those gives 1. The resulting Tipo values are integers, not strings any more.
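The boolean arithmetic described above can be traced step by step in a small sketch. The Series tipo and its values (including a '3' to show the else case) are hypothetical, and a plain comparison stands in for the eval expression.

```python
import pandas as pd

# Hypothetical values; '3' is included to show the "else" branch
tipo = pd.Series(['1', '2', '1', '3'], name='Tipo')

# Step 1: the comparison yields a boolean Series
mask = tipo == '1'

# Step 2: True becomes 1 and False becomes 0
as_int = mask.astype(int)

# Step 3: adding 1 maps matches to 2 and everything else to 1
result = as_int + 1
```

Note that every value other than '1' ends up as 1, not only '2', which matters if the column can hold other values.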
Method 4
Using the assign method we can use a similar check:
# Compare, add 1, then convert each value back to a string
df_s22 = df_s.assign(Tipo=((df_s.Tipo == '1') + 1).astype(str))
timings: 783 µs ± 18.3 µs per loop
Tips and remarks:
In all cases you need to keep track of quotation marks: since Tipo is a string, in expressions enclosed in double quotes the inner quotes need to be single quotes. Also remember that you are creating a second DataFrame in memory (df_s22); if your dataset is large and you want to do complex operations on it, they might become slower if memory fills up. Think about just creating a new column, perhaps named Tipo22, in your original DataFrame df_s.
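As a rough sketch of that last suggestion, the new column can be added to the existing DataFrame instead of building a second one. The DataFrame df_demo and its values are made up for illustration; Tipo22 is the column name proposed above, and the lambda is the same check used in Method 2.

```python
import pandas as pd

# Hypothetical stand-in for the original df_s
df_demo = pd.DataFrame({'Tipo': ['1', '2', '1'],
                        'other_data': ['A', 'B', 'C']})

# Add the result as a new column; the original Tipo column stays untouched
df_demo['Tipo22'] = df_demo['Tipo'].apply(lambda x: '2' if x == '1' else '1')
```

This keeps both the original and the transformed values side by side without holding a full copy of the DataFrame.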