pandas apply function with if statement inside the custom function

Time:06-11

def cal_properties(pressure):
    
    if pressure>=0 and pressure<=1000:
        density=1/pressure  #myfunction(pressure)
    else:
        density=pressure*10

    return  density

print(df)


WELL_NKNME  10A74  10A75  10A77  10A78  11A74  11A75  11A77  11A78
Date                                                              
2022-06-05    0.0    0.0    0.0    0.0    0.0  122.8   56.3   96.3
2022-06-06    0.0    0.0    0.0    0.0    0.0  118.3   52.0   85.3
2022-06-07    0.0    0.0    0.0    0.0    0.0  119.5   52.9   87.4

df=df.apply(lambda row: cal_properties(row),axis=1)

Then I got an error related to the if statement:


----> 7 df=df.apply(lambda row: cal_properties(row),axis=1)
      8 df

C:\Anaconda\envs\dash_tf\lib\site-packages\pandas\core\frame.py in apply(
    self,
    func,
    axis,
    raw,
    result_type,
    args,
    **kwargs
)
   8738             kwargs=kwargs,
   8739         )
-> 8740         return op.apply()
   8741 
   8742     def applymap(

C:\Anaconda\envs\dash_tf\lib\site-packages\pandas\core\apply.py in apply(self)
    686             return self.apply_raw()
    687 
--> 688         return self.apply_standard()
    689 
    690     def agg(self):

C:\Anaconda\envs\dash_tf\lib\site-packages\pandas\core\apply.py in apply_standard(self)
    810 
    811     def apply_standard(self):
--> 812         results, res_index = self.apply_series_generator()
    813 
    814         # wrap results

C:\Anaconda\envs\dash_tf\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
    826             for i, v in enumerate(series_gen):
    827                 # ignore SettingWithCopy here in case the user mutates
--> 828                 results[i] = self.f(v)
    829                 if isinstance(results[i], ABCSeries):
    830                     # If we have a view on v, we need to make a copy because

C:\Temp\1\ipykernel_840\3896317687.py in <lambda>(row)
      5 # print(df.iloc[0:1,:])
      6 # print(df.to_dict())
----> 7 df=df.apply(lambda row: cal_properties(row),axis=1)
      8 df

C:\Temp\1\ipykernel_840\2054338456.py in cal_properties(pressure)
      1 def cal_properties(pressure):
      2 
----> 3     if pressure>=0 and pressure<=1000:
      4         density=1/pressure  #myfunction(pressure)
      5     else:

C:\Anaconda\envs\dash_tf\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1535     @final
   1536     def __nonzero__(self):
-> 1537         raise ValueError(
   1538             f"The truth value of a {type(self).__name__} is ambiguous. "
   1539             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Here is the DataFrame as a dictionary so you can run the code yourself. Without the if statement, the code works fine. How can I solve this? Thanks for your help.

print(df.to_dict())

{'10A74': {Timestamp('2022-06-05 00:00:00'): 0.0, Timestamp('2022-06-06 00:00:00'): 0.0, Timestamp('2022-06-07 00:00:00'): 0.0}, '10A75': {Timestamp('2022-06-05 00:00:00'): 0.0, Timestamp('2022-06-06 00:00:00'): 0.0, Timestamp('2022-06-07 00:00:00'): 0.0}, '10A77': {Timestamp('2022-06-05 00:00:00'): 0.0, Timestamp('2022-06-06 00:00:00'): 0.0, Timestamp('2022-06-07 00:00:00'): 0.0}, '10A78': {Timestamp('2022-06-05 00:00:00'): 0.0, Timestamp('2022-06-06 00:00:00'): 0.0, Timestamp('2022-06-07 00:00:00'): 0.0}, '11A74': {Timestamp('2022-06-05 00:00:00'): 0.0, Timestamp('2022-06-06 00:00:00'): 0.0, Timestamp('2022-06-07 00:00:00'): 0.0}, '11A75': {Timestamp('2022-06-05 00:00:00'): 122.8, Timestamp('2022-06-06 00:00:00'): 118.3, Timestamp('2022-06-07 00:00:00'): 119.5}, '11A77': {Timestamp('2022-06-05 00:00:00'): 56.3, Timestamp('2022-06-06 00:00:00'): 52.0, Timestamp('2022-06-07 00:00:00'): 52.9}, '11A78': {Timestamp('2022-06-05 00:00:00'): 96.3, Timestamp('2022-06-06 00:00:00'): 85.3, Timestamp('2022-06-07 00:00:00'): 87.4}}

CodePudding user response:

It seems like a job for np.where instead:

import numpy as np

df.loc[:, :] = np.where((df >= 0) & (df <= 1000), 1/df, df*10)
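
For context, here is a minimal sketch of why the original call fails and what the np.where version produces. It uses a two-column subset of the question's data, and the names row, error_message and result are only illustrative. Note that a pressure of 0 satisfies the condition, so 1/0 gives inf in those cells.

```python
import numpy as np
import pandas as pd

# Two wells from the question's data, three dates.
df = pd.DataFrame(
    {"10A74": [0.0, 0.0, 0.0], "11A75": [122.8, 118.3, 119.5]},
    index=pd.to_datetime(["2022-06-05", "2022-06-06", "2022-06-07"]),
)

# With axis=1, apply passes a whole row (a Series) to the function.
# `row >= 0` is then a boolean Series, and `if` cannot reduce it to one bool.
row = df.iloc[0]
error_message = ""
try:
    if row >= 0:
        pass
except ValueError as exc:
    error_message = str(exc)

# Vectorized alternative: the condition is evaluated per cell.
# np.where computes both branches everywhere and then selects.
result = pd.DataFrame(
    np.where((df >= 0) & (df <= 1000), 1 / df, df * 10),
    index=df.index,
    columns=df.columns,
)
print(error_message)
print(result)
```

If inf is not wanted for zero pressures, the condition can be made strict (df > 0), which sends zeros to the df*10 branch instead.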

The same logic can be applied row-wise:

def cal_properties(pressure_row):
    return pd.Series(
        np.where(pressure_row.between(0, 1000), 1/pressure_row, pressure_row*10),
        index=pressure_row.index
    )

df = df.apply(cal_properties,axis=1)
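
As a quick sanity check, the sketch below runs the row-wise function and the whole-frame np.where on a small subset of the question's data and compares the results. The names row_wise, frame_wide and same are illustrative only.

```python
import numpy as np
import pandas as pd

# Three wells from the question's data, two dates.
df = pd.DataFrame(
    {"10A74": [0.0, 0.0], "11A75": [122.8, 118.3], "11A77": [56.3, 52.0]},
    index=pd.to_datetime(["2022-06-05", "2022-06-06"]),
)

def cal_properties(pressure_row):
    # pressure_row is one row of the frame (a Series indexed by well name).
    return pd.Series(
        np.where(pressure_row.between(0, 1000), 1 / pressure_row, pressure_row * 10),
        index=pressure_row.index,
    )

# Row-wise: the function is called once per date.
row_wise = df.apply(cal_properties, axis=1)

# Whole frame at once: same condition, no Python-level loop over rows.
frame_wide = pd.DataFrame(
    np.where((df >= 0) & (df <= 1000), 1 / df, df * 10),
    index=df.index,
    columns=df.columns,
)

same = np.array_equal(row_wise.to_numpy(), frame_wide.to_numpy())
print(same)
```

Both forms apply the same per-cell rule, so the whole-frame version is usually the simpler choice unless the function really needs to see an entire row.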

CodePudding user response:

You cannot divide by 0, so the lower bound of the condition has to be strict.

If you change your function to test for strictly > 0, you can use applymap, because the calculation is done cell by cell rather than row-wise or column-wise. Hence:

def cal_properties(pressure):
    
    if pressure>0 and pressure<=1000:
        density=1/pressure  #myfunction(pressure)
    else:
        density=pressure*10

    return  density

df.applymap(cal_properties)
 
            10A74  10A75  10A77  10A78  11A74     11A75     11A77     11A78
2022-06-05    0.0    0.0    0.0    0.0    0.0  0.008143  0.017762  0.010384
2022-06-06    0.0    0.0    0.0    0.0    0.0  0.008453  0.019231  0.011723
2022-06-07    0.0    0.0    0.0    0.0    0.0  0.008368  0.018904  0.011442
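
A side note on versions: since pandas 2.1, applymap is deprecated in favor of DataFrame.map, which does the same element-wise call. The sketch below picks whichever is available; the variable names elementwise and result are illustrative, and the data is a two-column subset of the question's frame.

```python
import pandas as pd

def cal_properties(pressure):
    # Called once per cell, so `pressure` is a plain float here.
    if 0 < pressure <= 1000:
        return 1 / pressure
    return pressure * 10

df = pd.DataFrame(
    {"10A74": [0.0, 0.0], "11A75": [122.8, 118.3]},
    index=pd.to_datetime(["2022-06-05", "2022-06-06"]),
)

# DataFrame.map exists from pandas 2.1 on; older versions only have applymap.
if hasattr(pd.DataFrame, "map"):
    elementwise = df.map
else:
    elementwise = df.applymap

result = elementwise(cal_properties)
print(result)
```

Because the function runs once per cell in Python, this is slower than np.where on large frames, but it keeps the original if/else logic unchanged.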