How to check a column for str value, determine if another column is less/greater than [x] return boo

Time:09-17

I have a dataframe that looks like this

product duration
tire change 01:16:51
oil change 05:06:00
tire change 02:03:04
oil change 06:23:14
oil change 03:40:27

I want to create a new column that returns a boolean based on the 2 columns

product duration duration_bool
tire change 01:16:51 True
oil change 01:06:00 True
tire change 04:03:04 False
oil change 02:23:14 False
oil change 03:40:27 False

Is this the correct way to use a function on a dataframe? I am having trouble understanding whether this actually accomplishes what I'm after.

def sla_bool_checker(my_var):

    #check if product is a tire change, if it is, check if duration is under 4 hours and return the Boolean in the new column

    if df['product'] == 'tire change' :
        df['duration_bool'] = df['duration'] < pd.Timedelta(4, unit='h')

    #check if product is a oil change, if it is, check if duration is under 2 hours and return the Boolean

    elif df['product'] == 'oil change' :
        df['duration_bool'] < pd.Timedelta(2, unit='h')

I don't know what I'm missing, but this is the error I get.

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

CodePudding user response:

Create a boolean array based on your conditions and assign that to the new column.

df['duration'] = df['duration'].apply(pd.Timedelta) # make sure duration has a dtype of Timedelta

df['duration_bool'] = ((df['product'] == 'tire change') & (df['duration'] < pd.Timedelta(4, unit='h'))) | \
((df['product'] == 'oil change') & (df['duration'] < pd.Timedelta(2, unit='h')))

       product        duration  duration_bool
0  tire change 0 days 01:16:51           True
1   oil change 0 days 05:06:00          False
2  tire change 0 days 02:03:04           True
3   oil change 0 days 06:23:14          False
4   oil change 0 days 03:40:27          False

What it means:

((df['product'] == 'tire change') & (df['duration'] < pd.Timedelta(4, unit='h'))) is True where the product is equal to tire change AND the duration is less than 4 hours.

| means OR.

((df['product'] == 'oil change') & (df['duration'] < pd.Timedelta(2, unit='h'))) is True where the product is equal to oil change AND the duration is less than 2 hours.
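To see why the original `if df['product'] == ...` raised the ValueError while the mask approach works, here is a minimal sketch using the sample data from the question. The variable names `is_tire` and `under_4h` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["tire change", "oil change", "tire change", "oil change", "oil change"],
    "duration": ["01:16:51", "05:06:00", "02:03:04", "06:23:14", "03:40:27"],
})
df["duration"] = pd.to_timedelta(df["duration"])

# Comparing a column to a scalar yields one boolean per row, not a single bool.
is_tire = df["product"] == "tire change"
print(is_tire.tolist())  # [True, False, True, False, False]

# `if` needs a single True/False, so handing it a whole Series raises ValueError.
try:
    if is_tire:
        pass
except ValueError as err:
    print("ValueError:", err)

# The element-wise operators & and | combine the per-row booleans instead.
under_4h = df["duration"] < pd.Timedelta(4, unit="h")
print((is_tire & under_4h).tolist())  # [True, False, True, False, False]
```

The parentheses around each comparison matter, because & and | bind more tightly than == and <.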

CodePudding user response:

First of all, the durations in your two examples don't match, which makes it hard to compare the input with the expected output. Please check this next time. Then you can use:

df.loc[df["product"] == "tire change", "duration_bool"] = pd.to_timedelta(df["duration"]) < pd.Timedelta(4, unit="h")
df.loc[df["product"] == "oil change", "duration_bool"] = pd.to_timedelta(df["duration"]) < pd.Timedelta(2, unit="h")

For the rows matching each product, this sets the duration_bool column to the result of comparing the duration against pd.Timedelta(...); pd.to_timedelta(...) converts the duration strings to timedeltas so that the comparison is valid. This gets you:

|    | product     | duration   | duration_bool   |
|---:|:------------|:-----------|:----------------|
|  0 | tire change | 01:16:51   | True            |
|  1 | oil change  | 01:06:00   | True            |
|  2 | tire change | 04:03:04   | False           |
|  3 | oil change  | 02:23:14   | False           |
|  4 | oil change  | 03:40:27   | False           |
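If more products get added later, one possible variation (not part of the answer above, just a sketch) is to keep the limits in a dict and look them up per row with `Series.map`. The name `sla_limits` is made up for this example, and the data is the second table from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["tire change", "oil change", "tire change", "oil change", "oil change"],
    "duration": ["01:16:51", "01:06:00", "04:03:04", "02:23:14", "03:40:27"],
})

# Hypothetical lookup table: one time limit per product.
sla_limits = {
    "tire change": pd.Timedelta(4, unit="h"),
    "oil change": pd.Timedelta(2, unit="h"),
}

# map() replaces each product with its limit, giving a per-row threshold
# that can be compared against the durations element-wise.
limit_per_row = df["product"].map(sla_limits)
df["duration_bool"] = pd.to_timedelta(df["duration"]) < limit_per_row

print(df["duration_bool"].tolist())  # [True, True, False, False, False]
```

A product missing from the dict maps to a missing value, and the comparison then comes out False for that row.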

CodePudding user response:

What I found out is that I needed a return statement in my def sla_bool_checker. The return value then needed to be applied to my dataframe using apply. I still can't grasp exactly how apply works, but it worked. I wish I had a deeper explanation for anyone needing one.

I probably should have used np.where() (still hazy on how to make that work) but the answer by @it_is_chris actually worked well for me too! (thanks chris)
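For reference, a rough sketch of how the np.where() version might look, assuming the sample data from the question and that duration has already been converted to timedeltas. np.where(condition, a, b) picks from a where the condition is True and from b otherwise, row by row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "product": ["tire change", "oil change", "tire change", "oil change", "oil change"],
    "duration": ["01:16:51", "05:06:00", "02:03:04", "06:23:14", "03:40:27"],
})
df["duration"] = pd.to_timedelta(df["duration"])

is_tire = df["product"] == "tire change"
is_oil = df["product"] == "oil change"

df["duration_bool"] = np.where(
    is_tire,
    df["duration"] < pd.Timedelta(4, unit="h"),             # used for tire change rows
    is_oil & (df["duration"] < pd.Timedelta(2, unit="h")),  # used for every other row
)

print(df["duration_bool"].tolist())  # [True, False, True, False, False]
```

With more than two products, np.select() takes a list of conditions and a list of choices and may read better than nesting np.where() calls.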

From there I just kept researching, since I really wanted to figure out a way to use a function for it. Probably not ideal, but I learned a ton.

Here is the code I used.

def sla_bool_checker(my_var):
    #check if product is a tire change, if it is, check if duration is under 4 hours and return the Boolean in new column
    if my_var['product'] == 'tire change' :
        return my_var['duration'] < pd.Timedelta(4, unit='h')
    #check if product is an oil change, if it is, check if duration is under 2 hours and return the Boolean
    elif my_var['product'] == 'oil change' :
        return my_var['duration'] < pd.Timedelta(2, unit='h')

Then I used

df['duration_bool'] = df.apply(sla_bool_checker, axis=1)
df

Resulting in

product duration duration_bool
0 tire change 01:16:51 True
1 oil change 01:06:00 True
2 tire change 04:03:04 False
3 oil change 02:23:14 False
4 oil change 03:40:27 False
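As for how apply works here: with axis=1, pandas calls the function once per row and passes that row in as a Series, so my_var['product'] is a single string rather than a whole column. That is why the if no longer raises the ValueError. Below is a small sketch of the idea with a shortened copy of the data. The final `return False` is an added assumption for products that match neither branch (without it the function returns None for them).

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["tire change", "oil change", "tire change"],
    "duration": pd.to_timedelta(["01:16:51", "01:06:00", "04:03:04"]),
})

def sla_bool_checker(row):
    # `row` is one row of df as a Series, so each lookup returns a single value.
    if row["product"] == "tire change":
        return row["duration"] < pd.Timedelta(4, unit="h")
    elif row["product"] == "oil change":
        return row["duration"] < pd.Timedelta(2, unit="h")
    return False  # assumption: any other product counts as not meeting the limit

# apply with axis=1 calls the function once per row and collects the results.
via_apply = df.apply(sla_bool_checker, axis=1)

# Roughly the same thing written as an explicit loop over the rows.
via_loop = [sla_bool_checker(row) for _, row in df.iterrows()]

print(via_apply.tolist())  # [True, True, False]
print(via_loop)            # [True, True, False]
```

Since the function runs in Python once per row, this is usually slower on large dataframes than the boolean-mask answers above.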