The dataframe I am dealing with looks like the table below:
COLUMN-A COLUMN-B COLUMN-C COLUMN-D
2005-12-23 2.78229429977895 2.59054751268432
2005-12-28 2.77990953370726 2.59625529291923
2005-12-29 2.77770141742004 2.60175855794512
2005-12-30 2.77565686568447 2.60706465870293
2006-01-03 2.78676377607689 2.61845788272621
2006-01-04 2.79415905904631 2.62804815466004
2006-01-05 2.79233986786484 2.63311058575101
2006-01-06 2.79065543181717 2.63799172343874
2006-01-09 2.7876513234596 2.64200075091549
2006-01-10 2.78342529650764 2.64516894228885
2006-01-11 2.77951230901599 2.64822370776439
2006-01-12 2.77877806345801 2.65256358425937
2006-01-13 2.78965376857357 2.66232574953289
2006-01-16 2.81417572440332 2.67871384606613
2006-01-17 2.83688123723998 2.69451541833616
2006-01-18 2.84923078073203 2.70556518000894
2006-01-19 2.854887762274 2.71343113557577
2006-01-20 2.86012570781281 2.72101563266667
2006-01-23 2.8620867671879 2.72693465617535
2006-01-24 2.85668033821582 2.72915676427006
2006-01-25 2.85311883059988 2.7319963852241
2006-01-27 2.84982113851717 2.73473442527192
2006-01-30 2.84098994077245 2.73458665290639
2006-01-31 2.83281290615161 2.73444416615124
2006-02-01 2.82235268854652 2.73291291585375
2006-02-02 2.79821544736977 2.72446373657389 2.31735945722146
2006-02-03 2.7903180053127 2.72328924609567 2.32165937425023
2006-02-06 2.78300555917914 2.72215675685381 2.32590335299919
2006-02-07 2.77912366526979 2.72245848891773 2.33053900014161
2006-02-08 2.77552931914827 2.72274943166327 2.33511466419111
I'm trying to write logic that returns True on the row where COLUMN-D has its first numeric entry and False in all other cases.
Here's the logic I've written, which throws this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
Code
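import pandas as pd

def has_trail_started(df, df_key):
    # True where the value is present and the previous row's value is missing
    return (~pd.isnull(df[df_key])) & (pd.isnull(df[df_key].shift()))

# This is the line that raises the ValueError
if (has_trail_started(data, 'COLUMN-D') and data['has_changed_status']):
    # Logic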
Please could I get some help to rectify the problem?
CodePudding user response:
Your function returns a Series, which can't be interpreted as a single bool in an if statement. Instead, you can add the "trail start" information to the dataframe as a new column, as follows:
def has_trail_started(df, df_key):
    df["has_trail_started"] = (~pd.isnull(df[df_key])) & (pd.isnull(df[df_key].shift()))

has_trail_started(data, 'COLUMN-D')
Then data looks like this:
COLUMN-A COLUMN-B COLUMN-C COLUMN-D has_trail_started
0 2005-12-23 2.782294 2.590548 NaN False
1 2005-12-28 2.779910 2.596255 NaN False
2 2005-12-29 2.777701 2.601759 NaN False
3 2005-12-30 2.775657 2.607065 NaN False
4 2006-01-03 2.786764 2.618458 NaN False
5 2006-01-04 2.794159 2.628048 NaN False
6 2006-01-05 2.792340 2.633111 NaN False
7 2006-01-06 2.790655 2.637992 NaN False
8 2006-01-09 2.787651 2.642001 NaN False
9 2006-01-10 2.783425 2.645169 NaN False
10 2006-01-11 2.779512 2.648224 NaN False
11 2006-01-12 2.778778 2.652564 NaN False
12 2006-01-13 2.789654 2.662326 NaN False
13 2006-01-16 2.814176 2.678714 NaN False
14 2006-01-17 2.836881 2.694515 NaN False
15 2006-01-18 2.849231 2.705565 NaN False
16 2006-01-19 2.854888 2.713431 NaN False
17 2006-01-20 2.860126 2.721016 NaN False
18 2006-01-23 2.862087 2.726935 NaN False
19 2006-01-24 2.856680 2.729157 NaN False
20 2006-01-25 2.853119 2.731996 NaN False
21 2006-01-27 2.849821 2.734734 NaN False
22 2006-01-30 2.840990 2.734587 NaN False
23 2006-01-31 2.832813 2.734444 NaN False
24 2006-02-01 2.822353 2.732913 NaN False
25 2006-02-02 2.798215 2.724464 2.317359 True
26 2006-02-03 2.790318 2.723289 2.321659 False
27 2006-02-06 2.783006 2.722157 2.325903 False
28 2006-02-07 2.779124 2.722458 2.330539 False
29 2006-02-08 2.775529 2.722749 2.335115 False
Now you can apply logic based on this new boolean column, for example:
data["extra_logic"] = data["has_trail_started"].apply(lambda x: "yay" if x else "boo")
This adds a new column whose values are a function of the has_trail_started flag.
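If the goal is to combine the flag with a second condition, as in the original if statement, one option is to stay element-wise and build a combined mask. The following is only a minimal sketch: the four-row frame is a made-up stand-in for the real data, and has_changed_status is assumed to be a boolean column (the question mentions it but does not show it).

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real frame; has_changed_status is an assumed boolean column.
data = pd.DataFrame({
    "COLUMN-D": [np.nan, np.nan, 2.317359, 2.321659],
    "has_changed_status": [True, False, True, True],
})

col = data["COLUMN-D"]
# True only on the row where COLUMN-D goes from missing to present.
started = col.notna() & col.shift().isna()

# Element-wise AND of two boolean Series; Python's `and` would raise the ValueError.
mask = started & data["has_changed_status"]

# Act on the matching rows instead of branching on the whole Series.
data["extra_logic"] = np.where(mask, "yay", "boo")

# If a single True/False is really needed for an `if`, reduce the mask explicitly.
if mask.any():
    first_row = col.first_valid_index()  # index label of the first non-missing value
```

Here & replaces and because and asks each Series for one truth value, which is exactly what triggers the error. Reducing with .any() (or .all()) is the explicit way to get a single bool when one is actually needed.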