Return True only when first numeric entry is found & False in all other cases

Time:09-25

The dataframe I am dealing with looks like the table below:

 COLUMN-A      COLUMN-B           COLUMN-C            COLUMN-D
2005-12-23  2.78229429977895    2.59054751268432    
2005-12-28  2.77990953370726    2.59625529291923    
2005-12-29  2.77770141742004    2.60175855794512    
2005-12-30  2.77565686568447    2.60706465870293    
2006-01-03  2.78676377607689    2.61845788272621    
2006-01-04  2.79415905904631    2.62804815466004    
2006-01-05  2.79233986786484    2.63311058575101    
2006-01-06  2.79065543181717    2.63799172343874    
2006-01-09  2.7876513234596     2.64200075091549
2006-01-10  2.78342529650764    2.64516894228885    
2006-01-11  2.77951230901599    2.64822370776439    
2006-01-12  2.77877806345801    2.65256358425937    
2006-01-13  2.78965376857357    2.66232574953289    
2006-01-16  2.81417572440332    2.67871384606613    
2006-01-17  2.83688123723998    2.69451541833616    
2006-01-18  2.84923078073203    2.70556518000894    
2006-01-19  2.854887762274      2.71343113557577
2006-01-20  2.86012570781281    2.72101563266667    
2006-01-23  2.8620867671879     2.72693465617535
2006-01-24  2.85668033821582    2.72915676427006    
2006-01-25  2.85311883059988    2.7319963852241 
2006-01-27  2.84982113851717    2.73473442527192    
2006-01-30  2.84098994077245    2.73458665290639    
2006-01-31  2.83281290615161    2.73444416615124    
2006-02-01  2.82235268854652    2.73291291585375    
2006-02-02  2.79821544736977    2.72446373657389    2.31735945722146
2006-02-03  2.7903180053127     2.72328924609567    2.32165937425023
2006-02-06  2.78300555917914    2.72215675685381    2.32590335299919
2006-02-07  2.77912366526979    2.72245848891773    2.33053900014161
2006-02-08  2.77552931914827    2.72274943166327    2.33511466419111

I'm trying to write logic that returns True where COLUMN-D has its first numeric entry and False in all other cases.

Here's the logic I've written, which throws this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Code

import pandas as pd

def has_trail_started(df, df_key):
    return (~pd.isnull(df[df_key])) & (pd.isnull(df[df_key].shift()))

if (has_trail_started(data, 'COLUMN-D') and data['has_changed_status']):
    ...  # logic goes here

Please could I get some help to rectify the problem?

CodePudding user response:

Your function returns a Series, which can't be interpreted as a single bool in an if statement. Instead, you can add the "trail start" flag to the DataFrame as a new column:

def has_trail_started(df, df_key):
    # True where the value is present and the previous row's value is missing
    df["has_trail_started"] = (~pd.isnull(df[df_key])) & (pd.isnull(df[df_key].shift()))

has_trail_started(data, 'COLUMN-D')

Then data looks like this:

      COLUMN-A  COLUMN-B  COLUMN-C  COLUMN-D  has_trail_started
0   2005-12-23  2.782294  2.590548       NaN              False
1   2005-12-28  2.779910  2.596255       NaN              False
2   2005-12-29  2.777701  2.601759       NaN              False
3   2005-12-30  2.775657  2.607065       NaN              False
4   2006-01-03  2.786764  2.618458       NaN              False
5   2006-01-04  2.794159  2.628048       NaN              False
6   2006-01-05  2.792340  2.633111       NaN              False
7   2006-01-06  2.790655  2.637992       NaN              False
8   2006-01-09  2.787651  2.642001       NaN              False
9   2006-01-10  2.783425  2.645169       NaN              False
10  2006-01-11  2.779512  2.648224       NaN              False
11  2006-01-12  2.778778  2.652564       NaN              False
12  2006-01-13  2.789654  2.662326       NaN              False
13  2006-01-16  2.814176  2.678714       NaN              False
14  2006-01-17  2.836881  2.694515       NaN              False
15  2006-01-18  2.849231  2.705565       NaN              False
16  2006-01-19  2.854888  2.713431       NaN              False
17  2006-01-20  2.860126  2.721016       NaN              False
18  2006-01-23  2.862087  2.726935       NaN              False
19  2006-01-24  2.856680  2.729157       NaN              False
20  2006-01-25  2.853119  2.731996       NaN              False
21  2006-01-27  2.849821  2.734734       NaN              False
22  2006-01-30  2.840990  2.734587       NaN              False
23  2006-01-31  2.832813  2.734444       NaN              False
24  2006-02-01  2.822353  2.732913       NaN              False
25  2006-02-02  2.798215  2.724464  2.317359               True
26  2006-02-03  2.790318  2.723289  2.321659              False
27  2006-02-06  2.783006  2.722157  2.325903              False
28  2006-02-07  2.779124  2.722458  2.330539              False
29  2006-02-08  2.775529  2.722749  2.335115              False
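
One caveat worth checking: the shift-based test flags every row where a number follows a missing value, so a column with gaps would get more than one True. If only the very first numeric entry should be flagged, something along these lines might work. This is a minimal sketch on a made-up Series (the values and the gap are assumptions, not the asker's data), and it assumes the index labels are unique:

```python
import numpy as np
import pandas as pd

# Hypothetical column with a gap after the first run of numbers.
s = pd.Series([np.nan, np.nan, 1.0, 2.0, np.nan, 3.0], name="COLUMN-D")

# Transition-based flag: True wherever a number follows a missing value.
transition = s.notna() & s.shift().isna()

# First-entry-only flag: True only at the first non-missing position.
first_only = pd.Series(False, index=s.index)
first_idx = s.first_valid_index()  # None if the column is entirely NaN
if first_idx is not None:
    first_only.loc[first_idx] = True

print(transition.tolist())
print(first_only.tolist())
```

For the table in the question, where COLUMN-D has no gaps once it starts, both flags should agree.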

You can now apply logic based on this new boolean column, for example:

data["extra_logic"] = data["has_trail_started"].apply(lambda x: "yay" if x else "boo")

This adds a new column whose values are a function of the has_trail_started flag.
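
To tie this back to the original `if` condition, here is a rough sketch of combining the flag with `has_changed_status` row by row. The sample values are invented, and `has_changed_status` is assumed to be a boolean column, since the question does not show it:

```python
import numpy as np
import pandas as pd

# Made-up data; has_changed_status is assumed to be a boolean column.
data = pd.DataFrame({
    "COLUMN-D": [np.nan, np.nan, 2.317, 2.322],
    "has_changed_status": [True, False, True, True],
})

started = data["COLUMN-D"].notna() & data["COLUMN-D"].shift().isna()

# `and` tries to collapse a whole Series to one bool and raises ValueError;
# `&` combines the two columns row by row instead.
both = started & data["has_changed_status"]

# Vectorised alternative to apply(lambda x: "yay" if x else "boo").
data["extra_logic"] = np.where(both, "yay", "boo")

# For a plain `if`, reduce the Series to a single bool explicitly.
if both.any():
    print(data.loc[both])
```

Whether `.any()` or `.all()` is the right reduction depends on what the `if` branch is meant to do. For per-row logic, the boolean mask itself is usually enough.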
