Home > Software engineering >  Boolean values always return as true? (Pandas)
Boolean values always return as true? (Pandas)

Time:11-12

I am currently working with large amount of data in an csv. This data is being imported to my program using pandas. What I am currently trying to do is to check whether a change in an series is happening or not. My data frame is structured as followed:

dataframe

Time      Data
209012424 123
209012424 352
209012424 1267
209012424 753
209012424 333
209012424 542
209012424 940
209012425 421
209012425 12
209012425 344
209012425 1
209012425 346
209012425 1435
209012428 555
209012428 322

As you can see a series with values are presented. What I am trying to do is to create a new column that checks whether a change is happening in the time column. The desired output will then be:

Time      Data      Change
209012424 123       False
209012424 352       False
209012424 1267      False
209012424 753       False
209012424 333       False
209012424 542       False
209012424 940       False
209012425 421       True
209012425 12        False
209012425 344       False
209012425 1         False
209012425 346       False
209012425 1435      False
209012428 555       True
209012428 322       False

My solution to this was by using this code:

cond1 = wholedf['TIME'].diff().shift(-1).eq(1).where(wholedf['TIME']==0)
    
wholedf['Change'] = wholedf['TIME'].diff().eq(1).where(wholedf['TIME'] == 1).fillna(cond1).astype('bool')

However, I get the output:

Time      Data      Change
209012424 123       True
209012424 352       True
209012424 1267      True
209012424 753       True
209012424 333       True
209012424 542       True
209012424 940       True
209012425 421       True
209012425 12        True 
209012425 344       True
209012425 1         True
209012425 346       True
209012425 1435      True
209012428 555       True
209012428 322       True

I dont know why I get this but I hope someone else has a solution!

Hopefully this was enough information! Thank you in advance.

CodePudding user response:

You can use ne() and shift() to compare each row with previous row in Time column. Finally replace the first value with False:

df = df.assign(Change=df.Time.ne(df.Time.shift()))
df.loc[0,'Change'] = False

         Time  Data  Change
0   209012424   123   False
1   209012424   352   False
2   209012424  1267   False
3   209012424   753   False
4   209012424   333   False
5   209012424   542   False
6   209012424   940   False
7   209012425   421    True
8   209012425    12   False
9   209012425   344   False
10  209012425     1   False
11  209012425   346   False
12  209012425  1435   False
13  209012428   555    True
14  209012428   322   False

CodePudding user response:

If need True for all changed values without first row use Series.diff and compare for greater like 0:

#added absolute if possible groups are not sorted
wholedf['Change'] = wholedf['TIME'].diff().abs().gt(0)
#first solution
#wholedf['Change'] = wholedf['TIME'].diff().fillna(0).ne(0)

print (wholedf)
         TIME  Data  Change
0   209012424   123   False
1   209012424   352   False
2   209012424  1267   False
3   209012424   753   False
4   209012424   333   False
5   209012424   542   False
6   209012424   940   False
7   209012425   421    True
8   209012425    12   False
9   209012425   344   False
10  209012425     1   False
11  209012425   346   False
12  209012425  1435   False
13  209012428   555    True
14  209012428   322   False

CodePudding user response:

np.sign will work in this case, fillna for nan inputs and convert the output to boolean

df['Change'] =np.sign(df['Time'].diff().fillna(0)).astype(bool)



       Time  Data  Change
0   209012424   123   False
1   209012424   352   False
2   209012424  1267   False
3   209012424   753   False
4   209012424   333   False
5   209012424   542   False
6   209012424   940   False
7   209012425   421    True
8   209012425    12   False
9   209012425   344   False
10  209012425     1   False
11  209012425   346   False
12  209012425  1435   False
13  209012428   555    True
14  209012428   322   False

CodePudding user response:

Use Series.shift with df.fillna, Series.eq with not(~) operator:

In [370]: wholedf['Change'] = ~(wholedf.Time - wholedf.Time.shift()).fillna(0).eq(0)

In [376]: wholedf
Out[376]: 
         Time  Data  Change
0   209012424   123   False
1   209012424   352   False
2   209012424  1267   False
3   209012424   753   False
4   209012424   333   False
5   209012424   542   False
6   209012424   940   False
7   209012425   421    True
8   209012425    12   False
9   209012425   344   False
10  209012425     1   False
11  209012425   346   False
12  209012425  1435   False
13  209012428   555    True
14  209012428   322   False
  • Related