I am currently working with large amount of data in an csv. This data is being imported to my program using pandas. What I am currently trying to do is to check whether a change in an series is happening or not. My data frame is structured as followed:
dataframe
Time Data
209012424 123
209012424 352
209012424 1267
209012424 753
209012424 333
209012424 542
209012424 940
209012425 421
209012425 12
209012425 344
209012425 1
209012425 346
209012425 1435
209012428 555
209012428 322
As you can see a series with values are presented. What I am trying to do is to create a new column that checks whether a change is happening in the time column. The desired output will then be:
Time Data Change
209012424 123 False
209012424 352 False
209012424 1267 False
209012424 753 False
209012424 333 False
209012424 542 False
209012424 940 False
209012425 421 True
209012425 12 False
209012425 344 False
209012425 1 False
209012425 346 False
209012425 1435 False
209012428 555 True
209012428 322 False
My solution to this was by using this code:
cond1 = wholedf['TIME'].diff().shift(-1).eq(1).where(wholedf['TIME']==0)
wholedf['Change'] = wholedf['TIME'].diff().eq(1).where(wholedf['TIME'] == 1).fillna(cond1).astype('bool')
However, I get the output:
Time Data Change
209012424 123 True
209012424 352 True
209012424 1267 True
209012424 753 True
209012424 333 True
209012424 542 True
209012424 940 True
209012425 421 True
209012425 12 True
209012425 344 True
209012425 1 True
209012425 346 True
209012425 1435 True
209012428 555 True
209012428 322 True
I dont know why I get this but I hope someone else has a solution!
Hopefully this was enough information! Thank you in advance.
CodePudding user response:
You can use ne()
and shift()
to compare each row with previous row in Time column. Finally replace the first value with False
:
df = df.assign(Change=df.Time.ne(df.Time.shift()))
df.loc[0,'Change'] = False
Time Data Change
0 209012424 123 False
1 209012424 352 False
2 209012424 1267 False
3 209012424 753 False
4 209012424 333 False
5 209012424 542 False
6 209012424 940 False
7 209012425 421 True
8 209012425 12 False
9 209012425 344 False
10 209012425 1 False
11 209012425 346 False
12 209012425 1435 False
13 209012428 555 True
14 209012428 322 False
CodePudding user response:
If need True
for all changed values without first row use Series.diff
and compare for greater like 0
:
#added absolute if possible groups are not sorted
wholedf['Change'] = wholedf['TIME'].diff().abs().gt(0)
#first solution
#wholedf['Change'] = wholedf['TIME'].diff().fillna(0).ne(0)
print (wholedf)
TIME Data Change
0 209012424 123 False
1 209012424 352 False
2 209012424 1267 False
3 209012424 753 False
4 209012424 333 False
5 209012424 542 False
6 209012424 940 False
7 209012425 421 True
8 209012425 12 False
9 209012425 344 False
10 209012425 1 False
11 209012425 346 False
12 209012425 1435 False
13 209012428 555 True
14 209012428 322 False
CodePudding user response:
np.sign will work in this case, fillna for nan inputs and convert the output to boolean
df['Change'] =np.sign(df['Time'].diff().fillna(0)).astype(bool)
Time Data Change
0 209012424 123 False
1 209012424 352 False
2 209012424 1267 False
3 209012424 753 False
4 209012424 333 False
5 209012424 542 False
6 209012424 940 False
7 209012425 421 True
8 209012425 12 False
9 209012425 344 False
10 209012425 1 False
11 209012425 346 False
12 209012425 1435 False
13 209012428 555 True
14 209012428 322 False
CodePudding user response:
Use Series.shift
with df.fillna
, Series.eq
with not(~)
operator:
In [370]: wholedf['Change'] = ~(wholedf.Time - wholedf.Time.shift()).fillna(0).eq(0)
In [376]: wholedf
Out[376]:
Time Data Change
0 209012424 123 False
1 209012424 352 False
2 209012424 1267 False
3 209012424 753 False
4 209012424 333 False
5 209012424 542 False
6 209012424 940 False
7 209012425 421 True
8 209012425 12 False
9 209012425 344 False
10 209012425 1 False
11 209012425 346 False
12 209012425 1435 False
13 209012428 555 True
14 209012428 322 False