I have two different dataframes shown below.
This is the tel_times dataframe:
And this is the maint_comp1 dataframe:
Now, I have joined these two dataframes using merge:
maint_tel_comp1 = pd.merge(tel_times, maint_comp1, how='inner', left_on=['machineID','datetime_tel'], right_on = ['machineID','datetime_maint'])
The result is
I want to apply conditions on the two datetime columns I have.
Something like this,
maint_tel_comp1 = (telemetry_times.join(maint_comp1,
    ((telemetry_times['machineID'] == maint_comp1['machineID'])
     & (telemetry_times['datetime_tel'] > maint_comp1['datetime_maint'])
     & (maint_comp1['comp1sum'] == '1'))))
This is in PySpark, but I want to do it in Pandas.
To apply the same condition to the merged dataframe, I tried this:
maint_tel_comp1[maint_tel_comp1['datetime_tel'] > maint_tel_comp1['datetime_maint']]
But it returns an empty dataframe.
CodePudding user response:
I believe what you want is:
maint_tel_comp1 = tel_times.merge(maint_comp1, on='machineID', how='inner')
maint_tel_comp1[maint_tel_comp1['datetime_tel'].gt(maint_tel_comp1['datetime_maint'])]
The problem is that your merge also required datetime_tel == datetime_maint, so no merged row could satisfy datetime_tel > datetime_maint, and your condition returned an empty DataFrame.
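To make the difference concrete, here is a minimal sketch on made-up data. The sample rows, the dates, and the string-typed comp1sum column are assumptions for illustration only, not your real data. It also adds the comp1sum == '1' condition from your PySpark join, which the two lines above leave out.

```python
import pandas as pd

# Hypothetical sample data; the real tel_times / maint_comp1 will differ.
tel_times = pd.DataFrame({
    "machineID": [1, 1, 2, 2],
    "datetime_tel": pd.to_datetime(
        ["2015-01-02", "2015-01-05", "2015-01-03", "2015-01-04"]
    ),
})
maint_comp1 = pd.DataFrame({
    "machineID": [1, 2],
    "datetime_maint": pd.to_datetime(["2015-01-03", "2015-01-04"]),
    "comp1sum": ["1", "1"],  # assumed to be strings, as in the PySpark condition
})

# Original approach: the merge keys force datetime_tel == datetime_maint,
# so filtering for datetime_tel > datetime_maint can never match.
strict = pd.merge(
    tel_times, maint_comp1, how="inner",
    left_on=["machineID", "datetime_tel"],
    right_on=["machineID", "datetime_maint"],
)
strict_filtered = strict[strict["datetime_tel"] > strict["datetime_maint"]]

# Suggested approach: merge on machineID only, then apply the conditions.
merged = tel_times.merge(maint_comp1, on="machineID", how="inner")
mask = (
    merged["datetime_tel"].gt(merged["datetime_maint"])
    & merged["comp1sum"].eq("1")
)
maint_tel_comp1 = merged[mask].reset_index(drop=True)

print(len(strict_filtered))  # no rows survive the strict merge plus filter
print(maint_tel_comp1)
```

Note that merging on machineID alone pairs every telemetry row with every maintenance row for the same machine before filtering, so the intermediate frame can get large on big inputs.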