CASE
WHEN a.sch_end_locn_id != a.ats_sta_id
AND a.PLC_ACTUAL_DEPART_TIME IS NULL
AND a.BEACON_ACTUAL_DEPART_TIME IS NULL
THEN
CASE
WHEN a.ITRAC_ACTUAL_DEPART_TIME IS NOT NULL
THEN a.ITRAC_ACTUAL_DEPART_TIME
WHEN a.PLC_ACTUAL_DEPART_TIME_CLEAR IS NOT NULL
THEN a.PLC_ACTUAL_DEPART_TIME_CLEAR - 15/(24*60*60)
WHEN a.PLC_ACTUAL_ARRIVE_TIME_DWELL IS NOT NULL
THEN a.PLC_ACTUAL_ARRIVE_TIME_DWELL a.median_dwell
WHEN a.PLC_ACTUAL_ARRIVE_TIME IS NOT NULL
THEN a.PLC_ACTUAL_ARRIVE_TIME a.median_track_occ
WHEN a.ITRAC_ACTUAL_ARRIVE_TIME IS NOT NULL
THEN a.ITRAC_ACTUAL_ARRIVE_TIME 30/(24*60*60)
ELSE NULL
END
ELSE COALESCE(a.BEACON_ACTUAL_DEPART_TIME, a.PLC_ACTUAL_DEPART_TIME, a.ITRAC_ACTUAL_DEPART_TIME)
I want to convert this multiple case statement to python syntax using np.where. Lets just assume the dataframe name is df. I'm just confused specifically on the operators to use within the second set of case statements. This is how I got started but I'm stuck on adding the other cases.
np.where((df['SCH_END_LOCN_ID'] != df['ATS_STA_ID']) & ((df['PLC_ACTUAL_DEPART_TIME'] == np.datetime64('NaT')) & (df['BEACON_ACTUAL_DEPART_TIME'] == np.datetime64('NaT')) & df['ITRAC_ACTUAL_DEPART_TIME'] != np.datetime64('NaT')),
df['ITRAC_ACTUAL_DEPART_TIME'],
df[["BEACON_ACTUAL_ARRIVE_TIME", "PLC_ACTUAL_ARRIVE_TIME", "ITRAC_ACTUAL_ARRIVE_TIME",]].bfill(axis=1).iloc[:, 0])
CodePudding user response:
Here's a way to do what your question asks:
res = pd.Series(np.where(
(df.SCH_END_LOCN_ID != df.ATS_STA_ID) &
df.PLC_ACTUAL_DEPART_TIME.isna() &
df.BEACON_ACTUAL_DEPART_TIME.isna(),
np.where(
df.ITRAC_ACTUAL_DEPART_TIME.notna(),
df.ITRAC_ACTUAL_DEPART_TIME,
np.where(
df.PLC_ACTUAL_DEPART_TIME_CLEAR.notna(),
df.PLC_ACTUAL_DEPART_TIME_CLEAR - 15/(24*60*60),
np.where(
df.PLC_ACTUAL_ARRIVE_TIME_DWELL.notna(),
df.PLC_ACTUAL_ARRIVE_TIME_DWELL df.MEDIAN_DWELL,
np.where(
df.PLC_ACTUAL_ARRIVE_TIME.notna(),
df.PLC_ACTUAL_ARRIVE_TIME df.MEDIAN_TRACK_OCC,
np.where(
df.ITRAC_ACTUAL_ARRIVE_TIME.notna(),
df.ITRAC_ACTUAL_ARRIVE_TIME 30/(24*60*60),
None
)
)
)
)
),
np.where(
df.BEACON_ACTUAL_DEPART_TIME.notna(),
df.BEACON_ACTUAL_DEPART_TIME,
np.where(
df.PLC_ACTUAL_DEPART_TIME.notna(),
df.PLC_ACTUAL_DEPART_TIME,
df.ITRAC_ACTUAL_DEPART_TIME
)
)
), index=df.index, dtype=float)
Sample input:
SCH_END_LOCN_ID ATS_STA_ID PLC_ACTUAL_DEPART_TIME BEACON_ACTUAL_DEPART_TIME ITRAC_ACTUAL_DEPART_TIME PLC_ACTUAL_DEPART_TIME_CLEAR PLC_ACTUAL_ARRIVE_TIME_DWELL PLC_ACTUAL_ARRIVE_TIME ITRAC_ACTUAL_ARRIVE_TIME MEDIAN_DWELL MEDIAN_TRACK_OCC
0 1 99 NaN NaN 1.0 NaN NaN NaN NaN 0.25 0.5
1 2 99 NaN NaN NaN 2.0 NaN NaN NaN 0.25 0.5
2 3 99 NaN NaN NaN NaN 3.0 NaN NaN 0.25 0.5
3 4 99 NaN NaN NaN NaN NaN 4.0 NaN 0.25 0.5
4 5 99 NaN NaN NaN NaN NaN 5.0 NaN 0.25 0.5
5 6 99 NaN NaN NaN NaN NaN NaN 6.0 0.25 0.5
6 7 7 NaN NaN 7.0 NaN NaN NaN 7.0 0.25 0.5
7 8 8 88.0 NaN NaN NaN NaN NaN 8.0 0.25 0.5
8 9 9 88.0 NaN NaN NaN NaN NaN NaN 0.25 0.5
9 10 10 88.0 99.0 NaN NaN NaN NaN NaN 0.25 0.5
Output:
0 1.000000
1 1.999826
2 3.250000
3 4.500000
4 5.500000
5 6.000347
6 7.000000
7 88.000000
8 88.000000
9 99.000000
dtype: float64