I want to get the Last value from the "15:30:00.0" row for every day in the DataFrame, but as you can see, on the 16th there are no rows for anything from 13:00:00.0 to 15:30:00.0.
My DataFrame:
Date Time Open High Low Last
0 2023-01-13 09:30:00.0 3968.25 3985 3965.75 3980.25
1 2023-01-13 10:00:00.0 3980 3998.5 3974 3998
2 2023-01-13 10:30:00.0 3998 4000.75 3991.25 3996.75
3 2023-01-13 11:00:00.0 3996.5 3999 3986.25 3992.75
4 2023-01-13 11:30:00.0 3993 3993.5 3985 3990.75
5 2023-01-13 12:00:00.0 3990.75 3998.75 3989.75 3997.5
6 2023-01-13 12:30:00.0 3997.5 4002 3993 3999.75
7 2023-01-13 13:00:00.0 4000 4002.25 3993.75 3997.5
8 2023-01-13 13:30:00.0 3997.25 4010 3996.25 4008.25
9 2023-01-13 14:00:00.0 4008 4010.75 4004.25 4008.75
10 2023-01-13 14:30:00.0 4009 4011.75 4006.25 4009.5
11 2023-01-13 15:00:00.0 4009.75 4016 4009 4016
12 2023-01-13 15:30:00.0 4016 4024.25 4014.75 4017.75
13 2023-01-16 09:30:00.0 4014.75 4019.25 4014.75 4017.5
14 2023-01-16 10:00:00.0 4017.75 4020 4015.5 4017.25
15 2023-01-16 10:30:00.0 4017 4020.5 4017 4018.25
16 2023-01-16 11:00:00.0 4018 4019.5 4015.75 4016.75
17 2023-01-16 11:30:00.0 4016.75 4017 4010.5 4012
18 2023-01-16 12:00:00.0 4012.25 4013 4010.75 4010.75
19 2023-01-16 12:30:00.0 4010.75 4015 4008 4010
20 2023-01-17 09:30:00.0 4018 4024.25 4008.75 4018.25
21 2023-01-17 10:00:00.0 4018.5 4035.25 4018.5 4030.25
22 2023-01-17 10:30:00.0 4030.25 4031.25 4010.5 4014.75
23 2023-01-17 11:00:00.0 4014.75 4017.25 4002.75 4009.5
24 2023-01-17 11:30:00.0 4009.25 4016.25 4008.25 4014.5
25 2023-01-17 12:00:00.0 4014.75 4019 4007.25 4008.25
26 2023-01-17 12:30:00.0 4008.5 4016 4007.75 4013.5
27 2023-01-17 13:00:00.0 4013.75 4016.5 4011.5 4014
28 2023-01-17 13:30:00.0 4014.25 4020.5 4012.75 4019
29 2023-01-17 14:00:00.0 4019.25 4021 4008.25 4010.75
30 2023-01-17 14:30:00.0 4011 4019.5 4010.75 4013.75
31 2023-01-17 15:00:00.0 4013.75 4018.25 4010.25 4012
32 2023-01-17 15:30:00.0 4011.75 4014.25 4003.75 4010
If I use the code below to pull that interval:
m = df["Time"].eq("15:30:00.0")
out = df[m].groupby(["Date", "Time"], as_index=False)["Last"].max()
Output:
Date Time Last
0 2023-01-13 15:30:00.0 4017.75
1 2023-01-17 15:30:00.0 4010
Is it possible to keep a row for every day and put NaN in Last when that day has no 15:30:00.0 value?
My desired output:
Date Time Last
0 2023-01-13 15:30:00.0 4017.75
1 2023-01-16 15:30:00.0 NaN
2 2023-01-17 15:30:00.0 4010
CodePudding user response:
You can use .combine_first() with a base DataFrame that holds every date in the original DataFrame, extracted with .unique():
base = pd.DataFrame({"Date": df["Date"].unique(), "Time": "15:30:00.0"})
base.set_index("Date").combine_first(out.set_index("Date")).reset_index()
This outputs:
Date Last Time
0 2023-01-13 4017.75 15:30:00.0
1 2023-01-16 NaN 15:30:00.0
2 2023-01-17 4010.00 15:30:00.0
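For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same combine_first approach. The DataFrame is a cut-down stand-in for the one in the question (only a few rows per day, Open/High/Low omitted), and the final column selection is an addition of mine to restore the Date, Time, Last order, since combine_first returns the columns sorted alphabetically.

```python
import pandas as pd

# Reduced sample (assumption: a few rows per day are enough to reproduce
# the missing 15:30 bar on 2023-01-16).
df = pd.DataFrame({
    "Date": ["2023-01-13", "2023-01-13", "2023-01-16", "2023-01-17", "2023-01-17"],
    "Time": ["15:00:00.0", "15:30:00.0", "12:30:00.0", "15:00:00.0", "15:30:00.0"],
    "Last": [4016.0, 4017.75, 4010.0, 4012.0, 4010.0],
})

# Same filter and aggregation as in the question.
m = df["Time"].eq("15:30:00.0")
out = df[m].groupby(["Date", "Time"], as_index=False)["Last"].max()

# One row per date that appears anywhere in the data, all at the target time.
base = pd.DataFrame({"Date": df["Date"].unique(), "Time": "15:30:00.0"})

# combine_first aligns on the Date index and fills in columns missing from base;
# dates absent from `out` end up with NaN in Last.
result = (
    base.set_index("Date")
        .combine_first(out.set_index("Date"))
        .reset_index()[["Date", "Time", "Last"]]  # restore the original column order
)
print(result)
```

Note that this only covers dates that occur at least once in df; days with no rows at all would need a separately built date list.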
CodePudding user response:
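You can pass dropna=False in your groupby call:
df[m].groupby(["Date", "Time"], as_index=False, dropna=False)["Last"].max()
Note, though, that dropna=False only keeps groups whose key is itself NaN. It does not create rows for dates that are missing from df[m], so the 16th still will not appear (and neither will the 14th or 15th, which are not in the data at all). The other answer provides a mechanism for that.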
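A quick way to check what dropna=False does in this situation, followed by a reindex-based alternative. This is only a sketch on a cut-down sample (not the full DataFrame from the question), and the reindex step assumes there is at most one 15:30:00.0 row per date, since reindexing a Series with duplicate index labels raises an error.

```python
import pandas as pd

# Reduced sample (assumption: stands in for the DataFrame in the question).
df = pd.DataFrame({
    "Date": ["2023-01-13", "2023-01-13", "2023-01-16", "2023-01-17", "2023-01-17"],
    "Time": ["15:00:00.0", "15:30:00.0", "12:30:00.0", "15:00:00.0", "15:30:00.0"],
    "Last": [4016.0, 4017.75, 4010.0, 4012.0, 4010.0],
})
m = df["Time"].eq("15:30:00.0")

# dropna=False keeps groups whose key is NaN; it does not add keys that
# are simply absent from the filtered frame.
grouped = df[m].groupby(["Date", "Time"], as_index=False, dropna=False)["Last"].max()
print(grouped["Date"].tolist())  # check whether 2023-01-16 shows up here

# Alternative: index the 15:30 rows by date, then reindex over every date
# that occurs in the data so missing days become NaN.
all_dates = pd.Index(df["Date"].unique(), name="Date")
filled = (
    df.loc[m]
      .set_index("Date")["Last"]
      .reindex(all_dates)
      .reset_index()
)
filled.insert(1, "Time", "15:30:00.0")  # put the Time column back
print(filled)
```

The reindex route skips building a second DataFrame, at the cost of the one-row-per-date assumption; if a date could have several 15:30 rows, aggregate first (as in the question) and reindex the aggregated result instead.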