Putting NaN when a day in a DataFrame doesn't return a value

Time:02-02

I want to get the Last value in the "15:30:00.0" row for every day in the DataFrame, but as you can see, on the 16th there are no rows at all from 13:00:00.0 to 15:30:00.0.

My DataFrame:

             Date         Time     Open     High      Low     Last
0      2023-01-13   09:30:00.0  3968.25     3985  3965.75  3980.25
1      2023-01-13   10:00:00.0     3980   3998.5     3974     3998
2      2023-01-13   10:30:00.0     3998  4000.75  3991.25  3996.75
3      2023-01-13   11:00:00.0   3996.5     3999  3986.25  3992.75
4      2023-01-13   11:30:00.0     3993   3993.5     3985  3990.75
5      2023-01-13   12:00:00.0  3990.75  3998.75  3989.75   3997.5
6      2023-01-13   12:30:00.0   3997.5     4002     3993  3999.75
7      2023-01-13   13:00:00.0     4000  4002.25  3993.75   3997.5
8      2023-01-13   13:30:00.0  3997.25     4010  3996.25  4008.25
9      2023-01-13   14:00:00.0     4008  4010.75  4004.25  4008.75
10     2023-01-13   14:30:00.0     4009  4011.75  4006.25   4009.5
11     2023-01-13   15:00:00.0  4009.75     4016     4009     4016
12     2023-01-13   15:30:00.0     4016  4024.25  4014.75  4017.75
13     2023-01-16   09:30:00.0  4014.75  4019.25  4014.75   4017.5
14     2023-01-16   10:00:00.0  4017.75     4020   4015.5  4017.25
15     2023-01-16   10:30:00.0     4017   4020.5     4017  4018.25
16     2023-01-16   11:00:00.0     4018   4019.5  4015.75  4016.75
17     2023-01-16   11:30:00.0  4016.75     4017   4010.5     4012
18     2023-01-16   12:00:00.0  4012.25     4013  4010.75  4010.75
19     2023-01-16   12:30:00.0  4010.75     4015     4008     4010
20     2023-01-17   09:30:00.0     4018  4024.25  4008.75  4018.25
21     2023-01-17   10:00:00.0   4018.5  4035.25   4018.5  4030.25
22     2023-01-17   10:30:00.0  4030.25  4031.25   4010.5  4014.75
23     2023-01-17   11:00:00.0  4014.75  4017.25  4002.75   4009.5
24     2023-01-17   11:30:00.0  4009.25  4016.25  4008.25   4014.5
25     2023-01-17   12:00:00.0  4014.75     4019  4007.25  4008.25
26     2023-01-17   12:30:00.0   4008.5     4016  4007.75   4013.5
27     2023-01-17   13:00:00.0  4013.75   4016.5   4011.5     4014
28     2023-01-17   13:30:00.0  4014.25   4020.5  4012.75     4019
29     2023-01-17   14:00:00.0  4019.25     4021  4008.25  4010.75
30     2023-01-17   14:30:00.0     4011   4019.5  4010.75  4013.75
31     2023-01-17   15:00:00.0  4013.75  4018.25  4010.25     4012
32     2023-01-17   15:30:00.0  4011.75  4014.25  4003.75     4010

If I use the code below to try to pull that interval:

m = df["Time"].eq("15:30:00.0")

out = df[m].groupby(["Date", "Time"], as_index=False)["Last"].max()

Output:

         Date         Time     Last
0  2023-01-13   15:30:00.0  4017.75
1  2023-01-17   15:30:00.0     4010

Is it possible to put NaN there, so that the day is still recognized but, since it has no value, Last is just NaN?

My desired output:

         Date         Time     Last
0  2023-01-13   15:30:00.0  4017.75
1  2023-01-16   15:30:00.0     NaN
2  2023-01-17   15:30:00.0     4010

CodePudding user response:

You can use .combine_first() with a base DF that extracts all the dates in the original DF using .unique():

base = pd.DataFrame({"Date": df["Date"].unique(), "Time": "15:30:00.0"})
base.set_index("Date").combine_first(out.set_index("Date")).reset_index()

This outputs:

         Date     Last        Time
0  2023-01-13  4017.75  15:30:00.0
1  2023-01-16      NaN  15:30:00.0
2  2023-01-17  4010.00  15:30:00.0
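
For reference, here is a self-contained sketch of the same idea that can be run as-is. The small df below is a cut-down stand-in for the question's data (an assumption for illustration, not the full table), and the final column reordering is an optional extra step, since combine_first returns the columns in sorted order:

```python
import pandas as pd

# Cut-down sample: 2023-01-16 has no "15:30:00.0" row.
df = pd.DataFrame({
    "Date": ["2023-01-13", "2023-01-13", "2023-01-16", "2023-01-17", "2023-01-17"],
    "Time": ["15:00:00.0", "15:30:00.0", "12:30:00.0", "15:00:00.0", "15:30:00.0"],
    "Last": [4016.0, 4017.75, 4010.0, 4012.0, 4010.0],
})

# Same filter and groupby as in the question.
m = df["Time"].eq("15:30:00.0")
out = df[m].groupby(["Date", "Time"], as_index=False)["Last"].max()

# One row per date present in the original data, all at the target time.
base = pd.DataFrame({"Date": df["Date"].unique(), "Time": "15:30:00.0"})

# combine_first takes the union of both indexes; dates missing from `out`
# end up with NaN in "Last".
result = base.set_index("Date").combine_first(out.set_index("Date")).reset_index()

# Optional: restore the column order used in the question.
result = result[["Date", "Time", "Last"]]
print(result)
```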

CodePudding user response:

You can pass dropna=False to groupby:

df[m].groupby(["Date", "Time"], as_index=False, dropna=False)

However, dropna only controls whether rows whose group keys are NaN are kept. It does not create groups for dates that have no 15:30:00.0 row after filtering, so on its own it will not add 2023-01-16. The other answer provides a mechanism for that.
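
A quick check on a small sample suggests that dropna=False by itself does not add a row for a date that has no 15:30:00.0 entry, because that date is already gone once df[m] is applied. The sketch below shows this, followed by one possible alternative: a left merge from the list of unique dates. The sample df and the names checked, dates and result are made up for illustration, and the merge assumes there is at most one 15:30:00.0 row per date:

```python
import pandas as pd

# Cut-down sample: 2023-01-16 has no "15:30:00.0" row.
df = pd.DataFrame({
    "Date": ["2023-01-13", "2023-01-13", "2023-01-16", "2023-01-17", "2023-01-17"],
    "Time": ["15:00:00.0", "15:30:00.0", "12:30:00.0", "15:00:00.0", "15:30:00.0"],
    "Last": [4016.0, 4017.75, 4010.0, 4012.0, 4010.0],
})

m = df["Time"].eq("15:30:00.0")

# dropna=False keeps groups whose keys are NaN; it does not bring back
# dates that were removed by the boolean filter.
checked = df[m].groupby(["Date", "Time"], as_index=False, dropna=False)["Last"].max()
print(len(checked))  # two rows: 2023-01-16 is still missing

# Alternative: start from every date, attach the target time,
# then left-merge so unmatched dates get NaN in "Last".
dates = df[["Date"]].drop_duplicates().assign(Time="15:30:00.0")
result = dates.merge(df[["Date", "Time", "Last"]], on=["Date", "Time"], how="left")
print(result)
```

The left merge keeps the dates in their original order and leaves Last as NaN wherever no matching row exists.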
