I am new to stackoverflow. I hope I can formulate my question clearly.
I am using reindex to fill in missing dates in a pandas DataFrame:
import pandas as pd
df = pd.read_csv('myfile.dat', skiprows=1)
print(df)
output:
TIME A B C D
0 2022-04-28 00:02:00 0 2 1 5
1 2022-04-28 00:03:00 0 2 2 5
2 2022-04-28 00:05:00 0 2 3 5
3 2022-04-28 00:06:00 0 2 4 5
4 2022-04-28 00:09:00 0 2 5 5
5 2022-04-28 00:10:00 0 2 6 5
6 2022-04-28 00:12:00 0 2 8 5
7 2022-04-28 00:15:00 0 2 10 5
Then I do:
#Change data type to datetime
date_format = '%Y-%m-%d %H:%M:%S'
df['TIME'] = pd.to_datetime(df['TIME'], format=date_format)
#Define the target index; floor('60S') rounds each timestamp DOWN to the full minute (a no-op here, since date_range already yields whole minutes)
idx = pd.date_range(start='2022-04-28 00:00:00', end='2022-04-28 00:15:00', freq='60S').floor('60S')
#Set index on 'TIME' from 'df'
df = df.set_index('TIME')
#Use 'resample()' as a convenience method for frequency conversion and resampling of time series
df = df.resample('60S').sum()
#Reindex and set new values to 1000
df = df.reindex(idx, fill_value=1000)
print(df)
The output is:
A B C D
2022-04-28 00:00:00 1000 1000 1000 1000
2022-04-28 00:01:00 1000 1000 1000 1000
2022-04-28 00:02:00 0 2 1 5
2022-04-28 00:03:00 0 2 2 5
2022-04-28 00:04:00 0 0 0 0
2022-04-28 00:05:00 0 2 3 5
2022-04-28 00:06:00 0 2 4 5
2022-04-28 00:07:00 0 0 0 0
2022-04-28 00:08:00 0 0 0 0
2022-04-28 00:09:00 0 2 5 5
2022-04-28 00:10:00 0 2 6 5
2022-04-28 00:11:00 0 0 0 0
2022-04-28 00:12:00 0 2 8 5
2022-04-28 00:13:00 0 0 0 0
2022-04-28 00:14:00 0 0 0 0
2022-04-28 00:15:00 0 2 10 5
My question is: Why does reindex create the new dates (as it should) but only set the values of the first two rows to 1000 instead of all new rows?
Thanks for any help!
CodePudding user response:
Why does reindex create the new dates (as it should) but only set the values of the first two rows to 1000 instead of all new rows?
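Because the fill_value parameter of reindex is only used for labels that are in the new index but missing from the old one. It defaults to NaN, but can be any "compatible" value. The rows at 00:04, 00:07 and so on already exist after resample('60S').sum() (with a sum of 0), so reindex leaves them untouched.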
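I suggest that you remove fill_value=1000, make sure the gaps stay NaN through the resample step (for example with sum(min_count=1)), and replace the NaN values with 1000 after reindexing. Assigning 1000 to whole columns would overwrite your real data as well.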
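A minimal sketch of that idea, assuming the data from the question (rebuilt inline here instead of being read from 'myfile.dat', and reduced to column C for brevity). It uses the '1min' alias rather than '60S', because newer pandas releases deprecate the uppercase 'S' alias. The variable name fixed is only illustrative.

```python
import pandas as pd

# Timestamps and column C copied from the question's printed DataFrame.
times = pd.to_datetime([
    "2022-04-28 00:02:00", "2022-04-28 00:03:00", "2022-04-28 00:05:00",
    "2022-04-28 00:06:00", "2022-04-28 00:09:00", "2022-04-28 00:10:00",
    "2022-04-28 00:12:00", "2022-04-28 00:15:00",
])
df = pd.DataFrame({"C": [1, 2, 3, 4, 5, 6, 8, 10]}, index=times)

idx = pd.date_range("2022-04-28 00:00:00", "2022-04-28 00:15:00", freq="1min")

# min_count=1 makes empty one-minute bins NaN instead of 0,
# so gaps stay distinguishable from real zeros.
resampled = df.resample("1min").sum(min_count=1)

# reindex adds 00:00 and 00:01 as NaN; fillna then fills every gap at once.
fixed = resampled.reindex(idx).fillna(1000)
print(fixed)
```

Note that the columns become float here, because NaN forces a float dtype; add .astype(int) after fillna if you need integers again.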
CodePudding user response:
If you have a closer look, you will see that after resampling, the index of your df ranges from 00:02 to 00:15, but the idx you created ranges from 00:00 to 00:15. The in-between minutes (00:04, 00:07, ...) were already created by resample, with a sum of 0. The only missing values when reindexing are therefore the first two rows, which is why only these two rows get filled with your defined fill_value.
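You can check this yourself by comparing the two indexes. The sketch below rebuilds the question's data inline (an assumption, since 'myfile.dat' is not available) and uses '1min' instead of '60S', which newer pandas releases deprecate. It also shows that reindex fills every gap if the resample step is skipped, which should work here because all timestamps already sit on whole minutes. The names new_labels and direct are illustrative.

```python
import pandas as pd

# Data copied from the question's printed DataFrame.
times = pd.to_datetime([
    "2022-04-28 00:02:00", "2022-04-28 00:03:00", "2022-04-28 00:05:00",
    "2022-04-28 00:06:00", "2022-04-28 00:09:00", "2022-04-28 00:10:00",
    "2022-04-28 00:12:00", "2022-04-28 00:15:00",
])
df = pd.DataFrame(
    {"A": 0, "B": 2, "C": [1, 2, 3, 4, 5, 6, 8, 10], "D": 5},
    index=times,
)

idx = pd.date_range("2022-04-28 00:00:00", "2022-04-28 00:15:00", freq="1min")

# After resampling, every minute from 00:02 to 00:15 exists (empty bins sum to 0).
resampled = df.resample("1min").sum()

# Labels that reindex would actually have to create: only 00:00 and 00:01.
new_labels = idx.difference(resampled.index)
print(new_labels)

# Without the resample step, all absent minutes are new labels and get fill_value.
direct = df.reindex(idx, fill_value=1000)
print(direct)
```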