I am trying to insert missing weekdays into a Pandas time series dataframe. The inserted weekdays must have NaN
values in every data column. When I tried the answers in Insert missing weekdays in pandas dataframe and fill them with NaN, the new rows are filled with 0
instead of NaN
. To illustrate:
import pandas as pd
df = pd.DataFrame({
'date': ['2022-10-06', '2022-10-11'], # Thursday and Tuesday.
'num': [123, 456]
})
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
df = df.resample('B').sum() # Insert Friday and Monday.
However, df
is now:
num
date
2022-10-06 123
2022-10-07 0
2022-10-10 0
2022-10-11 456
Instead of NaN
, I am getting 0
. How do I get NaN
instead? This is what I want:
num
date
2022-10-06 123
2022-10-07 NaN
2022-10-10 NaN
2022-10-11 456
(Pandas version 1.3.2, Python version 3.8.10)
CodePudding user response:
Use .asfreq()
instead of .sum()
:
df.resample('B').asfreq()
Output:
num
date
2022-10-06 123.0
2022-10-07 NaN
2022-10-10 NaN
2022-10-11 456.0
CodePudding user response:
df = pd.DataFrame({
'date': ['2022-10-06', '2022-10-11'], # Thursday and Tuesday.
'num': [123, 456]
})
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
Use DataFrame.asfreq
if unique datetimes:
df1 = df.asfreq('B')
print (df1)
num
date
2022-10-06 123.0
2022-10-07 NaN
2022-10-10 NaN
2022-10-11 456.0
If possible duplicates and need aggregate sum
add parameter min_count=1
:
df2 = df.resample('B').sum(min_count=1)
print (df2)
num
date
2022-10-06 123.0
2022-10-07 NaN
2022-10-10 NaN
2022-10-11 456.0
df = pd.DataFrame({
'date': ['2022-10-06', '2022-10-11'] * 2, # Thursday and Tuesday.
'num': [123, 456, 10, 20]
})
df['date'] = pd.to_datetime(df['date'])
print (df)
date num
0 2022-10-06 123
1 2022-10-11 456
2 2022-10-06 10
3 2022-10-11 20
df = df.set_index('date')
df2 = df.resample('B').sum(min_count=1)
print (df2)
num
date
2022-10-06 133.0
2022-10-07 NaN
2022-10-10 NaN
2022-10-11 476.0
df1 = df.asfreq('B')
print (df1)
ValueError: cannot reindex from a duplicate axis