Home > Mobile >  How to get NaN values when inserting missing weekdays into Pandas DataFrame
How to get NaN values when inserting missing weekdays into Pandas DataFrame

Time:10-11

I am trying to insert missing weekdays into a Pandas time series dataframe. The inserted weekdays must have NaN values in every data column. When I tried the answers in Insert missing weekdays in pandas dataframe and fill them with NaN, the new rows are filled with 0 instead of NaN. To illustrate:

import pandas as pd

df = pd.DataFrame({
    'date': ['2022-10-06', '2022-10-11'],  # Thursday and Tuesday.
    'num':  [123, 456]
})
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
df = df.resample('B').sum()  # Insert Friday and Monday.

However, df is now:

            num
date           
2022-10-06  123
2022-10-07    0
2022-10-10    0
2022-10-11  456

Instead of NaN, I am getting 0. How do I get NaN instead? This is what I want:

            num
date           
2022-10-06  123
2022-10-07  NaN
2022-10-10  NaN
2022-10-11  456

(Pandas version 1.3.2, Python version 3.8.10)

CodePudding user response:

Use .asfreq() instead of .sum():

df.resample('B').asfreq()

Output:

              num
date             
2022-10-06  123.0
2022-10-07    NaN
2022-10-10    NaN
2022-10-11  456.0

CodePudding user response:

df = pd.DataFrame({
    'date': ['2022-10-06', '2022-10-11'],  # Thursday and Tuesday.
    'num':  [123, 456]
})
df['date'] = pd.to_datetime(df['date'])
    

df = df.set_index('date')

Use DataFrame.asfreq if unique datetimes:

df1 = df.asfreq('B')
print (df1)
              num
date             
2022-10-06  123.0
2022-10-07    NaN
2022-10-10    NaN
2022-10-11  456.0

If possible duplicates and need aggregate sum add parameter min_count=1:

df2 = df.resample('B').sum(min_count=1)
print (df2)
              num
date             
2022-10-06  123.0
2022-10-07    NaN
2022-10-10    NaN
2022-10-11  456.0

df = pd.DataFrame({
    'date': ['2022-10-06', '2022-10-11'] * 2,  # Thursday and Tuesday.
    'num':  [123, 456, 10, 20]
})
df['date'] = pd.to_datetime(df['date'])
print (df)
        date  num
0 2022-10-06  123
1 2022-10-11  456
2 2022-10-06   10
3 2022-10-11   20

df = df.set_index('date')


df2 = df.resample('B').sum(min_count=1)
print (df2)
              num
date             
2022-10-06  133.0
2022-10-07    NaN
2022-10-10    NaN
2022-10-11  476.0

df1 = df.asfreq('B')
print (df1)

ValueError: cannot reindex from a duplicate axis

  • Related