Fill missing dates in a pandas DataFrame

Time:03-02

I have a lot of DataFrames with two columns, like this:

             Fecha  unidades
0       2020-01-01       2.0
84048   2020-09-01       4.0
149445  2020-10-01      11.0
532541  2020-11-01       4.0
660659  2020-12-01       2.0
1515682 2021-03-01       9.0
1563644 2021-04-01       2.0
1759823 2021-05-01       1.0
2226586 2021-07-01       1.0

As can be seen, some months are missing. The gaps differ between DataFrames: a DataFrame might have 2 months, 10, all of them (100% complete), or only one. I need to complete the "Fecha" column with the missing months (from 2020-01-01 to 2021-12-01) and, for every date added to "Fecha", put a 0 in the "unidades" column.

Each element in the "Fecha" column is of class 'pandas._libs.tslibs.timestamps.Timestamp'.

How can I fill in the missing dates for each DataFrame?
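To reproduce the problem, here is a minimal sketch that rebuilds the sample above (values copied from the printed frame, original index labels kept) and lists which month starts are absent. The names `months` and `missing` are only illustrative.

```python
import pandas as pd

# Sample data from the question; the original (non-consecutive) index labels are kept.
df = pd.DataFrame(
    {
        "Fecha": pd.to_datetime([
            "2020-01-01", "2020-09-01", "2020-10-01", "2020-11-01", "2020-12-01",
            "2021-03-01", "2021-04-01", "2021-05-01", "2021-07-01",
        ]),
        "unidades": [2.0, 4.0, 11.0, 4.0, 2.0, 9.0, 2.0, 1.0, 1.0],
    },
    index=[0, 84048, 149445, 532541, 660659, 1515682, 1563644, 1759823, 2226586],
)

# Every month start ("MS" frequency) in the required window.
months = pd.date_range("2020-01-01", "2021-12-01", freq="MS")

# Month starts that do not appear in "Fecha".
missing = months.difference(df["Fecha"])
print(len(months), len(missing))  # 24 15
```

The window holds 24 month starts and the sample has 9 of them, so 15 rows need to be added with 0 units.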

Answer:

You could create a monthly date range, make "Fecha" the index with set_index, and reindex against that range to add the missing months. Then fillna and reset_index produce the desired outcome:

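import pandas as pd

df['Fecha'] = pd.to_datetime(df['Fecha'])
df = (df.set_index('Fecha')
      # one row per month start (freq='MS'), Jan 2020 to Dec 2021
      .reindex(pd.date_range('2020-01-01', '2021-12-01', freq='MS'))
      .rename_axis('Fecha')  # the new index has no name; restore "Fecha"
      .fillna(0)             # months that were added get 0 units
      .reset_index())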

Output:

        Fecha  unidades
0  2020-01-01       2.0
1  2020-02-01       0.0
2  2020-03-01       0.0
3  2020-04-01       0.0
4  2020-05-01       0.0
5  2020-06-01       0.0
6  2020-07-01       0.0
7  2020-08-01       0.0
8  2020-09-01       4.0
9  2020-10-01      11.0
10 2020-11-01       4.0
11 2020-12-01       2.0
12 2021-01-01       0.0
13 2021-02-01       0.0
14 2021-03-01       9.0
15 2021-04-01       2.0
16 2021-05-01       1.0
17 2021-06-01       0.0
18 2021-07-01       1.0
19 2021-08-01       0.0
20 2021-09-01       0.0
21 2021-10-01       0.0
22 2021-11-01       0.0
23 2021-12-01       0.0
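Since the question mentions many DataFrames, the same steps can be wrapped in a function and applied to each one. This is a sketch: `fill_missing_months` is a hypothetical helper name, and the DataFrames are assumed to be held in a plain list. Passing `name="Fecha"` to `pd.date_range` names the new index directly, so `rename_axis` is not needed in this variant.

```python
import pandas as pd


def fill_missing_months(df, start="2020-01-01", end="2021-12-01"):
    """Hypothetical helper: reindex one DataFrame onto every month start in [start, end]."""
    # Naming the range "Fecha" means reset_index() restores the column name.
    full_range = pd.date_range(start, end, freq="MS", name="Fecha")
    out = df.copy()  # avoid modifying the caller's DataFrame
    out["Fecha"] = pd.to_datetime(out["Fecha"])
    return (out.set_index("Fecha")
               .reindex(full_range)  # missing months appear with NaN
               .fillna(0)            # ... and are set to 0 units
               .reset_index())


# Assumed storage: a list of DataFrames; adapt to however they are actually kept.
frames = [
    pd.DataFrame({"Fecha": ["2020-01-01", "2021-07-01"], "unidades": [2.0, 1.0]}),
    pd.DataFrame({"Fecha": ["2020-05-01"], "unidades": [3.0]}),
]
filled = [fill_missing_months(f) for f in frames]
print([len(f) for f in filled])  # [24, 24]
```

Note that `fillna(0)` fills every column, which suits the two-column frames in the question. Also, `reindex` raises an error if a DataFrame contains the same date more than once, so duplicates would need to be aggregated first.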