Creating a Dataframe from another Dataframe and using DatetimeIndex fills columns with NaNs

Time:10-10

import pandas as pd

df1 = pd.DataFrame(
    {
        "Prod":  [10, 20],
        "Sales":    [1, 4],
        "DT":       ["2021-05-01 15:05:01", "2021-05-01 15:05:05"]
    },
    index=None
)

   Prod  Sales                   DT
0    10      1  2021-05-01 15:05:01
1    20      4  2021-05-01 15:05:05

OK so now I create a new dataframe just using the 'Prod' column and 'DT' for the index.

df2 = pd.DataFrame(
    df1["Prod"],
    index=pd.DatetimeIndex(df1["DT"])
)

                     Prod
DT
2021-05-01 15:05:01   NaN
2021-05-01 15:05:05   NaN

The values for 'Prod' have not been picked up. To get them, I have to convert the 'Prod' column to a list first.

df2 = pd.DataFrame(
    list(df1["Prod"]),
    columns=["Prod"],
    index=pd.DatetimeIndex(df1["DT"])
)

                     Prod
DT
2021-05-01 15:05:01    10
2021-05-01 15:05:05    20

So what is the problem with the original code? I would have thought Pandas would have thrown an error if it was unhappy.

CodePudding user response:

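The problem is that df1['Prod'] is a pandas Series that carries its own index (0, 1). When you pass a Series together with index=pd.DatetimeIndex(df1["DT"]), pandas does not attach the new index to the existing values; it aligns the Series to the new index by label. None of the timestamps exist in the Series' index, so every row comes back as NaN. This is ordinary reindexing behaviour rather than an error condition, which is why nothing is raised. Converting to a list works because a list has no index to align on.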
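To see the alignment in isolation, here is a minimal sketch that rebuilds df1 from the question. The names new_index, misaligned, reindexed and partial are made up for illustration only.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "Prod": [10, 20],
        "Sales": [1, 4],
        "DT": ["2021-05-01 15:05:01", "2021-05-01 15:05:05"],
    }
)

new_index = pd.DatetimeIndex(df1["DT"])

# A Series plus an explicit index: pandas looks up each new label in the
# Series' own index (0, 1). No timestamp is found there, so all values are NaN.
misaligned = pd.DataFrame(df1["Prod"], index=new_index)

# Series.reindex performs the same label lookup explicitly.
reindexed = df1["Prod"].reindex(new_index)

# When a requested label does exist in the Series' index, its value is kept:
# label 1 is present, label 5 is not.
partial = pd.DataFrame(df1["Prod"], index=[1, 5])

print(misaligned)
print(partial)
```

The last frame shows that the index argument selects by label rather than overwriting labels: the row for label 1 keeps its value and the row for label 5 is NaN.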

The solution is simple:

df2 = df1[['Prod', 'DT']].set_index('DT')
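Note that 'DT' holds strings in df1, so the one-liner above leaves a string index rather than a DatetimeIndex. If you want the DatetimeIndex from the question, a sketch along these lines should work; df2_a and df2_b are illustrative names, and both options are just two of several possible spellings.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "Prod": [10, 20],
        "Sales": [1, 4],
        "DT": ["2021-05-01 15:05:01", "2021-05-01 15:05:05"],
    }
)

# Option 1: parse the strings to datetimes and promote them to the index.
# set_index accepts a Series of the same length and uses its values and name.
df2_a = df1[["Prod"]].set_index(pd.to_datetime(df1["DT"]))

# Option 2: strip the old index from the data by passing a plain NumPy array,
# so there are no labels for pandas to align against the new index.
df2_b = pd.DataFrame(
    {"Prod": df1["Prod"].to_numpy()},
    index=pd.DatetimeIndex(df1["DT"]),
)

print(df2_a)
print(type(df2_a.index).__name__)
```

Option 2 is the same idea as the list(...) workaround in the question: an array, like a list, carries no index, so the values are assigned positionally.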