import pandas as pd

df1 = pd.DataFrame(
    {
        "Prod": [10, 20],
        "Sales": [1, 4],
        "DT": ["2021-05-01 15:05:01", "2021-05-01 15:05:05"]
    },
    index=None
)

   Prod  Sales                   DT
0    10      1  2021-05-01 15:05:01
1    20      4  2021-05-01 15:05:05
OK, so now I create a new DataFrame using just the 'Prod' column, with 'DT' as the index:
df2 = pd.DataFrame(
    df1["Prod"],
    index=pd.DatetimeIndex(df1["DT"])
)

                     Prod
DT
2021-05-01 15:05:01   NaN
2021-05-01 15:05:05   NaN
The values for 'Prod' have not been picked up. To get them, I have to convert the 'Prod' column to a list first:
df2 = pd.DataFrame(
    list(df1["Prod"]),
    columns=["Prod"],
    index=pd.DatetimeIndex(df1["DT"])
)

                     Prod
DT
2021-05-01 15:05:01    10
2021-05-01 15:05:05    20
So what is the problem with the original code? I would have expected pandas to raise an error if it was unhappy.
Answer:
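The problem is that df1['Prod'] is a pandas Series with its own index (0, 1). When you pass a new index with index=pd.DatetimeIndex(df1["DT"]), pandas does not simply attach the new labels to the existing values; it aligns the Series to the new index by label. Neither 0 nor 1 appears in the DatetimeIndex, so every row comes out as NaN. No error is raised because aligning to labels that don't exist is a valid operation: pandas just fills the missing positions with NaN. Converting the column to a list works because a list has no index, so its values are placed by position.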
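A small sketch of that label alignment, using only pandas itself. The index [1, 5] is a hypothetical example chosen so that one label exists in the Series and one does not; the last check shows the constructor behaving like Series.reindex followed by to_frame.

```python
import pandas as pd

# The 'Prod' column from the question: values 10 and 20 under the labels 0 and 1.
s = pd.Series([10, 20], name="Prod")

# The question's case: none of the datetime labels exist in s, so every value is NaN.
dt_index = pd.DatetimeIndex(["2021-05-01 15:05:01", "2021-05-01 15:05:05"], name="DT")
misaligned = pd.DataFrame(s, index=dt_index)
print(misaligned["Prod"].isna().all())  # True

# Hypothetical index where label 1 exists in s and label 5 does not.
partial = pd.DataFrame(s, index=[1, 5])
print(partial["Prod"].tolist())  # [20.0, nan]

# Same result as reindexing the Series and turning it into a one-column frame.
print(partial.equals(s.reindex([1, 5]).to_frame()))  # True
```

The column becomes float because NaN cannot be stored in an int64 column.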
The solution is simple:
df2 = df1[['Prod', 'DT']].set_index('DT')
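One detail worth hedging: because 'DT' holds strings in df1, set_index('DT') produces an index of strings rather than the DatetimeIndex shown in the question. The sketch below assumes you want real timestamps and parses the column with pd.to_datetime first; it also shows an alternative that keeps the original constructor by passing the bare values from to_numpy(), which, like a list, carry no index labels and are placed by position.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "Prod": [10, 20],
        "Sales": [1, 4],
        "DT": ["2021-05-01 15:05:01", "2021-05-01 15:05:05"],
    }
)

# The one-liner from the answer: rows stay aligned, but the labels are still strings.
df2 = df1[["Prod", "DT"]].set_index("DT")

# Parse DT first to get a DatetimeIndex, matching the question's output.
df3 = df1.assign(DT=pd.to_datetime(df1["DT"]))[["Prod", "DT"]].set_index("DT")

# Keep the question's constructor, but pass plain values so nothing is aligned by label.
df4 = pd.DataFrame(
    {"Prod": df1["Prod"].to_numpy()},
    index=pd.DatetimeIndex(df1["DT"]),
)

print(type(df2.index).__name__)  # Index
print(type(df3.index).__name__)  # DatetimeIndex
print(df4["Prod"].tolist())      # [10, 20]
```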