Whenever I set my DataFrame index to a list of datetimes, the DataFrame wipes out all my quantities (every value becomes NaN):
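import pandas as pd
import pint_pandas  # registers the "pint[...]" dtype and the .pint accessor

# mem_code, mem_data, mem_heap, mem_stack, mem_total: lists of byte counts; dt: list of datetimes
df = pd.DataFrame(
    {
        "code": pd.Series(mem_code, dtype="pint[byte]"),
        "data": pd.Series(mem_data, dtype="pint[byte]"),
        "heap": pd.Series(mem_heap, dtype="pint[byte]"),
        "stack": pd.Series(mem_stack, dtype="pint[byte]"),
        "total": pd.Series(mem_total, dtype="pint[byte]"),
    },
    index=pd.DatetimeIndex(dt),
)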
print(f"df: {df.pint.dequantify()}")
df:                     code  data  heap  stack  total
unit                       B     B     B      B      B
2017-01-01 12:00:00      NaN   NaN   NaN    NaN    NaN
2017-01-02 12:00:00      NaN   NaN   NaN    NaN    NaN
2017-01-03 12:00:00      NaN   NaN   NaN    NaN    NaN
I'm new to pandas and pint. My goal is to use quantities (for units) rather than multiplying manually. In other words, the units should live in the view, not be baked into the model data.
I've been following examples such as "How can I manage units in pandas data?"
Nothing I've tried gets both the datetime index and the byte quantities to work.
Am I doing something wrong, or is pint_pandas still in its early days?
Answer:
Since you are passing pd.Series objects to the DataFrame constructor, pandas aligns them on their index labels, so the index of each Series has to match the index of the DataFrame. Your Series have the default integer index (i.e. [0, 1, 2, ...]), whereas you define a DatetimeIndex as the DataFrame's index. None of the labels match, so you get a DataFrame full of NaN.
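A minimal sketch reproducing the alignment problem. It uses plain float Series instead of pint[byte] so it runs with pandas alone (the index alignment is the same for pint columns); the values and dates are hypothetical stand-ins for the question's mem_code and dt.

```python
import pandas as pd

# Hypothetical sample data standing in for mem_code and dt
mem_code = [100.0, 200.0, 300.0]
dt = ["2017-01-01 12:00", "2017-01-02 12:00", "2017-01-03 12:00"]

# The Series gets the default RangeIndex: 0, 1, 2
s = pd.Series(mem_code)

# The constructor reindexes s to the datetime labels; none of them exist in
# s's integer index, so every cell becomes NaN
df = pd.DataFrame({"code": s}, index=pd.DatetimeIndex(dt))
print(df)
```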
There are many ways to address this.
You can, for instance, build the DataFrame with the default RangeIndex:
df = pd.DataFrame(
    {
        "code": pd.Series(mem_code, dtype="pint[byte]"),
        "data": pd.Series(mem_data, dtype="pint[byte]"),
        "heap": pd.Series(mem_heap, dtype="pint[byte]"),
        "stack": pd.Series(mem_stack, dtype="pint[byte]"),
        "total": pd.Series(mem_total, dtype="pint[byte]"),
    }
)
and overwrite the index afterwards (assigning to df.index relabels the rows by position, so no alignment takes place):
df.index = pd.DatetimeIndex(dt)
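A runnable sketch of the "build first, relabel later" approach, again with plain floats and hypothetical sample data; in the real code each Series would also get dtype="pint[byte]".

```python
import pandas as pd

# Hypothetical sample data
mem_code = [100.0, 200.0, 300.0]
mem_data = [10.0, 20.0, 30.0]
dt = ["2017-01-01 12:00", "2017-01-02 12:00", "2017-01-03 12:00"]

# All Series share the default RangeIndex, so alignment keeps every value
df = pd.DataFrame({"code": pd.Series(mem_code), "data": pd.Series(mem_data)})

# Replacing the index relabels rows by position; the values are untouched
df.index = pd.DatetimeIndex(dt)
print(df)
```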
You can also pass the underlying arrays instead of the Series as the dict values. Arrays carry no index, so they are placed by position instead of being aligned. Series.array keeps the pint[byte] dtype, whereas .tolist() would turn each column into a plain list of Quantity objects:
df = pd.DataFrame(
    {
        "code": pd.Series(mem_code, dtype="pint[byte]").array,
        "data": pd.Series(mem_data, dtype="pint[byte]").array,
        "heap": pd.Series(mem_heap, dtype="pint[byte]").array,
        "stack": pd.Series(mem_stack, dtype="pint[byte]").array,
        "total": pd.Series(mem_total, dtype="pint[byte]").array,
    },
    index=pd.DatetimeIndex(dt),
)
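A sketch of the index-free approach with hypothetical sample data. Series.array returns the values backing a Series without its index (for a pint[byte] Series that is the PintArray); here it is shown with plain floats so only pandas is needed.

```python
import pandas as pd

# Hypothetical sample data standing in for mem_code and dt
mem_code = [100.0, 200.0, 300.0]
dt = ["2017-01-01 12:00", "2017-01-02 12:00", "2017-01-03 12:00"]

s = pd.Series(mem_code)  # default RangeIndex 0, 1, 2

# s.array has no index, so the constructor places the values by position
# against the DatetimeIndex instead of aligning on labels
df = pd.DataFrame({"code": s.array}, index=pd.DatetimeIndex(dt))
print(df)
```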
Or, you can create all your Series with the same index as the DataFrame:
index = pd.DatetimeIndex(dt)
df = pd.DataFrame(
    {
        "code": pd.Series(mem_code, dtype="pint[byte]", index=index),
        "data": pd.Series(mem_data, dtype="pint[byte]", index=index),
        "heap": pd.Series(mem_heap, dtype="pint[byte]", index=index),
        "stack": pd.Series(mem_stack, dtype="pint[byte]", index=index),
        "total": pd.Series(mem_total, dtype="pint[byte]", index=index),
    },
    index=index,
)
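The shared-index approach can also be written with a dict comprehension so the index is passed in one place. This is a sketch with hypothetical sample data and plain floats; with pint_pandas installed you would add dtype="pint[byte]" to each pd.Series call.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's lists
columns = {
    "code": [100.0, 200.0, 300.0],
    "data": [10.0, 20.0, 30.0],
}
dt = ["2017-01-01 12:00", "2017-01-02 12:00", "2017-01-03 12:00"]

index = pd.DatetimeIndex(dt)

# Every Series is created with the frame's index, so alignment finds a
# matching label for every row and no NaN appears
df = pd.DataFrame(
    {name: pd.Series(values, index=index) for name, values in columns.items()},
    index=index,
)
print(df)
```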