I have a script that reads a CSV into a dataframe and then allows the user to extend the dataframe by adding extra data to it. It takes the last value in the date column and starts prompting the user for a value day by day.
If the user doesn't enter anything, the value is set to math.nan. However, when I append the row to the dataframe, the supposed NaN gets cast to a NaT.
I've recreated a reproducible example below.
How do I ensure that my NaNs aren't cast to NaTs?
#!/usr/bin/env python
import pandas as pd
import datetime as dt
import math
df = pd.DataFrame({
    'date': pd.to_datetime(['2022-05-01', '2022-05-02', '2022-05-03']),
    'weight': [250., 249, 247],
})
last_recorded_date = df['date'].iloc[-1]
while True:
    # Prompt for the day after the last recorded date
    next_date = last_recorded_date + dt.timedelta(days=1)
    weight = input(f"{next_date}: ")
    if weight == 'q':
        break
    elif weight == '':
        # Empty input means "no measurement for this day"
        weight = math.nan
    else:
        weight = float(weight)
    # Append the new row by setting a label one past the end
    df.loc[len(df.index)] = [next_date, weight]
    last_recorded_date = next_date
print(df)
# date weight
# 0 2022-05-01 250.0
# 1 2022-05-02 249.0
# 2 2022-05-03 247.0
# 3 2022-05-04 243.0
# 4 2022-05-05 240.0
# 5 2022-05-06 NaT
Answer:
That is weird. But some experimentation reveals some clues:
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2022-05-01', '2022-05-02', '2022-05-03']),
    'weight': [250., 249, 247],
})
# Try this
df.loc[4] = None
This emits:
FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
That doesn't exactly explain why NaT ended up in the second column, but it does suggest that the dtypes need to be specified explicitly when appending to an existing dataframe.
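One plausible mechanism, offered as an assumption about pandas internals rather than something confirmed in this thread (the `frame.append` deprecation warning below points to pandas 1.4 or 1.5; newer releases may behave differently): when `df.loc[new_label] = [...]` enlarges the frame, the list is first turned into a single Series, and pandas infers one dtype for that whole row. A Timestamp next to a NaN is inferred as datetime64, so the NaN is already NaT before it reaches the `weight` column. The sketch below only checks that inference step in isolation.

```python
import math

import pandas as pd

# A row like the one the loop builds when the user just presses Enter:
# a Timestamp for 'date' and a float NaN for 'weight'.
row = [pd.Timestamp("2022-05-06"), math.nan]

# One Series for the whole row means one inferred dtype. NaN counts as a
# missing value compatible with datetimes, so the row becomes datetime64
# and the NaN is stored as NaT.
s = pd.Series(row, index=["date", "weight"])
print(s.dtype)
print(s["weight"] is pd.NaT)

# A real number cannot be a datetime, so this row falls back to object
# dtype and the weight keeps its float value.
s_ok = pd.Series([pd.Timestamp("2022-05-06"), 243.0], index=["date", "weight"])
print(s_ok.dtype)
```

This would also explain why only the rows with an empty input show NaT, while rows with a number keep their weight.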
One solution, as explained here, is as follows:
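import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2022-05-01', '2022-05-02', '2022-05-03']),
    'weight': [250., 249, 247],
})
# The one-row frame infers each column's dtype separately (NaT -> datetime64, NaN -> float64)
df = df.append(pd.DataFrame([{'date': pd.NaT, 'weight': np.nan}]), ignore_index=True)
assert (df.dtypes.values == ('<M8[ns]', 'float64')).all()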
However, this emits:
FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
df = df.append(pd.DataFrame([{'date': pd.NaT, 'weight': np.nan}]), ignore_index=True)
So I guess the right solution is now:
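new_row = pd.DataFrame([{'date': pd.NaT, 'weight': np.nan}])
# ignore_index=True renumbers the rows; without it the new row would reuse label 0
df = pd.concat([df, new_row], ignore_index=True)
assert (df.dtypes.values == ('<M8[ns]', 'float64')).all()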
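Applied to the loop from the question, the concat approach might look like the sketch below. It is an illustration under assumptions: the `answers` list is a hypothetical stand-in for `input()` so the example runs unattended, and the new row is built from a one-element list per column so that each column's dtype is inferred on its own.

```python
import datetime as dt
import math

import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2022-05-01', '2022-05-02', '2022-05-03']),
    'weight': [250., 249, 247],
})

# Hypothetical stand-in for input(): '' means "no value", 'q' means quit.
answers = ['243', '240', '', 'q']

last_recorded_date = df['date'].iloc[-1]
for answer in answers:
    if answer == 'q':
        break
    next_date = last_recorded_date + dt.timedelta(days=1)
    weight = math.nan if answer == '' else float(answer)
    # Column-wise construction: 'date' is inferred as datetime64 and
    # 'weight' as float64, so a missing weight stays NaN.
    new_row = pd.DataFrame({'date': [next_date], 'weight': [weight]})
    # ignore_index=True keeps the row labels as 0..n-1.
    df = pd.concat([df, new_row], ignore_index=True)
    last_recorded_date = next_date

print(df)
print(df.dtypes)
```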
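But I must ask: why are you appending to a dataframe this way? Each append builds a new frame and copies all of the existing rows, so growing a dataframe one row at a time gets slower as it grows and should be avoided if possible.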
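A common way to avoid the repeated copying is to collect the answers in plain Python lists and extend the dataframe once at the end. The following is a sketch of that idea, again with a hypothetical `answers` list standing in for `input()`. It also sidesteps the NaT problem, because each column's dtype is inferred from that column's values alone.

```python
import datetime as dt
import math

import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2022-05-01', '2022-05-02', '2022-05-03']),
    'weight': [250., 249, 247],
})

# Hypothetical stand-in for input(): '' means "no value", 'q' means quit.
answers = ['243', '240', '', 'q']

# Collect new values in plain lists instead of growing df on every step.
new_dates, new_weights = [], []
next_date = df['date'].iloc[-1]
for answer in answers:
    if answer == 'q':
        break
    next_date = next_date + dt.timedelta(days=1)
    new_dates.append(next_date)
    new_weights.append(math.nan if answer == '' else float(answer))

# Skip the concat if the user quit straight away; an empty frame would
# have object columns and could change the result dtypes.
if new_dates:
    new_rows = pd.DataFrame({'date': new_dates, 'weight': new_weights})
    # A single concat copies the existing data only once.
    df = pd.concat([df, new_rows], ignore_index=True)

print(df)
```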
Answer:
import pandas as pd
import datetime as dt
import math
df = pd.DataFrame({
    'date': pd.to_datetime(['2022-05-01', '2022-05-02', '2022-05-03']),
    'weight': [250., 249, 247],
})
last_recorded_date = df['date'].iloc[-1]
while True:
    next_date = last_recorded_date + dt.timedelta(days=1)
    weight = input(f"{next_date}: ")
    if weight == 'q':
        break
    elif weight == '':
        weight = math.nan
    else:
        weight = float(weight)
    df.loc[len(df.index)] = [next_date, weight]
    last_recorded_date = next_date
# After the loop, turn any NaT in the weight column back into NaN and restore a
# float dtype. Assigning to df['weight'] (not to df) keeps the date column.
df['weight'] = df['weight'].replace(pd.NaT, math.nan).astype(float)
print(df)