I have a Pandas DataFrame constructed from a dict that contains a nan (e.g. float("nan")). When I call .to_dict() on it, I get a different dict: the nan value is something "else".
Is it possible to know what this new nan value is?
Here is a toy example I created, and a bunch of checks I did:
import numpy as np
import pandas as pd
a_dict = {
"a": (1, 2),
"b": (3, float("nan")),
}
df = pd.DataFrame(a_dict)
print(df.to_dict())
# {'a': {0: 1, 1: 2}, 'b': {0: 3.0, 1: nan}}
# to_dict() gives a different dict:
print(a_dict == a_dict) # True
print(df.to_dict() == a_dict) # False
print(df.to_dict()["b"][1]) # nan
print(type(df.to_dict()["b"][1])) # <class 'float'>
print(df.to_dict()["b"][1] == float("nan")) # False
print(df.to_dict()["b"][1] == np.nan) # False
print(df.to_dict()["b"][1] == pd.NA) # False
print(df.to_dict()["b"][1] is None) # False
print(np.isnan(df.to_dict()["b"][1])) # True
print(pd.isna(df.to_dict()["b"][1])) # True
As for motivation, this bites me when I try to write tests using unittest.TestCase.assertEqual.
Thanks upfront.
CodePudding user response:
Maybe not the best way, but this is how you can check it, for testing only:
import pandas as pd
import numpy as np
class custom_dict(dict):
    # Compare values as lists so that the same nan object counts as equal
    def __eq__(self, __o: object) -> bool:
        if isinstance(__o, dict):
            return self.keys() == __o.keys() and all(list(self[k1]) in (list(__o[k1]),) for k1 in self.keys())
        return False
a_dict = {
"a": (1, 2),
"b": (3, np.nan),
}
df = pd.DataFrame(a_dict, dtype=object)
print(df.to_dict('list',into=custom_dict))
print(a_dict)
print(df.to_dict('list', into=custom_dict)["b"][1] in (np.nan, )) # True
print(df.to_dict('list', into=custom_dict) == a_dict) # True
CodePudding user response:
As you stated, to_dict() gives a different dict, but that is not related to the nan value.
df.to_dict() yields {'a': {0: 1, 1: 2}, 'b': {0: 3.0, 1: nan}} and not {'a': (1, 2), 'b': (3, nan)}, so the two are not equal.
Replace the nan in a_dict with a number (e.g. 4) and df.to_dict() == a_dict will still evaluate to False, so the nan is not your problem.
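A minimal sketch of that experiment, assuming the same toy data with the nan swapped for 4 (the names no_nan_dict and df_no_nan are made up for illustration):

```python
import pandas as pd

# Same shape as a_dict, but without any nan.
no_nan_dict = {"a": (1, 2), "b": (3, 4)}
df_no_nan = pd.DataFrame(no_nan_dict)

# The default orient nests each column as {index_label: value}.
nested = df_no_nan.to_dict()
print(nested)                 # {'a': {0: 1, 1: 2}, 'b': {0: 3, 1: 4}}
print(nested == no_nan_dict)  # False: inner dicts vs. tuples
```

The comparison fails purely because of the structure: each column becomes an inner dict keyed by the index, while the original holds tuples.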
I would like to point out that np.nan == np.nan evaluates to False. The fact that a_dict == a_dict evaluates to True is due to the definition of 'equal' for dictionaries: two dictionaries are equal when they have the same keys and, for each key, the values are either the same object or compare equal.
See here for more info.
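The identity-before-equality rule can be seen with plain Python objects. This is only an illustrative sketch; the variable names are invented for the example:

```python
import math

nan = float("nan")
other_nan = float("nan")  # a second, distinct nan object

same_object = (3, nan)

print(nan == nan)                     # False: nan never equals itself
print(same_object == (3, nan))        # True: containers check identity first
print(same_object == (3, other_nan))  # False: different object, and nan != nan
print(math.isnan(other_nan))          # True: the reliable way to detect nan
```

This is why a_dict == a_dict is True even though it contains a nan, while a dict rebuilt from the DataFrame (which holds a freshly created nan) compares unequal.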
To solve your initial question "How to get the same dict from a Pandas.DataFrame.to_dict?" see here. It is a pain with the tuples you have in your dict and pandas automatically setting the datatype, which makes the code below fail.
Basically, you could do:
d = df.to_dict('list')
{i: tuple(d[i]) for i in d.keys()} == a_dict # True once the nan is replaced by a number; False with the nan, since nan != nan
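For tests where the nan has to stay, one possible approach is to compare with a NaN-aware helper instead of ==. The sketch below is an assumption, not something from the question: values_match and dicts_match are hypothetical helpers, and pd.testing.assert_frame_equal is shown as an alternative that compares the DataFrames directly.

```python
import math
import pandas as pd

a_dict = {"a": (1, 2), "b": (3, float("nan"))}
df = pd.DataFrame(a_dict)

def values_match(x, y):
    """Hypothetical helper: True if both values are nan or they compare equal."""
    both_nan = (
        isinstance(x, float) and isinstance(y, float)
        and math.isnan(x) and math.isnan(y)
    )
    return both_nan or x == y

def dicts_match(left, right):
    """Hypothetical helper: same keys and element-wise matching sequences."""
    return left.keys() == right.keys() and all(
        len(left[k]) == len(right[k])
        and all(values_match(x, y) for x, y in zip(left[k], right[k]))
        for k in left
    )

round_tripped = {k: tuple(v) for k, v in df.to_dict("list").items()}

print(round_tripped == a_dict)             # False: the nan is a new object
print(dicts_match(round_tripped, a_dict))  # True: nan positions are treated as equal

# Alternative: compare DataFrames; NaNs in the same position count as equal.
pd.testing.assert_frame_equal(pd.DataFrame(round_tripped), df)
```

In a unittest.TestCase, the helper could be used as self.assertTrue(dicts_match(actual, expected)).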