How to get the same dict from a Pandas.DataFrame.to_dict when it has `nan`?

I have a Pandas DataFrame constructed from a dict that contains a nan (e.g. float("nan")). When I call .to_dict() on it, I get a different dict: the nan value is something "else".

Is it possible to know what this new nan value is?

Here is a toy example I created, and a bunch of checks I did:

import numpy as np
import pandas as pd

a_dict = {
            "a": (1, 2),
            "b": (3, float("nan")),
        }
df = pd.DataFrame(a_dict)

print(df.to_dict())
# {'a': {0: 1, 1: 2}, 'b': {0: 3.0, 1: nan}}

# to_dict() gives a different dict:
print(a_dict == a_dict) # True
print(df.to_dict() == a_dict)  # False

print(df.to_dict()["b"][1]) # nan
print(type(df.to_dict()["b"][1])) # <class 'float'>


print(df.to_dict()["b"][1] == float("nan"))  # False
print(df.to_dict()["b"][1] == np.nan)  # False
print(df.to_dict()["b"][1] == pd.NA)  # False
print(df.to_dict()["b"][1] is None)  # False
print(np.isnan(df.to_dict()["b"][1]))  # True
print(pd.isna(df.to_dict()["b"][1]))  # True

In terms of motivation, this is biting me when I try to write tests using unittest.TestCase.assertEqual.

Thanks upfront.

CodePudding user response：

This may not be the best way, but here is how you can do the check, for testing purposes only:

import pandas as pd
import numpy as np

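class custom_dict(dict):
    # Compare column by column as lists. List comparison treats the very same
    # object as equal to itself, so a shared np.nan matches even though
    # np.nan == np.nan is False.
    def __eq__(self, __o: object) -> bool:
        if isinstance(__o, dict):
            return self.keys() == __o.keys() and all(list(self[k1]) in (list(__o[k1]),) for k1 in self.keys())
        return False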

a_dict = {
            "a": (1, 2),
            "b": (3, np.nan),
        }
df = pd.DataFrame(a_dict, dtype=object)
print(df.to_dict('list', into=custom_dict))
print(a_dict)
print(df.to_dict('list', into=custom_dict)["b"][1] in (np.nan,))  # True
print(df.to_dict('list', into=custom_dict) == a_dict)  # True

CodePudding user response：

As you stated, to_dict() gives a different dict, but that is not related to the nan value.
df.to_dict() yields {'a': {0: 1, 1: 2}, 'b': {0: 3.0, 1: nan}} and not {'a': (1, 2), 'b': (3, nan)}, so the two are not equal. Replace the nan in a_dict with a number (e.g. 4) and df.to_dict() == a_dict will still evaluate to False, so the nan is not your problem.
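
To make that concrete, here is a minimal sketch of the experiment suggested above, with 4 swapped in for the nan. The names nested and as_tuples are only illustrative.

```python
import pandas as pd

# Same toy data as in the question, but with 4 in place of nan.
a_dict = {"a": (1, 2), "b": (3, 4)}
df = pd.DataFrame(a_dict)

# Default orientation: {column: {index: value}}.
nested = df.to_dict()
print(nested)            # {'a': {0: 1, 1: 2}, 'b': {0: 3, 1: 4}}
print(nested == a_dict)  # False: nested dicts vs. tuples

# Rebuilding tuples from the 'list' orientation recovers the original shape.
as_tuples = {col: tuple(vals) for col, vals in df.to_dict("list").items()}
print(as_tuples == a_dict)  # True, because no nan is involved here
```

So the default orientation alone is enough to break the comparison, nan or no nan.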

I would also like to point out that np.nan == np.nan evaluates to False. The fact that a_dict == a_dict evaluates to True is due to how equality of dictionaries is defined: two dictionaries are equal if they have the same keys and, for each key, the values are either the same object or compare equal. See here for more info.
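
A small sketch of that identity shortcut in plain Python, without pandas. The variable names are illustrative only.

```python
nan = float("nan")

# A nan never compares equal to anything, itself included...
print(nan == nan)  # False
# ...but it is, of course, the same object as itself.
print(nan is nan)  # True

# Containers check identity before equality, so the *same* nan object matches.
same_obj = {"b": (3, nan)} == {"b": (3, nan)}
print(same_obj)  # True

# A different nan object is neither identical nor equal, so this fails.
other_obj = {"b": (3, nan)} == {"b": (3, float("nan"))}
print(other_obj)  # False
```

This is why a round trip through a float column, which hands back a new nan object, can make an otherwise identical dict compare unequal.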

To solve your initial question, "How to get the same dict from a Pandas.DataFrame.to_dict?", see here. The tuples in your dict, together with pandas setting the datatype automatically, make this a pain and cause the code below to fail.

~~Basically you could do~~

~~d = df.to_dict('list')~~
~~{i: tuple(d[i]) for i in d.keys()} == a_dict # True~~
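
For the unittest use case from the question, one possible workaround is to stop relying on plain == and compare in a way that treats two nans as equal. The following is only a sketch: values_match and dicts_match are hypothetical helper names, and it assumes flat {column: sequence} dicts like the toy example. The last line uses pandas' own pd.testing.assert_frame_equal as an alternative that compares DataFrames instead of dicts.

```python
import math

import pandas as pd


def values_match(x, y):
    """Return True if x == y, or if both are float nan (hypothetical helper)."""
    both_nan = (
        isinstance(x, float) and isinstance(y, float)
        and math.isnan(x) and math.isnan(y)
    )
    return both_nan or x == y


def dicts_match(left, right):
    """Compare {column: sequence} dicts element-wise, treating nan as equal to nan."""
    if left.keys() != right.keys():
        return False
    return all(
        len(left[k]) == len(right[k])
        and all(values_match(x, y) for x, y in zip(left[k], right[k]))
        for k in left
    )


a_dict = {"a": (1, 2), "b": (3, float("nan"))}
df = pd.DataFrame(a_dict)

print(dicts_match(df.to_dict("list"), a_dict))  # True

# Alternative: compare as DataFrames. This raises AssertionError on a mismatch
# and treats nan values in the same position as equal.
pd.testing.assert_frame_equal(df, pd.DataFrame(a_dict))
```

Inside a unittest.TestCase, the helper could be used as self.assertTrue(dicts_match(df.to_dict("list"), a_dict)).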