Is there an easy way to construct a pandas DataFrame from an Iterable of attrs objects?

Time:11-30

One can do that with dataclasses like so:

from dataclasses import dataclass
import pandas as pd

@dataclass
class MyDataClass:
    i: int
    s: str


df = pd.DataFrame([MyDataClass(1, "a"), MyDataClass(2, "b")])

This creates the DataFrame df with columns i and s, as one would expect.

Is there an easy way to do that with an attrs class?

I can do it by iterating over the objects' attributes, building an object of a type like dict[str, list] ({"i": [1, 2], "s": ["a", "b"]} in this case), and constructing the DataFrame from that, but it would be nice to have support for attrs objects directly.

CodePudding user response:

You can access the instance dictionary of a dataclass object like so:

a = MyDataClass(1, "a")
a.__dict__

This outputs:

{'i': 1, 's': 'a'}

Knowing this, if you have an iterable arr of MyDataClass instances, you can collect each instance's __dict__ and construct a DataFrame from the result:

arr = [MyDataClass(1, "a"), MyDataClass(2, "b")]
df = pd.DataFrame([x.__dict__ for x in arr])

df outputs:

   i  s
0  1  a
1  2  b

The limitation of this approach is that it does not work if the slots option is used, because instances of a slotted class have no __dict__.

Alternatively, it is possible to convert the data from a dataclass to a tuple or dictionary using dataclasses.astuple and dataclasses.asdict respectively.

The DataFrame can also be constructed using either of the following:

import dataclasses

# using astuple
df = pd.DataFrame(
    [dataclasses.astuple(x) for x in arr],
    columns=[f.name for f in dataclasses.fields(MyDataClass)],
)

# using asdict
df = pd.DataFrame([dataclasses.asdict(x) for x in arr])