Converting a dictionary of lists of objects to a pandas DataFrame

I have a file that is stored with the following organizational format:

Dictionary
    List
        Object
            Attribute

Specifically looking like this:

dict = {
'0': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()]
'1': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()]
'2': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()]
'3': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()]
}

With the TestObject object being defined as:

import random

class TestObject:
    def __init__(self):
        self.id = random.randint()
        self.date = random.randint()
        self.size = random.randint()

The attributes in this example are just placeholders. What I want is to convert this structure into a dataframe, with one row per object and the data organized like this:

|key|  object  |  id  | date | size |
|-- |  ------  | ---- | ---- | ---- |
| 0 |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
| 1 |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
| 2 |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
| 3 |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |

I found this method for converting a dictionary of lists to a dataframe:

pandas.DataFrame.from_dict(dictionary)

but in this case I am interested in extracting attributes from objects which are stored in the lists.

CodePudding user response:

You can use a list comprehension:

import pandas as pd

# one row per object: the key, the object itself, then each attribute
pd.DataFrame([(k, o, o.id, o.date, o.size)
              for k, objs in dic.items() for o in objs],
             columns=['key', 'object', 'id', 'date', 'size'])

You first need to fix a few things in your initial code:

import random

class TestObject:
    def __init__(self):
        self.id = random.randint(0, 1)    # randint requires two arguments (a, b)
        self.date = random.randint(0, 1)
        self.size = random.randint(0, 1)

# better to use "dic": "dict" is a Python builtin, and reusing the name shadows it
# (the entries also need commas between them)
dic = {
'0': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()],
'1': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()],
'2': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()],
'3': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()]
}

Example output:

   key                                          object  id  date  size
0    0  <__main__.TestObject object at 0x7fc79371af10>   0     0     0
1    0  <__main__.TestObject object at 0x7fc79371aeb0>   1     0     1
2    0  <__main__.TestObject object at 0x7fc79371af70>   1     0     0
3    0  <__main__.TestObject object at 0x7fc79371c040>   1     0     1
4    0  <__main__.TestObject object at 0x7fc79371c0d0>   1     1     0
5    1  <__main__.TestObject object at 0x7fc79371c220>   1     1     1
6    1  <__main__.TestObject object at 0x7fc79371c1c0>   1     1     0
7    1  <__main__.TestObject object at 0x7fc79371c310>   0     1     0
8    1  <__main__.TestObject object at 0x7fc79371c400>   0     1     0
9    1  <__main__.TestObject object at 0x7fc79371c370>   0     0     1
10   2  <__main__.TestObject object at 0x7fc79371c4f0>   1     1     0
11   2  <__main__.TestObject object at 0x7fc79371c490>   0     0     1
12   2  <__main__.TestObject object at 0x7fc79371c5e0>   1     0     0
13   2  <__main__.TestObject object at 0x7fc79371c580>   1     0     1
14   2  <__main__.TestObject object at 0x7fc79371c640>   0     1     1
15   3  <__main__.TestObject object at 0x7fc79371c3d0>   0     1     1
16   3  <__main__.TestObject object at 0x7fc79371c730>   1     1     1
17   3  <__main__.TestObject object at 0x7fc79371c880>   1     0     1
18   3  <__main__.TestObject object at 0x7fc79371c850>   0     1     0
19   3  <__main__.TestObject object at 0x7fc79371c9a0>   0     1     1
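
For reference, the pieces above can be combined into one self-contained script. This is only a sketch: the seed, the 0-100 value range and the build_frame helper name are assumptions added for illustration and are not part of the original answer.

```python
import random

import pandas as pd


class TestObject:
    def __init__(self):
        # Placeholder attributes; the 0-100 range is an arbitrary choice.
        self.id = random.randint(0, 100)
        self.date = random.randint(0, 100)
        self.size = random.randint(0, 100)


def build_frame(dic):
    """Flatten {key: [objects]} into one row per object (hypothetical helper)."""
    rows = [(k, o, o.id, o.date, o.size)
            for k, objs in dic.items() for o in objs]
    return pd.DataFrame(rows, columns=['key', 'object', 'id', 'date', 'size'])


random.seed(0)  # only to make the sketch repeatable
dic = {str(i): [TestObject() for _ in range(5)] for i in range(4)}
df = build_frame(dic)

# Optionally move 'key' into the row index so rows are labelled by key.
indexed = df.set_index('key')
print(indexed.head())
```

Note that a column called 'size' has to be read as df['size'], because df.size is already the DataFrame attribute that returns the number of elements.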

CodePudding user response:

In Python, an instance of an ordinary class has a __dict__ attribute, which maps the names of its instance attributes to their values:

print(pd.DataFrame(TestObject().__dict__, index=[0]))

   id  date  size
0   0     0     0
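
A small sketch of what __dict__ contains may help here. It uses the built-in vars(), which returns the same mapping for objects that have a __dict__; the Point class is a hypothetical stand-in, not the asker's TestObject.

```python
class Point:
    """Hypothetical stand-in class with two instance attributes."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)

# vars(obj) returns obj.__dict__ for objects that have one.
attrs = vars(p)
print(attrs)  # {'x': 1, 'y': 2}

# Only instance attributes appear; methods and class attributes do not.
print('__init__' in attrs)  # False
```

Classes that define __slots__ without a __dict__ slot do not expose this mapping, so the approach assumes plain classes like the one in the question.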

Using a dict comprehension, you can achieve your goal without listing every attribute by hand, and add an object column holding the class name (dict_example here is the dictionary from the question):

not_nested_dict = {(key, n): {'object': obj.__class__.__name__, **obj.__dict__} for key, value in dict_example.items() for n, obj in enumerate(value)}

print(not_nested_dict)

{('0', 0): {'object': 'TestObject', 'id': 1, 'date': 0, 'size': 0}, ('0', 1): {'object': 'TestObject', 'id': 0, 'date': 1, 'size': 0}, ...

Then pass the new dict to pd.DataFrame.from_dict and transpose the result:

print(pd.DataFrame.from_dict(not_nested_dict).T)

         object id date size
0 0  TestObject  0    1    1
  1  TestObject  0    0    0
  2  TestObject  0    0    0
  3  TestObject  0    0    0
  4  TestObject  1    1    1
1 0  TestObject  0    0    0
  1  TestObject  0    1    1
  2  TestObject  0    0    1
  3  TestObject  0    1    1
  4  TestObject  1    0    0
2 0  TestObject  1    0    0
  1  TestObject  0    1    0
  2  TestObject  1    0    1
  3  TestObject  1    0    0
  4  TestObject  1    1    1
3 0  TestObject  0    0    0
  1  TestObject  1    1    1
  2  TestObject  1    0    0
  3  TestObject  1    1    1
  4  TestObject  1    1    0
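
As a possible variation, from_dict accepts orient='index', which treats each outer key as a row and so avoids the transpose. The sketch below assumes a small hand-written not_nested_dict in the same shape as above; the values and the index level names 'key' and 'n' are illustrative choices.

```python
import pandas as pd

# Hand-written sample in the same shape as not_nested_dict above (values are made up).
not_nested_dict = {
    ('0', 0): {'object': 'TestObject', 'id': 1, 'date': 0, 'size': 0},
    ('0', 1): {'object': 'TestObject', 'id': 0, 'date': 1, 'size': 0},
    ('1', 0): {'object': 'TestObject', 'id': 1, 'date': 1, 'size': 1},
}

# orient='index' makes each outer key a row, so no .T is needed.
df = pd.DataFrame.from_dict(not_nested_dict, orient='index')

# Tuple keys become a two-level MultiIndex; naming the levels is optional.
df.index.names = ['key', 'n']
print(df)
```

With the levels named, df.loc['0'] selects all rows for key '0', and df.reset_index() turns both levels back into ordinary columns if a flat table is preferred.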