I have data stored in the following nested structure:
Dictionary
List
Object
Attribute
Specifically, it looks like this:
dict = {
    '0': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()]
    '1': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()]
    '2': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()]
    '3': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()]
}
With the TestObject object being defined as:
import random

class TestObject:
    def __init__(self):
        self.id = random.randint()
        self.date = random.randint()
        self.size = random.randint()
The attributes in this example do not really matter; they are just placeholders. What I am concerned with is converting this structure into a pandas DataFrame. Specifically, I want to organize the data in the following format:
|key| object | id | date | size |
|-- | ------ | ---- | ---- | ---- |
| 0 |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| 1 |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| 2 |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| 3 |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
I found this method for converting a dictionary of lists to a dataframe:
pandas.DataFrame.from_dict(dictionary)
but in this case I am interested in extracting attributes from objects which are stored in the lists.
CodePudding user response:
You can use a list comprehension:
import pandas as pd

pd.DataFrame([(k, o, o.id, o.date, o.size)
              for k, objs in dic.items() for o in objs],
             columns=['key', 'object', 'id', 'date', 'size'])
You first need to fix a few things in your initial code:
import random

class TestObject:
    def __init__(self):
        # randint has 2 mandatory parameters
        self.id = random.randint(0, 1)
        self.date = random.randint(0, 1)
        self.size = random.randint(0, 1)

# better use "dic", "dict" is a python builtin
dic = {
    '0': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()],
    '1': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()],
    '2': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()],
    '3': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()]
}
Example output:
key object id date size
0 0 <__main__.TestObject object at 0x7fc79371af10> 0 0 0
1 0 <__main__.TestObject object at 0x7fc79371aeb0> 1 0 1
2 0 <__main__.TestObject object at 0x7fc79371af70> 1 0 0
3 0 <__main__.TestObject object at 0x7fc79371c040> 1 0 1
4 0 <__main__.TestObject object at 0x7fc79371c0d0> 1 1 0
5 1 <__main__.TestObject object at 0x7fc79371c220> 1 1 1
6 1 <__main__.TestObject object at 0x7fc79371c1c0> 1 1 0
7 1 <__main__.TestObject object at 0x7fc79371c310> 0 1 0
8 1 <__main__.TestObject object at 0x7fc79371c400> 0 1 0
9 1 <__main__.TestObject object at 0x7fc79371c370> 0 0 1
10 2 <__main__.TestObject object at 0x7fc79371c4f0> 1 1 0
11 2 <__main__.TestObject object at 0x7fc79371c490> 0 0 1
12 2 <__main__.TestObject object at 0x7fc79371c5e0> 1 0 0
13 2 <__main__.TestObject object at 0x7fc79371c580> 1 0 1
14 2 <__main__.TestObject object at 0x7fc79371c640> 0 1 1
15 3 <__main__.TestObject object at 0x7fc79371c3d0> 0 1 1
16 3 <__main__.TestObject object at 0x7fc79371c730> 1 1 1
17 3 <__main__.TestObject object at 0x7fc79371c880> 1 0 1
18 3 <__main__.TestObject object at 0x7fc79371c850> 0 1 0
19 3 <__main__.TestObject object at 0x7fc79371c9a0> 0 1 1
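If you want the output to look closer to the table in the question (key shown once per group, class name instead of the object repr), one possible variation is to also record each object's position in its list and move both into the index. This is only a sketch: the column name `n` and the comprehension that builds `dic` are my own choices for illustration, not something from the question.

```python
import random
import pandas as pd

class TestObject:
    def __init__(self):
        # placeholder attributes, as in the question
        self.id = random.randint(0, 1)
        self.date = random.randint(0, 1)
        self.size = random.randint(0, 1)

# same shape as the question's data: 4 keys, 5 objects per key
dic = {str(k): [TestObject() for _ in range(5)] for k in range(4)}

# one tuple per object; 'n' (hypothetical name) is the position in the list
rows = [(k, n, type(o).__name__, o.id, o.date, o.size)
        for k, objs in dic.items() for n, o in enumerate(objs)]

df = pd.DataFrame(rows, columns=['key', 'n', 'object', 'id', 'date', 'size'])

# a two-level index makes pandas print each key only once per group
df = df.set_index(['key', 'n'])
print(df)
```

With a single-level index on `key` alone, pandas would repeat the key on every row, which is why the position is kept as a second level here.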
CodePudding user response:
In Python, instances of ordinary classes hold a __dict__ attribute, which maps each instance attribute name to its value (classes that define __slots__ are an exception):
print(pd.DataFrame(TestObject().__dict__, index=[0]))
id date size
0 0 0 0
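As a small illustration of the same idea, the built-in `vars(obj)` returns `obj.__dict__`, and a list of such dicts gives one row per object. This is a sketch only; the fixed attribute values are an assumption I made so the example is reproducible, they are not from the question.

```python
import pandas as pd

class TestObject:
    def __init__(self):
        # fixed values instead of random ones, purely for illustration
        self.id = 1
        self.date = 2
        self.size = 3

objs = [TestObject(), TestObject()]

# vars(o) is the same mapping as o.__dict__
df = pd.DataFrame([vars(o) for o in objs])
print(df)
```

Passing a list of dicts avoids the `index=[0]` argument, which is only needed when a single dict of scalar values is passed directly.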
Using a dict comprehension, you can easily achieve your goal without having to list every attribute you need, and add an object column holding the class name (dict_example here is the dictionary from the question):
not_nested_dict = {
    (key, n): {'object': obj.__class__.__name__, **obj.__dict__}
    for key, value in dict_example.items()
    for n, obj in enumerate(value)
}
print(not_nested_dict)
{('0', 0): {'object': 'TestObject', 'id': 1, 'date': 0, 'size': 0}, ('0', 1): {'object': 'TestObject', 'id': 0, 'date': 1, 'size': 0}, ...
Then just pass the new dict to pd.DataFrame.from_dict and transpose the result:
print(pd.DataFrame.from_dict(not_nested_dict).T)
object id date size
0 0 TestObject 0 1 1
1 TestObject 0 0 0
2 TestObject 0 0 0
3 TestObject 0 0 0
4 TestObject 1 1 1
1 0 TestObject 0 0 0
1 TestObject 0 1 1
2 TestObject 0 0 1
3 TestObject 0 1 1
4 TestObject 1 0 0
2 0 TestObject 1 0 0
1 TestObject 0 1 0
2 TestObject 1 0 1
3 TestObject 1 0 0
4 TestObject 1 1 1
3 0 TestObject 0 0 0
1 TestObject 1 1 1
2 TestObject 1 0 0
3 TestObject 1 1 1
4 TestObject 1 1 0
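One thing to be aware of with the transpose: the intermediate frame has columns that mix strings and integers, so after `.T` the id, date and size columns typically end up with object dtype. If numeric dtypes matter, a possible alternative is to build the records and the index separately. This is a sketch under my own assumptions: the index level names `key` and `n` and the way `dict_example` is generated are illustrative, not from the question.

```python
import random
import pandas as pd

class TestObject:
    def __init__(self):
        self.id = random.randint(0, 1)
        self.date = random.randint(0, 1)
        self.size = random.randint(0, 1)

# same shape as the question's data: 4 keys, 5 objects per key
dict_example = {str(k): [TestObject() for _ in range(5)] for k in range(4)}

index_tuples = []  # (key, position) pairs, one per object
records = []       # one dict of column values per object
for key, objs in dict_example.items():
    for n, obj in enumerate(objs):
        index_tuples.append((key, n))
        records.append({'object': type(obj).__name__, **vars(obj)})

# level names 'key' and 'n' are hypothetical, chosen for readability
index = pd.MultiIndex.from_tuples(index_tuples, names=['key', 'n'])
df = pd.DataFrame(records, index=index)

print(df.dtypes)  # the attribute columns keep an integer dtype
print(df)
```

Because each record becomes a row directly, no transpose is needed and pandas can infer a dtype per column.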