I have some data that looks a little like this:
data = [
    ([('thing1', 'thing1a'),
      ('thing1', 'thing1b'),
      ('thing1', 'thing1c'),
      ('thing1', 'thing1d'),
      ('thing1', 'thing1e')],
     'thing1description'),
    ([('thing2', 'thing2a')], 'thing2description'),
    ([('thing3', 'thing3a')], 'thing3description'),
]
I would like to build a dataframe that looks like this:
thing_number thing_letter description
thing1 thing1a thing1description
thing1 thing1b thing1description
thing1 thing1c thing1description
thing1 thing1d thing1description
thing1 thing1e thing1description
thing2 thing2a thing2description
thing3 thing3a thing3description
Thanks to a previous, very similar question, I can achieve this with the code below, but I think I must be missing something that would make it more elegant:
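import pandas as pd
data_ = pd.DataFrame(data, columns=['thing', 'description'])
data_ = data_.explode('thing')  # one row per (number, letter) tuple
# split each tuple into two columns; 'all' keeps the full list the tuple came from
pairs = pd.DataFrame([(*pair, group) for group, _ in data for pair in group],
                     columns=['thing_number', 'thing_letter', 'all'], index=data_.index)
data_ = pd.concat([data_, pairs], axis=1)
data_ = data_[['thing_number', 'thing_letter', 'description']]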
To summarise, I am looking for a more efficient and elegant way to unnest the list of tuples. Thanks in advance.
CodePudding user response:
Shorter code based on the same approach:
df = (pd.DataFrame(data, columns=['thing', 'description'])
        .explode('thing', ignore_index=True)  # ignore_index is optional
     )
df[['thing_number','thing_letter']] = df.pop('thing').tolist()
Output:
description thing_number thing_letter
0 thing1description thing1 thing1a
1 thing1description thing1 thing1b
2 thing1description thing1 thing1c
3 thing1description thing1 thing1d
4 thing1description thing1 thing1e
5 thing2description thing2 thing2a
6 thing3description thing3 thing3a
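The output above has description as the first column, whereas the question asks for it last. A minimal, self-contained sketch of the same explode-based approach with one extra column selection at the end is shown here. The sample data is copied from the question, and the final reordering line is an addition, not part of the answer as posted. Note the assumption that every inner list is non-empty: an empty list would, as far as I know, explode to a missing value, which this sketch does not handle.

```python
import pandas as pd

# Sample data copied from the question.
data = [
    ([("thing1", "thing1a"), ("thing1", "thing1b"), ("thing1", "thing1c"),
      ("thing1", "thing1d"), ("thing1", "thing1e")], "thing1description"),
    ([("thing2", "thing2a")], "thing2description"),
    ([("thing3", "thing3a")], "thing3description"),
]

# One row per (number, letter) tuple, with a fresh 0..n-1 index.
df = (pd.DataFrame(data, columns=["thing", "description"])
        .explode("thing", ignore_index=True))

# Remove the tuple column and spread its two elements over two new columns.
df[["thing_number", "thing_letter"]] = df.pop("thing").tolist()

# Added step (not in the answer): put the columns in the order the question asks for.
df = df[["thing_number", "thing_letter", "description"]]
```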
CodePudding user response:
Another way, using dict.fromkeys:
data2 = [dict.fromkeys(ks, v) for ks, v in data]
df = pd.concat([pd.Series(d) for d in data2]).reset_index()
df.columns = ['thing_number','thing_letter','description']
Output:
thing_number thing_letter description
0 thing1 thing1a thing1description
1 thing1 thing1b thing1description
2 thing1 thing1c thing1description
3 thing1 thing1d thing1description
4 thing1 thing1e thing1description
5 thing2 thing2a thing2description
6 thing3 thing3a thing3description
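One thing to keep in mind with this approach: dictionary keys are unique, so if the same (thing_number, thing_letter) tuple appeared twice inside one inner list, dict.fromkeys would keep it only once and the resulting frame would have fewer rows. The sample data has no such repeats, so the output above is unaffected. A small sketch with made-up input (the repeated tuple is hypothetical, not from the question) illustrates the behaviour:

```python
# Hypothetical inner list in which one tuple is repeated.
pairs = [("thing1", "thing1a"), ("thing1", "thing1a"), ("thing1", "thing1b")]

# dict.fromkeys maps every key to the same value; repeated keys collapse.
mapping = dict.fromkeys(pairs, "thing1description")

# Three tuples went in, but only two distinct keys remain.
unique_pairs = list(mapping)
```

If repeated tuples need to survive as separate rows, the explode-based answer is probably the safer choice.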
CodePudding user response:
Another option, with pd.concat:
out = {key: pd.DataFrame(value) for value, key in data}
(pd
 .concat(out, names=['description', None])
 .set_axis(['thing_number', 'thing_letter'], axis=1)
 .droplevel(1)
 .reset_index()
)
Output:
description thing_number thing_letter
0 thing1description thing1 thing1a
1 thing1description thing1 thing1b
2 thing1description thing1 thing1c
3 thing1description thing1 thing1d
4 thing1description thing1 thing1e
5 thing2description thing2 thing2a
6 thing3description thing3 thing3a
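For comparison, the nesting can also be flattened in plain Python before pandas gets involved. This is a different technique from the answers above (a nested list comprehension instead of explode, dict.fromkeys or pd.concat), offered only as a sketch; the variable names are my own, and the sample data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
data = [
    ([("thing1", "thing1a"), ("thing1", "thing1b"), ("thing1", "thing1c"),
      ("thing1", "thing1d"), ("thing1", "thing1e")], "thing1description"),
    ([("thing2", "thing2a")], "thing2description"),
    ([("thing3", "thing3a")], "thing3description"),
]

# Walk each (pairs, description) entry and emit one flat row per tuple.
rows = [
    (number, letter, description)
    for pairs, description in data
    for number, letter in pairs
]

# The rows are already in the desired column order.
df = pd.DataFrame(rows, columns=["thing_number", "thing_letter", "description"])
```

Because the rows are built in the target order, no column reordering or index reset is needed afterwards, and repeated tuples are kept as separate rows.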