Pandas Dataframe from nested tuples

Time:10-05

I have some data that looks a little like this:

    data = [([('thing1', 'thing1a'),
              ('thing1', 'thing1b'),
              ('thing1', 'thing1c'),
              ('thing1', 'thing1d'),
              ('thing1', 'thing1e')],
             'thing1description'),
            ([('thing2', 'thing2a')],
             'thing2description'),
            ([('thing3', 'thing3a')],
             'thing3description')]

I would like to build a dataframe that looks like this:

thing_number  thing_letter  description
thing1        thing1a       thing1description
thing1        thing1b       thing1description
thing1        thing1c       thing1description
thing1        thing1d       thing1description
thing1        thing1e       thing1description
thing2        thing2a       thing2description
thing3        thing3a       thing3description

Thanks to a previous, very similar question, I can achieve it with the code below, but I think I must be missing something that would make it more elegant:

import pandas as pd

data_ = pd.DataFrame(data, columns=['thing', 'description'])
data_ = data_.explode('thing')
# the 'all' column holds the original list of pairs and is discarded by the final selection
data_ = pd.concat([data_, pd.DataFrame([(*i, k) for k, j in data for i in k],
                                       columns=['thing_number', 'thing_letter', 'all'],
                                       index=data_.index)], axis=1)
data_ = data_[['thing_number', 'thing_letter', 'description']]

To summarise, I am looking for a more efficient and elegant way to unnest the list of tuples. Thanks in advance.
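
For comparison: the list comprehension in the question already visits every (thing_number, thing_letter) pair, so emitting the description alongside each pair produces the finished rows directly, with no explode or concat step. A minimal sketch, assuming data has exactly the shape shown above:

```python
import pandas as pd

data = [([('thing1', 'thing1a'), ('thing1', 'thing1b'), ('thing1', 'thing1c'),
          ('thing1', 'thing1d'), ('thing1', 'thing1e')], 'thing1description'),
        ([('thing2', 'thing2a')], 'thing2description'),
        ([('thing3', 'thing3a')], 'thing3description')]

# Flatten in plain Python: one output row per (number, letter) pair,
# repeating the group's description on each of its rows.
rows = [(number, letter, description)
        for pairs, description in data
        for number, letter in pairs]

df = pd.DataFrame(rows, columns=['thing_number', 'thing_letter', 'description'])
print(df)
```

Doing the unnesting before pandas is involved keeps the row order of the input and avoids the intermediate object column entirely.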

Answer:

A shorter version of the same approach:

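df = (pd.DataFrame(data, columns=['thing', 'description'])
        .explode('thing',
                 ignore_index=True)  # optional: renumbers the rows 0..n-1
      )

# each exploded cell is a (thing_number, thing_letter) tuple; split it into two columns
df[['thing_number', 'thing_letter']] = df.pop('thing').tolist()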

Output:

         description thing_number thing_letter
0  thing1description       thing1      thing1a
1  thing1description       thing1      thing1b
2  thing1description       thing1      thing1c
3  thing1description       thing1      thing1d
4  thing1description       thing1      thing1e
5  thing2description       thing2      thing2a
6  thing3description       thing3      thing3a
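
Why the last line works: after explode, every cell of the thing column holds a single 2-tuple, so .tolist() returns a list of 2-tuples that pandas spreads across the two new columns, while pop removes the helper column in the same step. A small sketch printing the intermediate values (the data literal is repeated so the snippet runs on its own):

```python
import pandas as pd

data = [([('thing1', 'thing1a'), ('thing1', 'thing1b'), ('thing1', 'thing1c'),
          ('thing1', 'thing1d'), ('thing1', 'thing1e')], 'thing1description'),
        ([('thing2', 'thing2a')], 'thing2description'),
        ([('thing3', 'thing3a')], 'thing3description')]

# explode() unpacks each list, so every row now holds one (number, letter) tuple
df = (pd.DataFrame(data, columns=['thing', 'description'])
        .explode('thing', ignore_index=True))
print(df['thing'].iloc[0])  # ('thing1', 'thing1a')

# tolist() yields a list of 2-tuples; pandas spreads it across the two target
# columns, and pop() drops the helper 'thing' column at the same time
df[['thing_number', 'thing_letter']] = df.pop('thing').tolist()
print(df.columns.tolist())  # ['description', 'thing_number', 'thing_letter']
```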

Answer:

Another way using dict.fromkeys:

data2 = [dict.fromkeys(ks, v) for ks, v in data]
df = pd.concat([pd.Series(d) for d in data2]).reset_index()
df.columns = ['thing_number','thing_letter','description']

Output:

  thing_number thing_letter        description
0       thing1      thing1a  thing1description
1       thing1      thing1b  thing1description
2       thing1      thing1c  thing1description
3       thing1      thing1d  thing1description
4       thing1      thing1e  thing1description
5       thing2      thing2a  thing2description
6       thing3      thing3a  thing3description
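
How this works: dict.fromkeys maps every (thing_number, thing_letter) tuple of a group to that group's description, pd.Series turns the tuple keys into a two-level MultiIndex, and reset_index moves both levels out into columns with default names, which is why the columns are renamed afterwards. Because the pairs become dictionary keys, a pair repeated within one group would appear only once in the result. A short sketch of the intermediate values for one group:

```python
import pandas as pd

# One group from the question: a list of (thing_number, thing_letter) pairs
group = [('thing1', 'thing1a'), ('thing1', 'thing1b')]

# Every pair becomes a key whose value is the group's description
d = dict.fromkeys(group, 'thing1description')
print(d)

# Tuple keys turn into a two-level MultiIndex on the Series
s = pd.Series(d)
print(s.index.nlevels)  # 2

# reset_index() moves both levels into columns with default names
flat = s.reset_index()
print(flat.columns.tolist())  # ['level_0', 'level_1', 0]

# Caveat: a pair repeated within one group collapses to a single key
dup = dict.fromkeys([('thing1', 'thing1a'), ('thing1', 'thing1a')], 'desc')
print(len(dup))  # 1
```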

Answer:

Another option, with pd.concat:

out = {key: pd.DataFrame(value) for value, key in data}
(pd
 .concat(out, names=['description', None])
 .set_axis(['thing_number', 'thing_letter'], axis=1)
 .droplevel(1)
 .reset_index()
)

Output:

         description thing_number thing_letter
0  thing1description       thing1      thing1a
1  thing1description       thing1      thing1b
2  thing1description       thing1      thing1c
3  thing1description       thing1      thing1d
4  thing1description       thing1      thing1e
5  thing2description       thing2      thing2a
6  thing3description       thing3      thing3a
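
One caveat with this approach: out is a dict keyed by description, so if two groups share the same description, the later group overwrites the earlier one before pd.concat ever sees it. A small sketch with hypothetical input (two groups deliberately given the same description) demonstrating this:

```python
import pandas as pd

# Hypothetical input: two different groups that share one description
dup = [([('thing1', 'thing1a'), ('thing1', 'thing1b')], 'shared description'),
       ([('thing2', 'thing2a')], 'shared description')]

# The dict comprehension keeps only the last group stored under each key
out = {key: pd.DataFrame(value) for value, key in dup}
print(len(out))  # 1

res = (pd
       .concat(out, names=['description', None])
       .set_axis(['thing_number', 'thing_letter'], axis=1)
       .droplevel(1)
       .reset_index()
       )
print(res)  # only the thing2 row survives; thing1a and thing1b are lost
```

With the question's data every description is unique, so the answer's output is unaffected.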