Convert sentences and labels dictionary to separate columns

Time:02-15

How can I convert the sentences column into multiple columns?

import pandas as pd

df = pd.DataFrame(data={'id': [0, 1, 2, 3], 'sentences': [
          {0: ['first sentence0', 'second sentence0', 'label0']}, 
          {1: ['first sentence1', 'second sentence1', 'label1']},
          {2: ['first sentence2', 'second sentence2', 'label2']},
          {3: ['first sentence3', 'second sentence3', 'label3']}]})
|    |   id | sentences                                              |
|---:|-----:|:-------------------------------------------------------|
|  0 |    0 | {0: ['first sentence0', 'second sentence0', 'label0']} |
|  1 |    1 | {1: ['first sentence1', 'second sentence1', 'label1']} |
|  2 |    2 | {2: ['first sentence2', 'second sentence2', 'label2']} |
|  3 |    3 | {3: ['first sentence3', 'second sentence3', 'label3']} |

Expected output:

|   id | sentences        | label   |
|-----:|:-----------------|:--------|
|    0 | first sentence0  | label0  |
|    0 | second sentence0 | label0  |
|    1 | first sentence1  | label1  |
|    1 | second sentence1 | label1  |
|    2 | first sentence2  | label2  |
|    2 | second sentence2 | label2  |
|    3 | first sentence3  | label3  |
|    3 | second sentence3 | label3  |

The DataFrame has over 20,000 rows and 2 columns. I'm open to any efficient solution, including ones that use loops. Maybe pd.json_normalize would help?

CodePudding user response:

One way could be:

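from itertools import product

# Pair each row's label (last list element) with every sentence before it,
# giving a list of (label, sentence) tuples per row.
# Assumes each dict holds exactly one key, so there is one list per row.
(df.assign(sentences=[list(product(v[-1:], v[:-1]))
                      for d in df['sentences'] for v in d.values()])
   .explode('sentences')                               # one row per tuple
   .assign(label=lambda d: d['sentences'].str[0],      # tuple[0] is the label
           sentences=lambda d: d['sentences'].str[1],  # tuple[1] is the sentence
          )
)

Output:

   id         sentences   label
0   0   first sentence0  label0
0   0  second sentence0  label0
1   1   first sentence1  label1
1   1  second sentence1  label1
2   2   first sentence2  label2
2   2  second sentence2  label2
3   3   first sentence3  label3
3   3  second sentence3  label3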
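
Since the question is also open to loops, another option is to build the rows in a plain list comprehension and construct a new DataFrame from them. This is only a sketch: it assumes the last element of each list is always the label and everything before it is a sentence. The names `rows` and `out` are made up for illustration. Whether this is faster on 20,000 rows has not been measured here.

```python
import pandas as pd

df = pd.DataFrame(data={'id': [0, 1, 2, 3], 'sentences': [
    {0: ['first sentence0', 'second sentence0', 'label0']},
    {1: ['first sentence1', 'second sentence1', 'label1']},
    {2: ['first sentence2', 'second sentence2', 'label2']},
    {3: ['first sentence3', 'second sentence3', 'label3']}]})

# Build one (id, sentence, label) tuple per sentence.
# Assumption: the last list element is the label, the rest are sentences.
rows = [
    (row_id, sentence, values[-1])
    for row_id, d in zip(df['id'], df['sentences'])
    for values in d.values()
    for sentence in values[:-1]
]

# Unlike the explode approach, this produces a fresh 0..n-1 index.
out = pd.DataFrame(rows, columns=['id', 'sentences', 'label'])
print(out)
```

Unlike the `assign` approach, this also works when a dict holds more than one key, because each list simply contributes its own rows.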