Home > Blockchain >  How do I convert this column into multiple columns?
How do I convert this column into multiple columns?

Time:02-18

My pandas DataFrame looks like this:

id address
1 [{'city': 'MURFREESBORO', 'line': ['9999 Candy Cane Island'], 'postalCode': '39999', 'state': '56'}]
2 [{'city': 'LIKELAND', 'line': ['11111 WS 80RD ST'], 'postalCode': '71398', 'state': '99'}]
3 [{'city': 'CHASS', 'line': ['36 LONDON LN'], 'postalCode': '269235', 'state': '35'}]

How do I convert this column into multiple columns to look like this?

id city line postalcode state
1 MURFREESBORO 9999 Candy Cane Island 39999 56
2 LIKELAND 11111 WS 80RD ST 71398 99
3 CHASS 36 LONDON LN 269235 35

I have tried multiple different ways:

df = pd.json_normalize(newdf['address'])

# AND
newdf['address'] = newdf['address'].apply(lambda x: "'"   str(x)   "'")
newdf['address'] = newdf['address'].apply(str).str.replace('[', '').str.replace(']', '')

Data to construct the DataFrame:

{'id': [1, 2, 3],
 'address': [[{'city': 'MURFREESBORO', 'line': ['9999 Candy Cane Island'],
               'postalCode': '39999', 'state': '56'}],
             [{'city': 'LIKELAND', 'line': ['11111 WS 80RD ST'],
               'postalCode': '71398', 'state': '99'}],
             [{'city': 'CHASS','line': ['36 LONDON LN'],
               'postalCode': '269235', 'state': '35'}]]}

CodePudding user response:

You could explode "address" column, convert it to a list and construct a DataFrame with it; then join it back to df:

s = df['address'].explode()
out = df.join(pd.DataFrame(s.tolist(), index=s.index).explode('line')).drop(columns='address')

Output:

   id          city                    line postalCode state
0   1  MURFREESBORO  9999 Candy Cane Island      39999    56
1   2      LIKELAND        11111 WS 80RD ST      71398    99
2   3         CHASS            36 LONDON LN     269235    35

CodePudding user response:

First change 1-element list in each element of address column into the content of this element.

Then you can invoke json_normalize on the result of step 1 and save the result in a temporary DataFrame.

Both above operations can be performed as:

wrk = pd.json_normalize(df.address.str[0])

There is one more conversion to perform, i.e. change 1-element list in each element of line column into the content of this element:

wrk.line = wrk.line.str[0]

And the last step is to join id column with wrk:

result = df[['id']].join(wrk)

The result is:

   id          city                    line postalCode state
0   1  MURFREESBORO  9999 Candy Cane Island      39999    56
1   2      LIKELAND        11111 WS 80RD ST      71398    99
2   3         CHASS            36 LONDON LN     269235    35

The leftmost column is the index, which you didn't include in your post.

To easier comprehend how the above code works, print wrk after each step of my code.

  • Related