My pandas DataFrame looks like this:
id | address |
---|---|
1 | [{'city': 'MURFREESBORO', 'line': ['9999 Candy Cane Island'], 'postalCode': '39999', 'state': '56'}] |
2 | [{'city': 'LIKELAND', 'line': ['11111 WS 80RD ST'], 'postalCode': '71398', 'state': '99'}] |
3 | [{'city': 'CHASS', 'line': ['36 LONDON LN'], 'postalCode': '269235', 'state': '35'}] |
How do I convert this column into multiple columns to look like this?
id | city | line | postalcode | state |
---|---|---|---|---|
1 | MURFREESBORO | 9999 Candy Cane Island | 39999 | 56 |
2 | LIKELAND | 11111 WS 80RD ST | 71398 | 99 |
3 | CHASS | 36 LONDON LN | 269235 | 35 |
I have tried multiple different ways:
df = pd.json_normalize(newdf['address'])
# AND
newdf['address'] = newdf['address'].apply(lambda x: "'" str(x) "'")
newdf['address'] = newdf['address'].apply(str).str.replace('[', '').str.replace(']', '')
Data to construct the DataFrame:
{'id': [1, 2, 3],
'address': [[{'city': 'MURFREESBORO', 'line': ['9999 Candy Cane Island'],
'postalCode': '39999', 'state': '56'}],
[{'city': 'LIKELAND', 'line': ['11111 WS 80RD ST'],
'postalCode': '71398', 'state': '99'}],
[{'city': 'CHASS','line': ['36 LONDON LN'],
'postalCode': '269235', 'state': '35'}]]}
CodePudding user response:
You could explode
"address" column, convert it to a list and construct a DataFrame with it; then join
it back to df
:
s = df['address'].explode()
out = df.join(pd.DataFrame(s.tolist(), index=s.index).explode('line')).drop(columns='address')
Output:
id city line postalCode state
0 1 MURFREESBORO 9999 Candy Cane Island 39999 56
1 2 LIKELAND 11111 WS 80RD ST 71398 99
2 3 CHASS 36 LONDON LN 269235 35
CodePudding user response:
First change 1-element list in each element of address column into the content of this element.
Then you can invoke json_normalize on the result of step 1 and save the result in a temporary DataFrame.
Both above operations can be performed as:
wrk = pd.json_normalize(df.address.str[0])
There is one more conversion to perform, i.e. change 1-element list in each element of line column into the content of this element:
wrk.line = wrk.line.str[0]
And the last step is to join id column with wrk:
result = df[['id']].join(wrk)
The result is:
id city line postalCode state
0 1 MURFREESBORO 9999 Candy Cane Island 39999 56
1 2 LIKELAND 11111 WS 80RD ST 71398 99
2 3 CHASS 36 LONDON LN 269235 35
The leftmost column is the index, which you didn't include in your post.
To easier comprehend how the above code works, print wrk after each step of my code.