I have a Pandas DataFrame looking like:
import pandas as pd
print(pd.__version__)
df0 = pd.DataFrame([
[12, None, [{'dst': '925', 'object': 'Lok. Certification', 'admin': 'valid'},
{'dst': '935', 'object': 'Lok. Administration', 'admin': 'true'},
{'dst': '944', 'object': 'Lok. Customer', 'admin': 'false'},
{'dst': '945', 'object': 'Lok. Customer', 'admin': 'false'},
{'dst': '954', 'object': 'Lok. Certification-C', 'admin': 'invalid'},
{'dst': '956', 'object': 'Lok. Certification', 'admin': 'valid'}]],
[13,'wXB', [{'dst': '986', 'object': 'Fral_heater', 'admin': 'valid'},
{'dst': '987', 'object': 'Fral_cond.', 'admin': 'valid'}]],
])
Each of the list in the column 2 is having the exact same keys (dst
, object
and admin
).
There can be between 0 (empty []
) and 100 lists for each row of the df0
.
I wish I could expand the df0
DataFrame to look like this:
columns = ['id', 'name', 'dst', 'object', 'admin']
df_wanted
Out[416]:
id name dst object admin
12 None 925 'Lok. Certification' 'valid'
12 None 935 'Lok. Administration' 'true'
12 None 944 'Lok. Customer' 'false'
12 None 945 'Lok. Customer' 'false'
12 None 955 'Lok. Certification-C' 'invalid'
12 None 956 'Lok. Certification' 'valid'
13 wXB 987 'Lok. Fral_heater' 'valid'
13 wXB 986 'Lok. Fral_cond.' 'valid'
...
Notice that the two first columns, id
and name
, are replicated along the rows to fit the number of elements within their list.
(The dst
column must be cast to an int
using .astype(int)
at the end.)
How could I achieve that?
Info:
Python 3.10.4
pd.__version__
'1.4.2'
CodePudding user response:
You can explode
the column first, then convert the dictionaries to columns:
df0 = df0.explode(2, ignore_index=True)
df0 = pd.concat([df0, df0[2].apply(pd.Series)], axis=1).drop(columns=2)
CodePudding user response:
The other answers both work well. Instead of building Series objects line by line, you could build a single DataFrame object and join
back to the original. This one should be a bit faster.
df1 = df0.explode(2, ignore_index=True).pipe(lambda x: x.join(pd.DataFrame(x.pop(2).tolist())))
Output:
0 1 dst object admin
0 12 None 925 Lok. Certification valid
1 12 None 935 Lok. Administration true
2 12 None 944 Lok. Customer false
3 12 None 945 Lok. Customer false
4 12 None 954 Lok. Certification-C invalid
5 12 None 956 Lok. Certification valid
6 13 wXB 986 Fral_heater valid
7 13 wXB 987 Fral_cond. valid
Benchmark:
>>> %timeit df1 = df0.explode(2, ignore_index=True); df1 = pd.concat([df1, df1[2].apply(pd.Series)], axis=1).drop(columns=2)
8.4 ms ± 842 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit df1 = df0.explode(2, ignore_index=True).pipe(lambda x: x.join(pd.DataFrame(x.pop(2).tolist())))
4.8 ms ± 565 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
CodePudding user response:
I'd recommend expanding the column into a new df, then joining back to the main df:
# Copied from above
df = pd.DataFrame([
[12, None, [{'dst': '925', 'object': 'Lok. Certification', 'admin': 'valid'},
{'dst': '935', 'object': 'Lok. Administration', 'admin': 'true'},
{'dst': '944', 'object': 'Lok. Customer', 'admin': 'false'},
{'dst': '945', 'object': 'Lok. Customer', 'admin': 'false'},
{'dst': '954', 'object': 'Lok. Certification-C', 'admin': 'invalid'},
{'dst': '956', 'object': 'Lok. Certification', 'admin': 'valid'}]],
[13,'wXB', [{'dst': '986', 'object': 'Fral_heater', 'admin': 'valid'},
{'dst': '987', 'object': 'Fral_cond.', 'admin': 'valid'}]],
])
# Set the names of the columns
df.columns = ['id', 'name', 'object']
# Create a new df from the column
df_tmp = df['object'].explode().apply(pd.Series)
# Join to original
df = pd.concat([df[['id', 'name']], df_tmp], axis=1).reset_index(drop=True)
# Result:
| | id | name | dst | object | admin |
|---:|-----:|:-------|------:|:---------------------|:--------|
| 0 | 12 | | 925 | Lok. Certification | valid |
| 1 | 12 | | 935 | Lok. Administration | true |
| 2 | 12 | | 944 | Lok. Customer | false |
| 3 | 12 | | 945 | Lok. Customer | false |
| 4 | 12 | | 954 | Lok. Certification-C | invalid |
| 5 | 12 | | 956 | Lok. Certification | valid |
| 6 | 13 | wXB | 986 | Fral_heater | valid |
| 7 | 13 | wXB | 987 | Fral_cond. | valid |