Tryng to unpack (what I believe to be called) a list of multi-nested dictionaries to achieve the following output:
subject_code module notes topics start end faculty_id faculty_leader
0 90 ABC Board Spring 19 refer to guidance: https://nothing.com/general... [1A, 2G, 3S] 1643847382 1644279382 nothing Steve Finn
1 80 ABC Board Spring 18 refer to guidance: https://nothing.com/general... [1D, 5F, 2B] 1643847382 1644279382 nothing Steve Finn
Below is the raw data and my unsuccessful attempts to achieve the aforementioned output:
import pandas as pd
data_ = [
{
"subject_code": 90, "module": "ABC Board Spring 19",
"notes" : "refer to guidance: https://nothing.com/general-information/overleaf",
"topics" : ['1A', '2G', '3S'], "start":1643847382,"end":1644279382,
"faculty":{"id":"nothing","leader":"Steve Finn"}
},
{
"subject_code": 80, "module": "ABC Board Spring 18",
"notes" : "refer to guidance: https://nothing.com/general-information/overleaf",
"topics" : ['1D', '5F', '2B'], "start":1643847382,"end":1644279382,
"faculty":{"id":"nothing","leader":"Steve Finn"}}]
df = pd.DataFrame(data_ )
print(df)
However, the final column faculty
is not displaying as desired....
subject_code module notes topics start end faculty
0 90 ABC Board Spring 19 refer to guidance: https://nothing.com/general... [1A, 2G, 3S] 1643847382 1644279382 {'id': 'nothing', 'leader': 'Steve Finn'}
1 80 ABC Board Spring 18 refer to guidance: https://nothing.com/general... [1D, 5F, 2B] 1643847382 1644279382 {'id': 'nothing', 'leader': 'Steve Finn'}
Rather than faculty
displaying as a single column containing a dictionary of data I would like split into individual columns, named as dictionary name and key...
Below is the desired output as shown at the start of this thread......
subject_code module notes topics start end faculty_id faculty_leader
0 90 ABC Board Spring 19 refer to guidance: https://nothing.com/general... [1A, 2G, 3S] 1643847382 1644279382 nothing Steve Finn
1 80 ABC Board Spring 18 refer to guidance: https://nothing.com/general... [1D, 5F, 2B] 1643847382 1644279382 nothing Steve Finn
CodePudding user response:
You could use the DataFrame constructor on "faculty" column and join
the resulting DataFrame back to df
(with add_prefix
, you could add "faculty"):
out = df.join(pd.DataFrame(df['faculty'].tolist()).add_prefix('faculty_')).drop(columns='faculty')
Output:
subject_code module \
0 90 ABC Board Spring 19
1 80 ABC Board Spring 18
notes topics \
0 refer to guidance: https://nothing.com/general... [1A, 2G, 3S]
1 refer to guidance: https://nothing.com/general... [1D, 5F, 2B]
start end faculty_id faculty_leader
0 1643847382 1644279382 nothing Steve Finn
1 1643847382 1644279382 nothing Steve Finn
CodePudding user response:
You can just use json_normalize
. Renaming the columns to the desired names is an exercise I leave for the reader.
pd.json_normalize(data_)
subject_code module notes topics start end faculty.id faculty.leader
0 90 ABC Board Spring 19 refer to guidance: https://nothing.com/general... [1A, 2G, 3S] 1643847382 1644279382 nothing Steve Finn
1 80 ABC Board Spring 18 refer to guidance: https://nothing.com/general... [1D, 5F, 2B] 1643847382 1644279382 nothing Steve Finn
CodePudding user response:
You can flatten "faculty" before passing to the DataFrame constructor
For example:
for d in data_:
faculty = d.pop("faculty")
d["faculty_id"] = faculty["id"]
d["faculty_leader"] = faculty["leader"]
then pass data_ to the DataFrame constructor