Actually i have an feather
file that im loading to an Dataframe.
And this Dataframe have a column with languages
in each row. Like the abone one:
student_id | name | created_at | languages |
---|---|---|---|
1 | Foo | 2019-01-03 14:30:32.146000 00:00 | [{'language_id': 1, 'name': 'English', 'optin_... |
2 | Bar | 2019-01-03 14:30:32.146000 00:00 | [{'language_id': 1, 'name': 'English', 'optin_... |
My question is: How can i generate an new Dataframe only with student_id
column and the rest of languages
array?
For example the above one:
student_id | language_id | language_name | optin_at |
---|---|---|---|
1 | 1 | English | 2019-01-03T14:30:32.148Z |
2 | 1 | English | 2021-05-30T00:33:02.915Z |
2 | 2 | Portuguese | 2022-03-07T07:42:07.082Z |
// EDIT:
Exported Dataframe as JSON (orient='records'
) for testing purposes:
[{"student_id":"1","name":"Foo","created_at":"2019-01-03T14:30:32.146Z","languages":[{"language_id":1,"name":"English","optin_at":"2019-01-03T14:30:32.148Z"}]},{"student_id":"2","name":"Bar","created_at":"2019-01-03T14:30:32.146Z","languages":[{"language_id":1,"name":"English","optin_at":"2021-05-30T00:33:02.915Z"},{"language_id":2,"name":"Portuguese","optin_at":"2022-03-07T07:42:07.082Z"}]}]
CodePudding user response:
You can use explode
, then convert to columns with json_normalize
:
out = (df
.explode('languages', ignore_index=True)
.pipe(lambda d: d.join(pd.json_normalize(d.pop('languages'))
.rename(columns={'name': 'language_name'})
))
)
Output:
student_id name created_at language_id language_name optin_at
0 1 Foo 2019-01-03T14:30:32.146Z 1 English 2019-01-03T14:30:32.148Z
1 2 Bar 2019-01-03T14:30:32.146Z 1 English 2021-05-30T00:33:02.915Z
2 2 Bar 2019-01-03T14:30:32.146Z 2 Portuguese 2022-03-07T07:42:07.082Z