I have a DataFrame with the following structure:
id year name genres
238 2022 Adventure [{"revenue": 1463, "name": "culture clash", 'runtime': 150, 'vote_average': 7}]
239 2020 Comedy []
But what I need is this structure:
id year name revenue name runtime vote_average
238 2022 Adventure 1463 culture clash 150 7
239 2020 Comedy
Please note that I sometimes have an empty array in the genres column.
I used this code:
df.join(pd.json_normalize(df['genres'], record_path='genres'),
        lsuffix='', rsuffix='_genres')
but it raised this error:
TypeError: list indices must be integers or slices, not str
Any solutions?
CodePudding user response:
You could try the following.
If the genres column contains strings, first run
df["genres"] = df["genres"].map(eval)
Then:
df = pd.concat(
    [df[["id", "year"]], pd.DataFrame(obj[0] if obj else {} for obj in df["genres"])],
    axis="columns"
)
Result for the sample:
    id  year  revenue           name  runtime  vote_average
0  238  2022   1463.0  culture clash    150.0           7.0
1  239  2020      NaN            NaN      NaN           NaN
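For reference, the approach above can be tried end to end on the sample data. The sketch below is not part of the original answer. It assumes genres already holds Python lists, and it uses join with rsuffix (the "_genres" suffix is borrowed from the question) to keep the original name column, which the snippet above drops.

```python
import pandas as pd

# Sample data reconstructed from the question; genres holds real Python lists here.
df = pd.DataFrame({
    "id": [238, 239],
    "year": [2022, 2020],
    "name": ["Adventure", "Comedy"],
    "genres": [
        [{"revenue": 1463, "name": "culture clash", "runtime": 150, "vote_average": 7}],
        [],
    ],
})

# Take the first dict of each list, or an empty dict when the list is empty.
flat = pd.DataFrame([obj[0] if obj else {} for obj in df["genres"]], index=df.index)

# Join on the index; the suffix keeps the two "name" columns apart.
result = df.drop(columns="genres").join(flat, rsuffix="_genres")
print(result)
```

The TypeError in the question comes from record_path='genres': each element of the column is a list, and json_normalize tries to index that list with the string 'genres'.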
If you don't want to use eval, you could try this instead:
# Assumes every genres string is valid JSON (double-quoted keys and strings).
df["genres"] = pd.read_json("[" + ", ".join(df["genres"]) + "]")
df = pd.concat(
    [df[["id", "year"]], pd.json_normalize(df["genres"])], axis="columns"
)
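If the genres column holds strings and eval is a concern, ast.literal_eval from the standard library is another option. It only evaluates Python literals, and unlike a JSON parser it accepts the mixed single and double quotes that appear in the sample row. The sketch below is an alternative to the answer's code, and it assumes the column is stored as text (for example after reading a CSV).

```python
import ast
import pandas as pd

# Assumption: genres is stored as text, with the same mixed quoting as the sample.
df = pd.DataFrame({
    "id": [238, 239],
    "year": [2022, 2020],
    "genres": [
        """[{"revenue": 1463, "name": "culture clash", 'runtime': 150, 'vote_average': 7}]""",
        "[]",
    ],
})

# literal_eval parses Python literals only; it does not execute arbitrary code.
parsed = df["genres"].map(ast.literal_eval)

# Same flattening idea as before: first dict of each list, or {} for an empty list.
flat = pd.DataFrame([obj[0] if obj else {} for obj in parsed], index=df.index)
result = pd.concat([df[["id", "year"]], flat], axis="columns")
print(result)
```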
CodePudding user response:
Personally, I would use iterrows in your case:
for index, row in df.iterrows():
    value = row["genres"]
    if len(value) == 1:
        for key, keyValue in value[0].items():
            df.loc[index, key] = keyValue
df.drop(columns=["genres"], inplace=True)
df
Output:
| | id | year | revenue | name | runtime | vote_average |
|---|---|---|---|---|---|---|
| 0 | 238 | 2022 | 1463 | culture clash | 150 | 7 |
| 1 | 239 | 2020 | nan | nan | nan | nan |
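One caveat with the loop: if the frame still has its own name column, the dict key "name" is written into that same column. A possible variation, sketched below on the question's sample data, adds a prefix to the new columns. The "genres_" prefix is a hypothetical choice for illustration and is not from the original answer.

```python
import pandas as pd

# Sample data reconstructed from the question, including the original name column.
df = pd.DataFrame({
    "id": [238, 239],
    "year": [2022, 2020],
    "name": ["Adventure", "Comedy"],
    "genres": [
        [{"revenue": 1463, "name": "culture clash", "runtime": 150, "vote_average": 7}],
        [],
    ],
})

# Walk the genres column; rows with an empty list are skipped and stay NaN.
for index, value in df["genres"].items():
    if value:
        for key, key_value in value[0].items():
            # Hypothetical "genres_" prefix so the dict's "name" key does not
            # overwrite the existing "name" column.
            df.loc[index, f"genres_{key}"] = key_value

df = df.drop(columns=["genres"])
print(df)
```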