extract items from column with pandas

Time:03-03

I have a DataFrame with the following structure:

 id    year  name       genres
 238   2022  Adventure  [{"revenue": 1463, "name": "culture clash", 'runtime': 150, 'vote_average': 7}]
 239   2020  Comedy     []

But what I need is this structure:

 id    year  name       revenue  name           runtime  vote_average
 238   2022  Adventure  1463     culture clash  150      7
 239   2020  Comedy

Please note that the genres column sometimes contains an empty array.

I used this code:

df.join(pd.json_normalize(df['genres'], record_path='genres'),
        lsuffix='', rsuffix='_genres')

but it raised an error: TypeError: list indices must be integers or slices, not str

Any solutions?

CodePudding user response:

You could try the following. If the genres column contains strings, first convert them with

df["genres"] = df["genres"].map(eval)

Then:

df = pd.concat(
    [df[["id", "year"]], pd.DataFrame(obj[0] if obj else {} for obj in df["genres"])],
    axis="columns"
)

Result for the sample:

    id  year  revenue           name  runtime  vote_average
0  238  2022   1463.0  culture clash    150.0           7.0
1  239  2020      NaN            NaN      NaN           NaN

If you don't want to use eval, you could try this

df["genres"] = pd.read_json("[" + ", ".join(df["genres"]) + "]")
df = pd.concat(
    [df[["id", "year"]], pd.json_normalize(df["genres"])], axis="columns"
)

instead.
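
As a variation on the first snippet in this answer, here is a minimal self-contained sketch that swaps eval for ast.literal_eval, which only accepts Python literals and so does not execute arbitrary code. The sample frame is rebuilt from the question, on the assumption that genres holds strings. The names details and result are illustrative, and the join with rsuffix="_genres" is an assumption borrowed from the question's attempt so that both name columns survive.

```python
import ast

import pandas as pd

# Sample data rebuilt from the question; genres is assumed to hold strings.
df = pd.DataFrame(
    {
        "id": [238, 239],
        "year": [2022, 2020],
        "name": ["Adventure", "Comedy"],
        "genres": [
            '[{"revenue": 1463, "name": "culture clash", '
            "'runtime': 150, 'vote_average': 7}]",
            "[]",
        ],
    }
)

# Parse each string into a list of dicts without calling eval.
df["genres"] = df["genres"].map(ast.literal_eval)

# Take the first dict of each list; an empty list becomes an all-NaN row.
details = pd.DataFrame(
    [items[0] if items else {} for items in df["genres"]], index=df.index
)

# The nested "name" clashes with the outer one, so it gets a suffix.
result = df.drop(columns="genres").join(details, rsuffix="_genres")
print(result)
```

Because the second row has no nested values, the numeric columns come out as floats with NaN, as in the result shown above.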

CodePudding user response:

Personally speaking, I would use iterrows in your case:

for index, row in df.iterrows():
  value = row["genres"]
  if len(value) == 1:
    for key, keyValue in value[0].items():
      df.loc[index, key] = keyValue 
df.drop(columns=["genres"], inplace=True)
df

Output

    id  year revenue           name runtime vote_average
0  238  2022    1463  culture clash     150            7
1  239  2020     nan            nan     nan          nan
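
One thing to watch with this loop: if the frame still has its own name column, as in the question's sample, the nested "name" key is written into that same column. Here is a minimal sketch that keeps both values, assuming genres already holds Python lists and using a hypothetical "genres_" prefix for the new columns.

```python
import pandas as pd

# Sample data rebuilt from the question; genres is assumed to hold lists.
df = pd.DataFrame(
    {
        "id": [238, 239],
        "year": [2022, 2020],
        "name": ["Adventure", "Comedy"],
        "genres": [
            [{"revenue": 1463, "name": "culture clash",
              "runtime": 150, "vote_average": 7}],
            [],
        ],
    }
)

for index, row in df.iterrows():
    items = row["genres"]
    if len(items) == 1:
        for key, value in items[0].items():
            # The "genres_" prefix is an illustrative choice that avoids
            # overwriting the existing "name" column.
            df.loc[index, f"genres_{key}"] = value

df = df.drop(columns=["genres"])
print(df)
```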