One of my columns contains a specific piece of text in each row that I want to extract. Is there a way to pull that text out of every row in the column and put it in a new column?
From this:
|studios|
|-------|
|[{'mal_id': 14, 'name': 'Sunrise'}]|
|[{'mal_id': 34, 'name': 'Hal Film Maker'}]|
|[{'mal_id': 18, 'name': 'Toei Animation'}]|
|[]|
|[{'mal_id': 455, 'name': 'Palm Studio'}]|
To this:
|studios|
|-------|
|Sunrise|
|Hal Film Maker|
|Toei Animation|
|[]|
|Palm Studio|
CodePudding user response:
You can use .str to access indexes/keys from the lists/dicts of items in a column, and use a combination of pipe and where to fall back to the original values where the result from .str is NaN:
df['studios'] = df['studios'].str[0].str['name'].pipe(lambda x: x.where(x.notna(), df['studios']))
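To see what each part of that chain does, here is a minimal sketch that splits it into separate steps. The sample data is copied from the question and assumes the cells already hold real lists of dicts; the intermediate names first, names and result are only for illustration.

```python
import pandas as pd

# Sample rows from the question, stored as real Python lists of dicts.
df = pd.DataFrame({
    "studios": [
        [{"mal_id": 14, "name": "Sunrise"}],
        [{"mal_id": 34, "name": "Hal Film Maker"}],
        [],
        [{"mal_id": 455, "name": "Palm Studio"}],
    ]
})

# Step 1: take the first element of each list; an empty list gives NaN.
first = df["studios"].str[0]

# Step 2: look up the 'name' key in each dict; NaN stays NaN.
names = first.str["name"]

# Step 3: keep the extracted name where there is one,
# otherwise fall back to the original cell (here the empty list).
result = names.where(names.notna(), df["studios"])

print(result.tolist())
```

The pipe call in the one-liner only exists so that step 3 can be written inline without a temporary variable.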
Note: you may need to convert the items in df['studios'] to actual Python objects first, in case they're just strings that look like lists of dicts. To do that, run this before you run the above code:
import ast
df['studios'] = df['studios'].apply(ast.literal_eval)
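If you are not sure whether the column holds strings or real objects, a guarded version of the conversion may be safer, since ast.literal_eval raises a ValueError when it is handed something that is not a string (a list, for example). The sketch below uses a hypothetical helper, parse_cell, and made-up sample rows that mix both cases.

```python
import ast
import pandas as pd

def parse_cell(value):
    """Parse a cell only if it is a string; return anything else unchanged."""
    if isinstance(value, str):
        return ast.literal_eval(value)
    return value

# Illustrative data: two cells are strings, one is already a list.
raw = pd.DataFrame({
    "studios": [
        "[{'mal_id': 14, 'name': 'Sunrise'}]",
        "[]",
        [{"mal_id": 18, "name": "Toei Animation"}],
    ]
})

# Safe to run more than once, because parsed cells are skipped.
raw["studios"] = raw["studios"].apply(parse_cell)

print(raw["studios"].tolist())
```

After this step every cell is a list, so the .str[0].str['name'] chain can be applied to it.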