I have a DataFrame like this:
name | targets | imp |
---|---|---|
Bob | {'codes':[3,4,6,199], 'region':'us', 'meta':''} | 200 |
Diana | {'codes':[3,33,199], 'region':'us', 'meta':''} | 100 |
I am trying to add one more column to the final result, containing the codes extracted from targets, like this:
name | targets | imp | targets.code |
---|---|---|---|
Bob | {'codes':[3,4,6,199], 'region':'us', 'meta':''} | 200 | [3,4,6,199] |
Diana | {'codes':[3,33,199], 'region':'us', 'meta':''} | 100 | [3,33,199] |
I tried doing
df['targets.code'] = df['targets'].apply(lambda x: x['codes'])
But that line fails with the following error:
[2022-01-14 19:53:33,660] {{taskinstance.py:1150}} ERROR - 'NoneType' object is not subscriptable
I really tried digging into a lot of posts but didn't find a solution. What am I doing wrong?
CodePudding user response:
Looks like you have bad data (empty fields) in the "targets" column of your source, because the following works:
import pandas as pd
df = pd.DataFrame([{'name': 'Bob', 'targets': {'codes':[3,4,6,199], 'region':'us', 'meta':''}}])
print(df)
# name targets
# 0 Bob {'codes': [3, 4, 6, 199], 'region': 'us', 'met...
df['targets.code'] = df['targets'].apply(lambda x: x['codes'])
print(df)
# name targets targets.code
# 0 Bob {'codes': [3, 4, 6, 199], 'region': 'us', 'met... [3, 4, 6, 199]
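To confirm that empty fields are the cause, you could list the rows where "targets" is missing before applying the lambda. This is a minimal sketch: the sample frame below, including the 'Test' row with None, is made up for illustration and is not your real data.

```python
import pandas as pd

# Hypothetical sample data: the third row mimics an empty "targets" field.
df = pd.DataFrame([
    {'name': 'Bob', 'targets': {'codes': [3, 4, 6, 199], 'region': 'us', 'meta': ''}, 'imp': 200},
    {'name': 'Diana', 'targets': {'codes': [3, 33, 199], 'region': 'us', 'meta': ''}, 'imp': 100},
    {'name': 'Test', 'targets': None, 'imp': 50},
])

# isna() flags None (and NaN) entries in the column.
missing = df[df['targets'].isna()]
print(missing)
```

If this prints any rows on your real data, those are the ones that make x['codes'] fail with the 'NoneType' error.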
CodePudding user response:
You can use pd.json_normalize:
df['targets.code'] = pd.json_normalize(df['targets'])['codes']
print(df)
# Output
name targets imp targets.code
0 Bob {'codes': [3, 4, 6, 199], 'region': 'us', 'met... 200 [3, 4, 6, 199]
1 Diana {'codes': [3, 33, 199], 'region': 'us', 'meta'... 100 [3, 33, 199]
You can also use a comprehension:
df['targets.code'] = [x['codes'] if x else [] for x in df['targets']]
print(df)
# Output
name targets imp targets.code
0 Bob {'codes': [3, 4, 6, 199], 'region': 'us', 'met... 200 [3, 4, 6, 199]
1 Diana {'codes': [3, 33, 199], 'region': 'us', 'meta'... 100 [3, 33, 199]
2 Test None 50 []
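One caveat with the `if x else []` check: it catches None, but NaN is truthy in Python, so a NaN in "targets" would still reach x['codes'] and raise a TypeError. If your data may contain NaN or dicts without a 'codes' key, a stricter guard might look like the sketch below. The helper name extract_codes and the sample rows are hypothetical, and returning [] for bad rows is just one possible choice.

```python
import numpy as np
import pandas as pd

def extract_codes(target):
    """Return the 'codes' list of a targets dict, or [] if unavailable."""
    # isinstance() rejects None, NaN, and any other non-dict value.
    if isinstance(target, dict):
        # .get() also covers dicts that lack a 'codes' key.
        return target.get('codes', [])
    return []

# Hypothetical sample data covering the problematic cases.
df = pd.DataFrame({
    'name': ['Bob', 'Test', 'NaNRow', 'NoCodes'],
    'targets': [
        {'codes': [3, 4, 6, 199], 'region': 'us', 'meta': ''},
        None,
        np.nan,
        {'region': 'us'},
    ],
})

df['targets.code'] = df['targets'].apply(extract_codes)
print(df['targets.code'].tolist())
```

The same function also works in the original apply call, e.g. df['targets'].apply(extract_codes) instead of the lambda.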