How to get a specific column from a list into a pandas dataframe

How do I get answerId into a separate column in a pandas dataframe?

    0       {'answerText': {'es': 'No'}, 'answerId': 'Q2A2'}
    1       {'answerText': {'es': 'No'}, 'answerId': 'Q2A2'}
    2       {'answerText': {'es': 'Sí'}, 'answerId': 'Q2A1', 'freetextAnswer': 'Parancetamol 1g.', 'includeFreeText': True}
    3       {'answerText': {'es': 'No'}, 'answerId': 'Q2A2'}
    4       {'answerText': {'es': 'No'}, 'answerId': 'Q2A2'}

As a DataFrame, it now looks like this:

responses1_answer
0   {'answerText': {'es': 'No'}, 'answerId': 'Q2A2'}
1   {'answerText': {'es': 'No'}, 'answerId': 'Q2A2'}
2   {'answerText': {'es': 'Sí'}, 'answerId': 'Q2A1...
3   {'answerText': {'es': 'No'}, 'answerId': 'Q2A2'}
4   {'answerText': {'es': 'No'}, 'answerId': 'Q2A2'}

I tried json_normalize, but I get the answers (Q2A2 and so on) as columns instead. Any help would be highly appreciated!

The output I want is a dataframe where answerId is in a separate column, like this:

answerId  
Q2A2
Q2A2
Q2A1

I also tried:

variables = df[0].keys()

df1 = pd.DataFrame([[getattr(i,j) for j in variables] for i in df], columns = variables)

but I get: AttributeError: 'dict' object has no attribute 'answerText'

CodePudding user response：

Assuming lst is your list of dicts, you can do:

pd.DataFrame(data=[d['answerId'] for d in lst], columns=['answerId'])
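
If some dicts might not contain answerId, the comprehension above would raise a KeyError. A minimal sketch of a more tolerant variant follows; the sample data is made up to mirror the question, and the third entry without an answerId is an assumption added purely for illustration:

```python
import pandas as pd

# Hypothetical sample data mirroring the question; `lst` is the assumed list of dicts.
lst = [
    {'answerText': {'es': 'No'}, 'answerId': 'Q2A2'},
    {'answerText': {'es': 'Sí'}, 'answerId': 'Q2A1',
     'freetextAnswer': 'Parancetamol 1g.', 'includeFreeText': True},
    {'answerText': {'es': 'No'}},  # no 'answerId' key, added for illustration
]

# d.get() returns None instead of raising KeyError when the key is missing.
answers = pd.DataFrame({'answerId': [d.get('answerId') for d in lst]})
print(answers)
```

The missing entry ends up as a missing value in the column, so you can filter it later with dropna() if needed.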

CodePudding user response：

Your comment:

it is a pandas series. I will post it in my original question

Then you can use the apply() function. Something like this should work. (However, I did not try it out, as I don't have your original data.)

new_series = original_series.apply(lambda d: d['answerId'])
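
Here is a small self-contained sketch of the same idea, using a made-up Series in place of the original data. It also hints at why the getattr attempt in the question failed: dict keys are looked up with d['key'], not as attributes. The rename() and to_frame() calls are optional additions to get a one-column dataframe named answerId:

```python
import pandas as pd

# Hypothetical Series of dicts standing in for the asker's data.
original_series = pd.Series([
    {'answerText': {'es': 'No'}, 'answerId': 'Q2A2'},
    {'answerText': {'es': 'Sí'}, 'answerId': 'Q2A1'},
], name='responses1_answer')

# Key lookup (d['answerId']) works on dicts; getattr(d, 'answerId') does not,
# because dict keys are not attributes.
new_series = original_series.apply(lambda d: d['answerId']).rename('answerId')

# Convert the Series into a one-column DataFrame.
print(new_series.to_frame())
```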

CodePudding user response：

Get the values by the key answerId with the .str accessor; it returns NaN if there is no match:

print (df)
                                  responses1_answer
0  {'answerText': {'es': 'No'}, 'answerId': 'Q2A2'}
1  {'answerText': {'es': 'No'}, 'answerId': 'Q2A2'}
2  {'answerText': {'es': 'Sí'}, 'answerId': 'Q2A1'}
3  {'answerText': {'es': 'No'}, 'answerId': 'Q2A2'}
4  {'answerText': {'es': 'No'}, 'answerId': 'Q2A2'}

s = df['responses1_answer'].str['answerId']
print (s)
0    Q2A2
1    Q2A2
2    Q2A1
3    Q2A2
4    Q2A2
Name: responses1_answer, dtype: object
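
To see the no-match behaviour, here is a rough sketch with a made-up two-row frame, where the second dict has no answerId (that row is an assumption, not part of the original data). It also assigns the result straight into a new column:

```python
import pandas as pd

df = pd.DataFrame({'responses1_answer': [
    {'answerText': {'es': 'No'}, 'answerId': 'Q2A2'},
    {'answerText': {'es': 'Sí'}},  # hypothetical row with no 'answerId'
]})

# .str[key] looks the key up in each dict; rows without it become missing values.
df['answerId'] = df['responses1_answer'].str['answerId']

# isna() flags the row where the key was absent.
print(df['answerId'].isna())
```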

Alternatively, json_normalize expands every key into its own column:

df1 = pd.json_normalize(df['responses1_answer'])
print (df1)
  answerId answerText.es
0     Q2A2            No
1     Q2A2            No
2     Q2A1            Sí
3     Q2A2            No
4     Q2A2            No
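
If you only want answerId out of the normalized result, you can select that column afterwards. A minimal sketch, assuming the same column name as in the question and a shortened made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'responses1_answer': [
    {'answerText': {'es': 'No'}, 'answerId': 'Q2A2'},
    {'answerText': {'es': 'Sí'}, 'answerId': 'Q2A1'},
]})

# json_normalize accepts a list of dicts; .tolist() makes that explicit.
flat = pd.json_normalize(df['responses1_answer'].tolist())

# Double brackets keep the result a one-column DataFrame rather than a Series.
answer_ids = flat[['answerId']]
print(answer_ids)
```

Nested keys such as answerText are flattened into dotted column names (answerText.es), so they stay available in flat if you need them later.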