I have the following list of nested dictionaries:
raw_data = [
{
"type": "message",
"subtype": "bot_message",
"text": "This content can't be displayed.",
"timestamp": "1650905606.755969",
"username": "admin",
"bot_id": "BPD4K3SJW",
"blocks": [
{
"type": "section",
"block_id": "BJNTn",
"text": {
"type": "mrkdwn",
"text": "You have a new message.",
"verbatim": False,
},
},
{
"type": "section",
"block_id": "WPn/l",
"text": {
"type": "mrkdwn",
"text": "*Heard By*\nFriend",
"verbatim": False,
},
},
{
"type": "section",
"block_id": "5yp",
"text": {
"type": "mrkdwn",
"text": "*Which Direction? *\nNorth",
"verbatim": False,
},
},
{
"type": "section",
"block_id": "fKEpF",
"text": {
"type": "mrkdwn",
"text": "*Which Destination*\nNew York",
"verbatim": False,
},
},
{
"type": "section",
"block_id": "qjAH",
"text": {
"type": "mrkdwn",
"text": "*New Customer:*\Yes",
"verbatim": False,
},
},
# problem code chunk below
{
"type": "actions",
"block_id": "yt4",
"elements": [
{
"type": "button",
"action_id": " bc",
"text": {
"type": "plain_text",
"bar": "View results",
"emoji": True,
},
"url": "www.example.com/results",
}
],
},
# problem code chunk above
{
"type": "section",
"block_id": "IBr",
"text": {"type": "mrkdwn", "text": " ", "verbatim": False},
},
],
},
{
"type": "message",
"subtype": "bot_message",
"text": "This content can't be displayed.",
"timestamp": "1650899428.077709",
"username": "admin",
"bot_id": "BPD4K3SJW",
"blocks": [
{
"type": "section",
"block_id": "Smd",
"text": {
"type": "mrkdwn",
"text": "You have a new message.",
"verbatim": False,
},
},
{
"type": "section",
"block_id": "6YaLt",
"text": {
"type": "mrkdwn",
"text": "*Heard By*\nOnline Search",
"verbatim": False,
},
},
{
"type": "section",
"block_id": "w3o",
"text": {
"type": "mrkdwn",
"text": "*Which Direction: *\nNorth",
"verbatim": False,
},
},
{
"type": "section",
"block_id": "PTQ",
"text": {
"type": "mrkdwn",
"text": "*Which Destination? *\nMiami",
"verbatim": False,
},
},
{
"type": "section",
"block_id": "JCfSP",
"text": {
"type": "mrkdwn",
"text": "*New Customer? *\nNo",
"verbatim": False,
},
},
# problem code chunk below
{
"type": "actions",
"block_id": "yt4",
"elements": [
{
"type": "button",
"action_id": " bc",
"text": {
"type": "plain_text",
"bar": "View results",
"emoji": True,
},
"url": "www.example.com/results",
}
],
},
# problem code chunk above
{
"type": "section",
"block_id": "RJOA",
"text": {"type": "mrkdwn", "text": " ", "verbatim": False},
},
],
},
]
My goal is to produce a Pandas dataframe that looks as follows:
        heard_by direction destination new_customer
0         Friend     North    New York          Yes
1  Online Search     North       Miami           No
To do so, I use the following:
import re

import pandas as pd

d_new = (pd.DataFrame([[re.sub(".*[*]\\W+", "", val['text']['text'])
                        for val in dat['blocks']] for dat in raw_data])
         .drop([0, 5], axis=1))
d_new.columns = ['heard_by', 'direction', 'destination', 'new_customer']
d_new
Unfortunately, this throws a KeyError:
KeyError: 'text'
The code does work, but only if the following chunk is commented out of each message in the list above:
# {'type': 'actions',
# 'block_id': 'yt4',
# 'elements': [{'type': 'button',
# 'action_id': ' bc',
# 'text': {'type': 'plain_text', 'bar': 'View results', 'emoji': True},
# 'url': 'www.example.com/results'}]},
How do we adapt the code to handle this use case?
Thanks!
CodePudding user response:
The issue seems to be that the problem chunks have no top-level 'text' key; their 'text' keys sit inside the list stored under the 'elements' key of those blocks. You could write a function that checks whether a block has an 'elements' or a 'text' key and returns the corresponding value.
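A rough sketch of such a function, assuming the block layout shown in the question. The name block_text and the shortened sample list are made up for illustration, and the fallback to the "bar" key is only there because that is where the button label happens to live in the posted data:

```python
# Shortened stand-in for raw_data, with the same block shapes as in the question.
sample = [
    {
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": "You have a new message."}},
            {"type": "section", "text": {"type": "mrkdwn", "text": "*Heard By*\nFriend"}},
            {
                "type": "actions",
                "elements": [
                    {"type": "button", "text": {"type": "plain_text", "bar": "View results"}}
                ],
            },
        ]
    }
]


def block_text(block):
    """Return a block's display text, or None if it has none (hypothetical helper)."""
    if "text" in block:
        return block["text"].get("text")
    # "actions" blocks keep their text inside each entry of "elements"
    for element in block.get("elements", []):
        inner = element.get("text", {})
        # in the posted data the button label is stored under "bar", not "text"
        value = inner.get("text", inner.get("bar"))
        if value is not None:
            return value
    return None


texts = [[block_text(block) for block in message["blocks"]] for message in sample]
```

With a helper like this the comprehension no longer raises a KeyError, although the button label still has to be dropped afterwards if it is not wanted in the result.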
CodePudding user response:
Try keeping only the blocks where "text" is one of the keys:
>>> pd.DataFrame(
...     data=[[re.sub(".*[*]\\W+", "", val['text']['text'])
...            for val in dat['blocks'] if val.get('text')][1:5]
...           for dat in raw_data],
...     columns=['heard_by', 'direction', 'destination', 'new_customer'])
        heard_by direction destination new_customer
0         Friend     North    New York          Yes
1  Online Search     North       Miami           No
CodePudding user response:
Since you're not grabbing anything from the "problem chunks", just skip them entirely:
parsed = [[re.sub(".*[*]\\W+", "", val['text']['text'])
           for val in dat['blocks'] if val["type"] != "actions"]
          for dat in raw_data]
d_new = pd.DataFrame(parsed).drop([0, 5], axis=1)
d_new.columns = ['heard_by', 'direction', 'destination', 'new_customer']
Output:
        heard_by direction destination new_customer
0         Friend     North    New York          Yes
1  Online Search     North       Miami           No
For what it's worth, when your comprehensions start getting this messy, it's best to just write a standard for loop, which is much easier to understand and debug:
parsed = []
for dat in raw_data:
    new_row = []
    for val in dat["blocks"]:
        # skip the "actions" blocks, which have no top-level "text" key
        if val["type"] != "actions":
            new_row.append(re.sub(".*[*]\\W+", "", val['text']['text']))
    parsed.append(new_row)
As an aside, how and where did you get these data? They're awfully inconsistent in format:
*Heard By*
Friend
*Which Direction? *
North
*Which Destination*
New York
*New Customer:*\Yes # why is there a backslash here? Was it supposed to be '\n'?
*Heard By*
Online Search
*Which Direction: *
North
*Which Destination? *
Miami
*New Customer? *
No
That makes it very difficult to write a more elegant solution.
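If the labels stay this inconsistent, one option is to key each value on a word inside its label rather than on its position in the list. The following is only a sketch under that assumption: FIELD, COLUMNS, parse_field and parse_message are made-up names, and the keyword-to-column mapping is a guess based on the eight labels shown above.

```python
import re

# "*Label*", then any run of non-word separators (newline, backslash, ...), then the value.
FIELD = re.compile(r"\*(?P<label>[^*]+)\*\W*(?P<value>.*)", re.DOTALL)

# Hypothetical mapping from a keyword found in the label to the target column.
COLUMNS = {
    "heard": "heard_by",
    "direction": "direction",
    "destination": "destination",
    "customer": "new_customer",
}


def parse_field(text):
    """Return (column, value) for a '*Label*value' string, or None if it is not a field."""
    match = FIELD.match(text)
    if match is None:
        return None
    label = match.group("label").lower()
    for keyword, column in COLUMNS.items():
        if keyword in label:
            return column, match.group("value").strip()
    return None


def parse_message(message):
    """Collect every recognised field of one message into a dict."""
    row = {}
    for block in message["blocks"]:
        # blocks without a top-level "text" key (the "actions" ones) yield "" and are skipped
        text = block.get("text", {}).get("text", "")
        field = parse_field(text)
        if field is not None:
            row[field[0]] = field[1]
    return row


# Shortened stand-in for one entry of raw_data.
sample_message = {
    "blocks": [
        {"type": "section", "text": {"type": "mrkdwn", "text": "You have a new message."}},
        {"type": "section", "text": {"type": "mrkdwn", "text": "*Heard By*\nFriend"}},
        {"type": "section", "text": {"type": "mrkdwn", "text": "*Which Direction? *\nNorth"}},
        {"type": "section", "text": {"type": "mrkdwn", "text": "*Which Destination*\nNew York"}},
        {"type": "section", "text": {"type": "mrkdwn", "text": "*New Customer:*\\Yes"}},
        {"type": "actions", "elements": [{"type": "button", "text": {"type": "plain_text", "bar": "View results"}}]},
    ]
}
row = parse_message(sample_message)
```

Passing [parse_message(m) for m in raw_data] to pd.DataFrame should then give one row per message without any dropping of columns by position, but it will silently miss any field whose label contains none of the keywords.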