python - json loads, how to clear line breaks from outside of key value pairs but keep them within v

Time:01-10

We receive AWS notifications in an automated mailbox in JSON format. I have a Python script that should process these, but when I load the content/body of these emails as JSON, it fails with:

json.decoder.JSONDecodeError: Extra data: line 5 column 3007 (char 3159)

When I looked at the content, I could see it is full of line breaks, apparently because the JSON has been formatted for readability in the body of the message. I need to keep the line breaks inside the values, but strip them outside the values so that I can load the content as JSON.

Here is a sample of the content. Does anyone have any ideas?

Thanks

'{\r\n  "Type" : "Notification",\r\n  "MessageId" : "afad72049c0cb1",\r\n  "TopicArn" : "arn:aws:sns:eu-west-1:793738:aws-health",\r\n  "Message" : "{\\"version\\":\\"0\\",\\"id\\":\\"3f059336-bdd1-e27b423d5\\",\\"detail-type\\":\\"AWS Health Event\\",\\"source\\":\\"aws.health\\",\\"account\\":\\"7954138\\",\\"time\\":\\"2022-10-19T08:55:00Z\\",\\"region\\":\\"eu-west-1\\",\\"resources\\":[\\"docker/b\\",\\"master/phub\\"],\\"detail\\":{\\"eventArn\\":\\"arn:aws:health:eu-west-1::event/ECS/AWS_ECS_SECURITY_NOTIFICATION/AWS_ECS_SECURITY_NOTIFICATION_3986a573dbe33a823860ad3272f72e\\",\\"service\\":\\"ECS\\",\\"eventTypeCode\\":\\"AWS_ECS_SECURITY_NOTIFICATION\\",\\"eventTypeCategory\\":\\"accountNotification\\",\\"startTime\\":\\"Wed, 19 Oct 2022 08:55:00 GMT\\",\\"eventDescription\\":[{\\"language\\":\\"en_US\\",\\"latestDescription\\":\\"A software update has been deployed to Fargate which includes CVE patches or other critical patches. No action is required on your part. All new tasks launched automatically uses the latest software version. For running tasks, your tasks need to be restarted in order for these updates to apply. Your tasks running as part of the following ECS Services will be automatically updated beginning October 31, 2022.\\\\n\\\\nA list of your affected resource(s) can be found in the \'Affected resources\' tab in the \\\\\\"Cluster | Service\\\\\\" format.\\\\n\\\\nAfter October 31, 2022, Fargate will begin gradually restarting these tasks. Typically, services should see little to no interruption during the update and no action is required. Data your task has stored on local ephemeral storage will no longer be available, similar to a scaling down event. If you would like to control the timing of this restart you can update the service before October 31, 2022, by running the update-service command from the ECS command-line interface specifying force-new-deployment. 
For example:\\\\n\\\\n$ aws ecs update-service --service service_name \\\\\\\\\\\\n--cluster cluster_name --force-new-deployment\\\\n\\\\nFor further details on Fargate\'s update process, please refer to the ECS developer guide [1].\\\\n\\\\nIf you have any questions or concerns, please contact AWS Support [2].\\\\n\\\\n[1] https://eur02.safelinks.protection.outlook.com/?url=https://docs.aws.amazon.com/AmazonECS/latest/userguide/task-maintenance.html//n&data=05|01|[email protected]|18ecb8a6d7454302640808dab1df762e|9168a104f43a47ffa70848b8545e1691|0|0|638017870565849523|Unknown|TWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0=|3000|||&sdata=8GnV6bDohXEG8AYo4mOwSY9dLuqRLknLuXnaelVS/nI=&reserved=0[2] https://eur02.safelinks.protection.outlook.com/?url=https://aws.amazon.com/support//762e|9168a104f43a47ffa70848b8545e1691|0|0|638017870565849523|Unknown|TWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0=|3000|||&sdata=Ll8kJRsNgFw46znXWhmH9Ph+u2zBchweMzqq1feqjQk=&reserved=0"}],\\"affectedEntities\\":[{\\"entityValue\\":\\"docker/rcure-hub\\"},{\\"entityValue\\":\\"master/rcure-hub\\"}]}}",\r\n  "Timestamp" : "2022-10-19T14:37:30.976Z",\r\n  "SignatureVersion" : "1",\r\n  "Signature" : "taT/Hxpaywf/WurHI/hs0wmZxA0hqhjDX1tFk9KmmY2Vyj6zXTzF6k78XoSiLvfGK7pOZCL oruqZKBFyRy8SvKvDMa0ZT6ekKj9uAEwmpAItDZfkNvJM1hmSSNEV 8SpKRBU0GSQ8v4UkXMHQUNqGIURKRJpoJEORy8Yd7/Qsw8cNlZhrEAGzj/L7O6Fo84cUsjBASqDyjOwAnUmys0CVdxrEUYPoc6m4tPfazrTkw GSteBQ904kSvSbEL7AR61n7TK4nqv6t3xJ7HcEiP6vO0m7mj3rhOjIgeFtQrPbFONUHdWt3hP1OD9Fa84tVEwPDHJiFm w0 aJu WhEUTg==",\r\n  "SigningCertURL" : "https://eur02.safelinks.protection.outlook.com/?url=https://sns.eu-west-1.amazonaws.com/SimpleNotificationService-56e67fcb41f6fec09b0196692625d385.pem&data=05|01|[email 
protected]|18ecb8a6d7454302640808dab1df762e|9168a104f43a47ffa70848b8545e1691|0|0|638017870565849523|Unknown|TWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0=|3000|||&sdata=8+Wv/P64OBM3lk0CXurmLbYlIZCxHoR+eWCbWZUoUQw=&reserved=0",\r\n  "UnsubscribeURL" : "https://eur02.safelinks.protection.outlook.com/?url=https://sns.eu-west-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:eu-west-1:793726854138:aws-health:6de24e4d-ae74-4aaa-bf78-36b6e95c335f&data=05|01|[email protected]|18ecb8a6d7454302640808dab1df762e|9168a104f43a47ffa70848b8545e1691|0|0|6V/vQ6tB2outb/rNzKRsJMJ3DE=&reserved=0"\r\n}\r\n\r\n'

CodePudding user response:

The line breaks are not the problem. Line breaks between tokens don't invalidate JSON, so the json module handles them just fine. The problem is that the string you've provided as an example is not valid JSON. So, if I open up a REPL and do s = <string you provided>:

>>> import json
>>> json.loads(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jarrivillaga/miniconda3/envs/py311/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jarrivillaga/miniconda3/envs/py311/lib/python3.11/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 5 column 2828 (char 2952)

Now, going off of the error message, let's look at the 5th line (it's the big one):

>>> s.splitlines()[4]
'  "Message" : "{\\"version\\":\\"0\\",\\"id\\":\\"3f059336-bdd1-e27b423d5\\",\\"detail-type\\":\\"AWS Health Event\\",\\"source\\":\\"aws.health\\",\\"account\\":\\"7954138\\",\\"time\\":\\"2022-10-19T08:55:00Z\\",\\"region\\":\\"eu-west-1\\",\\"resources\\":[\\"docker/b\\",\\"master/phub\\"],\\"detail\\":{\\"eventArn\\":\\"arn:aws:health:eu-west-1::event/ECS/AWS_ECS_SECURITY_NOTIFICATION/AWS_ECS_SECURITY_NOTIFICATION_3986a573dbe33a823860ad3272f72e\\",\\"service\\":\\"ECS\\",\\"eventTypeCode\\":\\"AWS_ECS_SECURITY_NOTIFICATION\\",\\"eventTypeCategory\\":\\"accountNotification\\",\\"startTime\\":\\"Wed, 19 Oct 2022 08:55:00 GMT\\",\\"eventDescription\\":[{\\"language\\":\\"en_US\\",\\"latestDescription\\":\\"A software update has been deployed to Fargate which includes CVE patches or other critical patches. No action is required on your part. All new tasks launched automatically uses the latest software version. For running tasks, your tasks need to be restarted in order for these updates to apply. Your tasks running as part of the following ECS Services will be automatically updated beginning October 31, 2022.\\\\n\\\\nA list of your affected resource(s) can be found in the \'Affected resources\' tab in the \\\\\\"Cluster | Service\\\\\\" format.\\\\n\\\\nAfter October 31, 2022, Fargate will begin gradually restarting these tasks. Typically, services should see little to no interruption during the update and no action is required. Data your task has stored on local ephemeral storage will no longer be available, similar to a scaling down event. If you would like to control the timing of this restart you can update the service before October 31, 2022, by running the update-service command from the ECS command-line interface specifying force-new-deployment. 
For example:\\\\n\\\\n$ aws ecs update-service --service service_name \\\\\\\\\\\\n--cluster cluster_name --force-new-deployment\\\\n\\\\nFor further details on Fargate\'s update process, please refer to the ECS developer guide [1].\\\\n\\\\nIf you have any questions or concerns, please contact AWS Support [2].\\\\n\\\\n[1] https://eur02.safelinks.protection.outlook.com/?url=https://docs.aws.amazon.com/AmazonECS/latest/userguide/task-maintenance.html//n&data=05|01|[email protected]|18ecb8a6d7454302640808dab1df762e|9168a104f43a47ffa70848b8545e1691|0|0|638017870565849523|Unknown|TWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0=|3000|||&sdata=8GnV6bDohXEG8AYo4mOwSY9dLuqRLknLuXnaelVS/nI=&reserved=0[2] https://eur02.safelinks.protection.outlook.com/?url=https://aws.amazon.com/support//762e|9168a104f43a47ffa70848b8545e1691|0|0|638017870565849523|Unknown|TWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0=|3000|||&sdata=Ll8kJRsNgFw46znXWhmH9Ph+u2zBchweMzqq1feqjQk=&reserved=0"}],\\"affectedEntities\\":[{\\"entityValue\\":\\"docker/rcure-hub\\"},{\\"entityValue\\":\\"master/rcure-hub\\"}]}}",'

Now, let's take a closer look at the problematic portion, column 2828 (char 2952):

>>> s[2952-1]
'}'
>>> s.splitlines()[4][2828 - 10: 2828 + 10]
'erved=0"}],\\"affecte'

So, this string has an unescaped quote, which ends the JSON string early. The } that follows then closes the top-level object, and everything after it is the "extra data" the decoder complains about. This is the problem, not the line breaks. Are you sure this is exactly what you are getting from AWS? If so, then I'd say this is a problem on their end. But what exactly do you mean by "We receive AWS notifications to an automated mailbox in JSON format"?
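
To make the point concrete, here is a minimal sketch that reproduces both cases on a small scale. The payload is a hypothetical miniature of an SNS-style envelope, not the data from the question, and the names inner, good, broken and context are made up for illustration. It assumes the nested "Message" value is itself a JSON document encoded as a string, which is what the sample appears to contain.

```python
import json

# Hypothetical miniature of the envelope: CRLF line breaks sit between
# tokens, and "Message" holds a JSON document encoded as a string, with
# escaped line breaks inside one of its values.
inner = json.dumps({"description": "first line\n\nsecond line"})
good = (
    '{\r\n  "Type" : "Notification",\r\n  "Message" : '
    + json.dumps(inner)
    + '\r\n}\r\n\r\n'
)

# Whitespace between tokens (including \r\n) is ignored by the decoder.
envelope = json.loads(good)
# The nested document needs a second decode; line breaks in values survive.
message = json.loads(envelope["Message"])
print(repr(message["description"]))

# Un-escape one inner quote to imitate the damage seen in the sample.
broken = good.replace('line\\"}', 'line"}')

error = None
try:
    json.loads(broken)
except json.JSONDecodeError as err:
    error = err  # keep a reference; `err` is cleared after the except block
    # Show the text around the reported position, as done in the REPL above.
    context = broken[max(err.pos - 10, 0): err.pos + 10]
    print(err.msg, "at char", err.pos, "near", repr(context))
```

The well-formed version decodes without any preprocessing, while the version with a single unescaped quote fails with the same kind of "Extra data" error, so stripping line breaks would not help.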

CodePudding user response:

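If you want to load this data in JSON format, try the following. Note that this only round-trips the text through a JSON string, so data is a str identical to the input rather than a parsed dict:

import json

dumps = json.dumps('{\r\n  "Type" :"Notification",\r\n...........;reserved=0"\r\n}\r\n\r\n')
data = json.loads(dumps)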
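
For anyone weighing this approach, a small sketch of what the dumps/loads round trip returns may help. The body below is a hypothetical short stand-in for the email content, and raw, round_tripped and parsed are illustrative names.

```python
import json

# Hypothetical short body standing in for the email content.
raw = '{\r\n  "Type" : "Notification"\r\n}\r\n'

# json.dumps encodes the whole text as one JSON string literal, and
# json.loads decodes that literal straight back to the same text.
round_tripped = json.loads(json.dumps(raw))
print(type(round_tripped).__name__, round_tripped == raw)

# Decoding the text itself is what yields a dict.
parsed = json.loads(raw)
print(type(parsed).__name__, parsed)
```

The round trip never raises because it never inspects the JSON inside the text, so it hides the decoding error without producing structured data.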