I have a file, events.txt, that contains several delete event records mixed in with the other records. How can I remove the delete events?
The records in events.txt look like this:
delete|109393509715446004
{"id": 109472787571426436, "created_at": "2022-12-07T14:09:27 00:00", "in_reply_to_id": null, "in_reply_to_account_id": null, "sensitive": false}
{"id": 109472787901758948, "created_at": "2022-12-07T14:09:37 00:00", "in_reply_to_id": null, "in_reply_to_account_id": null, "sensitive": false}
delete|109393512606515336
{"id": 109472787957427984, "created_at": "2022-12-07T14:09:38 00:00","in_reply_to_id": null, "in_reply_to_account_id": null, "sensitive": false}
I used the approach below to read the file and transform each line:
with open('events.txt', encoding='utf-8') as f:
    for line in f:
        event = line.replace('update|', '').replace('status.update|', '').replace('status.', '')
        print(type(event))
        print(event)
The type of event is <class 'str'>.
Please suggest how I can remove or skip the delete event rows while processing the file in the loop above.
Answer:
It looks like the lines you care about are valid JSON, while the lines you want to ignore are not. Keep in mind that any other line that fails to parse will be skipped silently as well. If that is acceptable, and assuming your valid entries can never raise a JSON decode error, you could use that difference like this:
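import json

with open("events.txt", encoding="utf-8") as file:
    for line in file:
        try:
            # Lines holding a JSON object parse successfully.
            d = json.loads(line)
            print(d)
        except json.JSONDecodeError:
            # "delete|..." lines are not valid JSON, so skip them.
            pass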
Output:
{'id': 109472787571426436, 'created_at': '2022-12-07T14:09:27 00:00', 'in_reply_to_id': None, 'in_reply_to_account_id': None, 'sensitive': False}
{'id': 109472787901758948, 'created_at': '2022-12-07T14:09:37 00:00', 'in_reply_to_id': None, 'in_reply_to_account_id': None, 'sensitive': False}
{'id': 109472787957427984, 'created_at': '2022-12-07T14:09:38 00:00', 'in_reply_to_id': None, 'in_reply_to_account_id': None, 'sensitive': False}
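If you would rather not rely on the delete lines failing to parse, a more explicit variant is to test for the "delete|" prefix directly. The following is only a minimal sketch under a few assumptions: the helper name parse_events is hypothetical, delete rows are assumed to always start with "delete|", and the "update|" and "status.update|" prefixes are taken from the replace calls in the question rather than from the sample data.

```python
import json


def parse_events(lines):
    """Return parsed JSON records, skipping delete events (hypothetical helper)."""
    records = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and delete events such as "delete|109393509715446004".
        if not line or line.startswith("delete|"):
            continue
        # Assumption: other rows may carry an "update|" or "status.update|" prefix
        # in front of the JSON payload, as the question's replace calls suggest.
        for prefix in ("status.update|", "update|"):
            if line.startswith(prefix):
                line = line[len(prefix):]
                break
        # Unlike the try/except version, a malformed record raises here
        # instead of being dropped silently.
        records.append(json.loads(line))
    return records


# Small in-memory sample shaped like the rows in the question.
sample = [
    'delete|109393509715446004',
    '{"id": 109472787571426436, "sensitive": false}',
    'update|{"id": 109472787901758948, "sensitive": false}',
]
for record in parse_events(sample):
    print(record["id"])
```

Because the function accepts any iterable of lines, an open file object can be passed in place of the sample list. The trade-off compared with catching json.JSONDecodeError is that unexpected bad rows surface as errors rather than disappearing.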