Suppose I have a text file like the one given below.
EVENTS 16623232889 {"log": "Hello I am someone", "stream": "a", "cluster-name": "432"} 3232
EVENTS 16623232890 {"log": "I am doing something.", "stream": "b", "cluster-name": "432"} 2321
EVENTS 16623232891 {"log": "bbye", "stream": "c", "cluster-name": "432"} 231231
EVENTS 16623232892 {"log": "bbyee", "stream": "d", "cluster-name": "432"} 23123212
I only want the text of the "log" field. For the file above, the output should be: Hello I am someone I am doing something. bbye bbyee
I know I can remove EVENTS and the event number with the code below, but I'm not sure how to proceed from there:
file_name = "a.json"
with open(file_name) as f1:
    lines = f1.readlines()
for i, line in enumerate(lines):
    lines[i] = line.split(" ", 2)[2]
lines = str(lines)
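One way to continue from the split in the question's own code is the minimal sketch below. It assumes every line has the shape shown above. After split(" ", 2)[2], each remainder still ends with the trailing number, so json.JSONDecoder().raw_decode (which parses one JSON value from the start of a string and returns it together with the index where it stopped) can read the object and ignore whatever follows. The hard-coded list stands in for f1.readlines().

```python
import json

# Sample lines copied from the question; a real script would read them from the file.
lines = [
    'EVENTS 16623232889 {"log": "Hello I am someone", "stream": "a", "cluster-name": "432"} 3232',
    'EVENTS 16623232890 {"log": "I am doing something.", "stream": "b", "cluster-name": "432"} 2321',
    'EVENTS 16623232891 {"log": "bbye", "stream": "c", "cluster-name": "432"} 231231',
    'EVENTS 16623232892 {"log": "bbyee", "stream": "d", "cluster-name": "432"} 23123212',
]

decoder = json.JSONDecoder()
logs = []
for line in lines:
    # Drop "EVENTS" and the event number, as in the question's code.
    rest = line.split(" ", 2)[2]
    # raw_decode parses the leading JSON object and returns (object, end_index),
    # so the trailing number after the closing brace is simply ignored.
    record, _end = decoder.raw_decode(rest)
    logs.append(record["log"])

result = " ".join(logs)
print(result)
```

Note that raw_decode does not skip leading whitespace, which is fine here because the remainder always starts with "{".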
Answer:
You can parse each line as JSON (into a dictionary) after removing EVENTS and the event number.
import json
file_name = "a.json"
with open(file_name) as f1:
    lines = f1.readlines()
# Remove EVENTS and the event number, then parse the rest of the line as JSON
lines = [json.loads(" ".join(line.split(" ")[2:])) for line in lines]
print([line["log"] for line in lines])
Output:
['Hello I am someone', 'I am doing something.', 'bbye', 'bbyee']
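The version above assumes nothing follows the JSON object on each line. With the trailing numbers shown in the question, json.loads rejects the leftover text, which is why the edit below also drops the last field. A quick check on one line from the question:

```python
import json

line = 'EVENTS 16623232889 {"log": "Hello I am someone", "stream": "a", "cluster-name": "432"} 3232'

try:
    # Same expression as above: keeps everything after the event number.
    json.loads(" ".join(line.split(" ")[2:]))
    error = None
except json.JSONDecodeError as exc:
    # The trailing "3232" after the closing brace is reported as extra data.
    error = exc.msg

print(error)
```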
EDIT: Since each line also ends with a number, remove the last field as well as the first two:
import json
file_name = "a.json"
with open(file_name) as f1:
    lines = f1.readlines()
# Remove EVENTS, the event number and the number at the end, then parse the rest as JSON
lines = [json.loads(" ".join(line.split(" ")[2:-1])) for line in lines]
print([line["log"] for line in lines])
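The question asks for a single space-separated string rather than a list. A minimal sketch of the edited approach that produces that string, under two assumptions: io.StringIO stands in for open("a.json"), and extract_logs is a hypothetical helper name. It also skips blank lines, since an empty trailing line would otherwise reach json.loads as an empty string and raise an error.

```python
import io
import json

# io.StringIO stands in for open("a.json"); the content is the question's sample plus a blank line.
sample = io.StringIO(
    'EVENTS 16623232889 {"log": "Hello I am someone", "stream": "a", "cluster-name": "432"} 3232\n'
    'EVENTS 16623232890 {"log": "I am doing something.", "stream": "b", "cluster-name": "432"} 2321\n'
    'EVENTS 16623232891 {"log": "bbye", "stream": "c", "cluster-name": "432"} 231231\n'
    'EVENTS 16623232892 {"log": "bbyee", "stream": "d", "cluster-name": "432"} 23123212\n'
    '\n'
)

def extract_logs(f):
    """Hypothetical helper: return the "log" values of all non-blank lines, joined by spaces."""
    logs = []
    for line in f:
        if not line.strip():
            continue  # skip blank lines, which json.loads would reject
        # Drop the first two fields and the last one, keep the JSON in between.
        record = json.loads(" ".join(line.split(" ")[2:-1]))
        logs.append(record["log"])
    return " ".join(logs)

result = extract_logs(sample)
print(result)
```

With a real file, the call would be extract_logs(f1) inside the with block.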
Answer:
- Iterate over the lines.
- Split each line with maxsplit=2, so that the third item is the JSON object as a string.
- Pass that to json.loads to get a dictionary back.
- Get the "log" key from it.
- Join the results together with " ".join.
from json import loads
text = """\
EVENTS 16623232889 {"log": "Hello I am someone", "stream": "a", "cluster-name": "432"}
EVENTS 16623232890 {"log": "I am doing something.", "stream": "b", "cluster-name": "432"}
EVENTS 16623232891 {"log": "bbye", "stream": "c", "cluster-name": "432"}
EVENTS 16623232892 {"log": "bbyee", "stream": "d", "cluster-name": "432"}
"""
print(" ".join(loads(line.split(maxsplit=2)[2])["log"] for line in text.splitlines()))
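The one-liner above matches the original sample, where nothing follows the JSON object. For the edited input with a trailing number, one small adjustment (a sketch, not the only option) is to drop the last whitespace-separated field with rsplit(maxsplit=1) before parsing. This assumes every line really does end with that extra field; on a line without it, rsplit would cut into the JSON instead.

```python
from json import loads

text = """\
EVENTS 16623232889 {"log": "Hello I am someone", "stream": "a", "cluster-name": "432"} 3232
EVENTS 16623232890 {"log": "I am doing something.", "stream": "b", "cluster-name": "432"} 2321
EVENTS 16623232891 {"log": "bbye", "stream": "c", "cluster-name": "432"} 231231
EVENTS 16623232892 {"log": "bbyee", "stream": "d", "cluster-name": "432"} 23123212
"""

# split(maxsplit=2)[2] keeps the JSON plus the trailing number;
# rsplit(maxsplit=1)[0] then removes that last field, leaving only the JSON.
result = " ".join(
    loads(line.split(maxsplit=2)[2].rsplit(maxsplit=1)[0])["log"]
    for line in text.splitlines()
)
print(result)
```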
After the question was edited to add a trailing number to each line:
There are a couple of ways to do this; I chose a regular expression:
import re
text = """\
EVENTS 16623232889 {"log": "Hello I am someone", "stream": "a", "cluster-name": "432"} 3232
EVENTS 16623232890 {"log": "I am doing something.", "stream": "b", "cluster-name": "432"} 2321
EVENTS 16623232891 {"log": "bbye", "stream": "c", "cluster-name": "432"} 231231
EVENTS 16623232892 {"log": "bbyee", "stream": "d", "cluster-name": "432"} 23123212
"""
pattern = r'"log": *"(.*?)"'
print(" ".join(re.search(pattern, line).group(1) for line in text.splitlines()))
Output:
Hello I am someone I am doing something. bbye bbyee
The pattern "log": *"(.*?)" searches for the content of the "log" key directly. I captured the content in a group so that I can retrieve it later with group(1). Note that the group must be non-greedy (.*?) so that it stops at the first " after the value; a greedy .* would run on to the last " on the line.
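One caveat for the regex approach: .*? stops at the first quote character, even an escaped one inside the log text. If log values can contain \" (none of the question's sample lines do; the line below is a hypothetical example), one possible variant matches a JSON string body explicitly and then lets json.loads undo the escapes. Parsing the whole object, as the other answers do, avoids the issue entirely.

```python
import json
import re

# Hypothetical line, not from the question: its log value contains escaped quotes.
line = r'EVENTS 16623232893 {"log": "she said \"hi\"", "stream": "e", "cluster-name": "432"} 42'

# The original pattern stops at the escaped quote and truncates the value.
naive = re.search(r'"log": *"(.*?)"', line).group(1)

# Match runs of non-quote, non-backslash characters or backslash escapes,
# so an escaped quote does not end the match.
robust_pattern = r'"log": *"((?:[^"\\]|\\.)*)"'
raw = re.search(robust_pattern, line).group(1)
# Wrap the captured body in quotes and decode it as a JSON string to turn \" into ".
value = json.loads('"' + raw + '"')

print(repr(naive))
print(value)
```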