Using split to get particular words in python

Time:09-10

Suppose I have a text file like the one given below.

EVENTS 16623232889 {"log": "Hello I am someone", "stream": "a", "cluster-name": "432"} 3232
EVENTS 16623232890 {"log": "I am doing something.", "stream": "b", "cluster-name": "432"} 2321
EVENTS 16623232891 {"log": "bbye", "stream": "c", "cluster-name": "432"} 231231
EVENTS 16623232892 {"log": "bbyee", "stream": "d", "cluster-name": "432"} 23123212

I want to extract only the text stored under the "log" key of each line. For example, the output should be: Hello I am someone I am doing something. bbye bbyee

I know that I can remove the leading EVENTS field and the event number with the code below, but I am not sure how to proceed from there.

file_name = "a.json"
with open(file_name) as f1:
    lines = f1.readlines()

for i, line in enumerate(lines):
    lines[i] = line.split(" ", 2)[2]
lines = str(lines)

CodePudding user response:

You can parse each line as JSON (which gives you a dictionary) after removing the leading EVENTS field and the event number.

import json
file_name = "a.json"
with open(file_name) as f1:
    lines = f1.readlines()
    # Remove the EVENTS field and the event number, then parse the rest of the line as JSON
    lines = [json.loads(" ".join(line.split(" ")[2:])) for line in lines]
    print([line["log"] for line in lines])

Output:

['Hello I am someone', 'I am doing something.', 'bbye', 'bbyee']

EDIT: Removing the last field, along with the first two

import json
file_name = "a.json"
with open(file_name) as f1:
    lines = f1.readlines()
    # Remove the EVENTS field, the event number and the trailing number, then parse the rest as JSON
    lines = [json.loads(" ".join(line.split(" ")[2:-1])) for line in lines]
    print([line["log"] for line in lines])
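A variation on the same idea, as a minimal sketch: instead of splitting on every space and re-joining, split off the two leading fields with a maxsplit and the single trailing field with rsplit. The helper name extract_logs and the shortened sample list are hypothetical, and the sketch assumes every line really does end with one extra space-separated field, as in the question.

```python
import json

def extract_logs(lines):
    """Return the "log" value from each 'EVENTS <id> <json> <number>' line."""
    logs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Drop the two leading fields, then the single trailing field.
        payload = line.split(" ", 2)[2].rsplit(" ", 1)[0]
        logs.append(json.loads(payload)["log"])
    return logs

# Hypothetical sample: two of the lines from the question.
sample = [
    'EVENTS 16623232889 {"log": "Hello I am someone", "stream": "a", "cluster-name": "432"} 3232\n',
    'EVENTS 16623232891 {"log": "bbye", "stream": "c", "cluster-name": "432"} 231231\n',
]
print(extract_logs(sample))
```

If a line has no trailing field, rsplit would cut into the JSON object and json.loads would raise, so this only fits input shaped like the edited question.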

CodePudding user response:

  1. Iterate over the lines.
  2. Split each line with a maxsplit of two, so that the third element is the JSON object (still as a string).
  3. Pass that string to json.loads to get a dictionary back.
  4. Get the "log" key from it.
  5. Join the results together with " ".join.

from json import loads

text = """\
EVENTS 16623232889 {"log": "Hello I am someone", "stream": "a", "cluster-name": "432"}
EVENTS 16623232890 {"log": "I am doing something.", "stream": "b", "cluster-name": "432"}
EVENTS 16623232891 {"log": "bbye", "stream": "c", "cluster-name": "432"}
EVENTS 16623232892 {"log": "bbyee", "stream": "d", "cluster-name": "432"}
"""

print(" ".join(loads(line.split(maxsplit=2)[2])["log"] for line in text.splitlines()))
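The one-liner above relies on nothing following the JSON object. If some lines may carry extra fields after it, one possible approach (a sketch, not part of the original answer) is json.JSONDecoder.raw_decode, which parses one JSON document starting at a given index and ignores whatever comes after. The helper name log_from_line is hypothetical, and the sketch assumes the first "{" on the line opens the JSON object.

```python
import json

decoder = json.JSONDecoder()

def log_from_line(line):
    """Parse the first JSON object in the line and return its "log" value."""
    start = line.index("{")  # raw_decode must start exactly at the JSON text
    obj, _end = decoder.raw_decode(line, start)
    return obj["log"]

# Hypothetical sample: one line with a trailing field, one without.
text = """\
EVENTS 16623232889 {"log": "Hello I am someone", "stream": "a", "cluster-name": "432"} 3232
EVENTS 16623232890 {"log": "I am doing something.", "stream": "b", "cluster-name": "432"}
"""

print(" ".join(log_from_line(line) for line in text.splitlines()))
```

Because the JSON parser decides where the object ends, this works whether or not a trailing number is present.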

After edit:

There are a couple of ways to do this; I chose to go with a regex:

import re

text = """\
EVENTS 16623232889 {"log": "Hello I am someone", "stream": "a", "cluster-name": "432"} 3232
EVENTS 16623232890 {"log": "I am doing something.", "stream": "b", "cluster-name": "432"} 2321
EVENTS 16623232891 {"log": "bbye", "stream": "c", "cluster-name": "432"} 231231
EVENTS 16623232892 {"log": "bbyee", "stream": "d", "cluster-name": "432"} 23123212
"""

pattern = r'"log": *"(.*?)"'

print(" ".join(re.search(pattern, line).group(1) for line in text.splitlines()))

Output:

Hello I am someone I am doing something. bbye bbyee

The "log": *"(.*?)" pattern searches for the content of the "log" key directly. The content is captured in a group so that it can be retrieved later with group(1). Note that the group must be non-greedy so that it stops at the first " after the value begins. This assumes the log text itself contains no escaped quotes.
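If a log message can contain escaped quotes (for example \"), the non-greedy pattern would stop too early. A possible variation, shown only as a sketch: let the group accept backslash-escaped characters, then decode the captured text with json.loads. The helper name log_from_line and the sample line are hypothetical. The pattern would also match a literal "log": that happens to appear inside another string value, so parsing the whole object as JSON remains the safer route.

```python
import json
import re

# Matches the "log" value, allowing backslash-escaped characters inside it.
pattern = re.compile(r'"log":\s*"((?:[^"\\]|\\.)*)"')

def log_from_line(line):
    match = pattern.search(line)
    if match is None:
        return None  # no "log" key on this line
    # Re-wrap the captured text in quotes and let json decode escapes such as \".
    return json.loads('"' + match.group(1) + '"')

# Hypothetical sample line with escaped quotes in the log text.
line = r'EVENTS 1 {"log": "she said \"hi\"", "stream": "a"} 42'
print(log_from_line(line))  # she said "hi"
```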
