I have a lot of text files, which all share the same structure (I tidied them up a bit), like so:
Annoying
------------------------
you are annoying me so much
you're incredibly annoying
I find you annoying
you are annoying
you're so annoying
how annoying you are
you annoy me
you are annoying me
you are irritating
you are such annoying
you're too annoying
you are very annoying
<Response>
Was going to say the same about you,
Ill try to fix that
Ok thanks for the feedback.
I need to convert each of them into JSON with the following structure:
{"tag": "Annoying",
 "patterns": ["I find you annoying", "you are irritating", ...],
 "responses": ["Ill try to fix that", "Ok thanks for the feedback.", ...]
}
So the first line is always the tag, all lines up to <Response> are patterns, and everything after that is responses.
I already managed to get everything into a JSON format, like so:
{"Annoying":{"0":"------------------------","1":"you are annoying me so much",
"2":"you're incredibly annoying",
"3":"I find you annoying",
"4":"you are annoying",
"5":"you're so annoying"
}}
That isn't the right format. I think the steps from here are:
- Read the input from each file into a data frame with the following structure:

  | tag      | patterns | responses |
  |----------|----------|-----------|
  | Annoying | ...      | ...       |
  | ...      | ...      | ...       |

- Convert the data frame into JSON with the correct structure.
However, I am completely lost on how to achieve this. I guess it should work something like this:
- When reading the file, always put the first line into tag
- Put all following lines into patterns
- Check each line that is read for <Response> and, if found, switch the column to responses
Any help appreciated!
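The line-by-line plan above can be sketched directly as a small state machine. This is a minimal sketch, assuming the tag is the first non-blank line, the separator under it consists only of dashes, and the marker line is exactly <Response>; parse_lines and sample are hypothetical names used for illustration.

```python
def parse_lines(lines):
    """Parse one file's lines into tag / patterns / responses.

    Assumes: first non-blank line is the tag, a dash-only separator line may
    follow it, and a line equal to "<Response>" switches to responses.
    """
    record = {"tag": None, "patterns": [], "responses": []}
    column = "patterns"  # the column new lines are currently added to
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if record["tag"] is None:
            record["tag"] = line  # first line -> tag
        elif line == "<Response>":
            column = "responses"  # switch columns at the marker
        elif set(line) == {"-"}:
            continue  # skip the dashed separator under the tag
        else:
            record[column].append(line)
    return record


# Hypothetical sample, shortened from the file shown in the question.
sample = """Annoying
------------------------
you are annoying me so much
you're incredibly annoying
<Response>
Ill try to fix that
Ok thanks for the feedback.""".splitlines()

print(parse_lines(sample))
```

Because parse_lines accepts any iterable of strings, an open file object can be passed in directly (for example inside `with open(path, encoding="utf-8") as f:`), and json.dumps on the returned dict gives the target JSON.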
Answer:
Here is a solution using only plain Python:
import json

txt = """
Annoying
------------------------
you are annoying me so much
you're incredibly annoying
I find you annoying
you are annoying
you're so annoying
how annoying you are
you annoy me
you are annoying me
you are irritating
you are such annoying
you're too annoying
you are very annoying
<Response>
Was going to say the same about you,
Ill try to fix that
Ok thanks for the feedback.
""".strip()
# everything before <Response> is tag + patterns, everything after is responses
raw_patterns, raw_responses = txt.split("<Response>")
# split into tag and actual pattern content
tag, raw_patterns2 = raw_patterns.split("\n------------------------")
patterns = raw_patterns2.strip().split("\n")
responses = raw_responses.strip().split("\n")
res = {
    "tag": tag,
    "patterns": patterns,
    "responses": responses,
}
# serialise the dict to a JSON string
print(json.dumps(res, indent=2))
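Since the question involves many files, the same split-based approach can be wrapped in a function and applied to a whole folder. This is a sketch, assuming every file follows the layout above with exactly one 24-dash separator line and one <Response> marker; convert_text, convert_folder, and the folder and output file names are hypothetical.

```python
import json
from pathlib import Path

# Assumed separator: exactly the 24-dash line shown in the question.
SEPARATOR = "------------------------"


def convert_text(txt: str) -> dict:
    """Split one file's text into tag, patterns and responses (same idea as above)."""
    raw_patterns, raw_responses = txt.strip().split("<Response>")
    tag, raw_patterns2 = raw_patterns.split("\n" + SEPARATOR)
    return {
        "tag": tag.strip(),
        "patterns": raw_patterns2.strip().split("\n"),
        "responses": raw_responses.strip().split("\n"),
    }


def convert_folder(folder: str, out_file: str) -> list:
    """Convert every *.txt file in folder and write all records as one JSON list."""
    records = [
        convert_text(path.read_text(encoding="utf-8"))
        for path in sorted(Path(folder).glob("*.txt"))
    ]
    with open(out_file, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII characters readable in the output.
        json.dump(records, f, indent=2, ensure_ascii=False)
    return records
```

Usage would look like convert_folder("data", "intents.json"), where both names are placeholders. A list of dicts already has the target shape, so no data frame is needed; if you would rather go through pandas as planned in the question, pd.DataFrame(records).to_json(orient="records") should produce an equivalent list of objects.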
Answer:
Here is another solution that uses a for loop and the enumerate function.
import json

dictionary = {}
tag = ""
patterns = []
responses = []
response_line = None
# calculate the line number of <Response>
with open("yourFile.txt", "r") as data:
    for num, line in enumerate(data, 1):
        if "<Response>" in line:
            response_line = num
if response_line is None:
    raise ValueError("yourFile.txt contains no <Response> line")
# parse the file and collect the lines into lists
with open("yourFile.txt", "r") as data:
    for num, line in enumerate(data, 1):
        line = line.strip()
        if num == 1:
            tag = line
        elif not line or set(line) == {"-"}:
            continue  # skip blank lines and the dashed separator
        elif num < response_line:
            patterns.append(line)
        elif num > response_line:
            responses.append(line)
# construct the dictionary from the variables
dictionary["tag"] = tag
dictionary["patterns"] = patterns
dictionary["responses"] = responses
print(json.dumps(dictionary, indent=2))
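The version above reads the file twice: once to find <Response> and once to collect the lines. A single-pass alternative reads the non-blank lines into a list and uses list.index to locate the marker. This is a sketch assuming the dashed separator is always the second non-blank line; parse_file is a hypothetical name.

```python
import json


def parse_file(path: str) -> dict:
    """Single-pass variant: read the file once, then slice around <Response>."""
    with open(path, encoding="utf-8") as f:
        # Keep only non-blank lines, without their trailing newlines.
        lines = [line.strip() for line in f if line.strip()]
    response_line = lines.index("<Response>")  # ValueError if the marker is missing
    return {
        "tag": lines[0],
        # Assumes lines[1] is the dashed separator, so patterns start at index 2.
        "patterns": lines[2:response_line],
        "responses": lines[response_line + 1:],
    }
```

Calling print(json.dumps(parse_file("yourFile.txt"), indent=2)) would then print the same structure as the loop-based version.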