Given a JSON file, I would like to split a value into parts based on the [..] and |..| markers it contains, but only if at least 2 of them appear. By split, I mean taking the single-line "Time" string and separating it into pieces whenever it contains 2 or more [..] or |..| markers.
[
    {
        "Action": "Walk",
        "Time": "1 hour [c] 2 hour [dog] 1 hour [p]"
    },
    {
        "Action": "Pet",
        "Time": "1 hour [cat] 2 hour |d|"
    },
    {
        "Action": "F",
        "Time": "1 hour [cat]"
    }
]
Desired Result
[
    {
        "Action": "Walk",
        "Time": [
            "1 hour [c]",
            "2 hour [dog]",
            "1 hour [p]"
        ]
    },
    {
        "Action": "Pet",
        "Time": [
            "1 hour [cat]",
            "2 hour |d|"
        ]
    },
    {
        "Action": "F",
        "Time": "1 hour [cat]"
    }
]
Here is my code:
import json

with open(filenames, "r") as f:
    data = json.load(f)
CodePudding user response:
A regex can solve this easily, but it is quite hard to read. I recommend checking out the cheatsheet on regexr and looking at the documentation for re.findall.
But here you go - that code should do what you asked for:
import json
import re

with open(filenames, "r") as f:
    data = json.load(f)

for action in data:
    # Each part is the shortest run of text that ends in a [..] or |..| marker
    parts = [
        time_part.strip()
        for time_part in re.findall(r".*?(?:\[.*?\]|\|.*?\|)", action["Time"])
    ]
    # Only split when 2 or more markers were found; otherwise keep the original string
    if len(parts) >= 2:
        action["Time"] = parts

# "rw" is not a valid mode for open(), so reopen the file for writing
with open(filenames, "w") as f:
    json.dump(data, f)
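To check the splitting logic without touching any file, here is a minimal sketch that runs the same regex on the sample data from the question. The names split_time and PART are made up for this illustration and do not come from the question.

```python
import re

# Hypothetical name: matches the shortest run of text ending in a [..] or |..| marker
PART = re.compile(r".*?(?:\[.*?\]|\|.*?\|)")

def split_time(value):
    """Return a list of parts if value holds 2+ markers, else value unchanged."""
    parts = [p.strip() for p in PART.findall(value)]
    return parts if len(parts) >= 2 else value

# Sample data copied from the question
data = [
    {"Action": "Walk", "Time": "1 hour [c] 2 hour [dog] 1 hour [p]"},
    {"Action": "Pet", "Time": "1 hour [cat] 2 hour |d|"},
    {"Action": "F", "Time": "1 hour [cat]"},
]

for action in data:
    action["Time"] = split_time(action["Time"])
    print(action)
```

The first two entries should come out as lists and the third should stay a plain string, which matches the desired result in the question.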
To make the regex easier to understand, you can strip out the special character escapes (\[ becomes [), the non-capturing groups (the (?: ... ) parts) and the lazy matching (a regex matches as many characters as possible by default; using *? instead of * makes it match as few as possible, which is called lazy matching).
What remains is basically the logic used above: .*([.*] OR |.*|)
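The effect of lazy matching is easiest to see side by side. This is a small sketch using a simplified pattern that only handles [..] markers, with a made-up sample string:

```python
import re

text = "1 hour [c] 2 hour [dog]"

# Greedy: .* grabs as much as it can, so everything up to the last ] is one match
greedy = re.findall(r".*\[.*\]", text)

# Lazy: .*? stops at the first marker it can, so each marker ends its own match
lazy = re.findall(r".*?\[.*?\]", text)

print(greedy)  # ['1 hour [c] 2 hour [dog]']
print(lazy)    # ['1 hour [c]', ' 2 hour [dog]']
```

The leading space in the second lazy match is why the code above calls strip() on every part.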