Hi, I would like to add a bit to the code where it splits each note into two parts: parameters and configurations. The parameters reside inside the [] of the note, each one to the left of its parentheses (). The configurations reside inside those parentheses (). For notes that have parameters, I want the parameter names separated by commas. For each parameter, the elements of its configuration should be collected in a comma-separated list, e.g. [element 1, element 2]; a parameter without any configuration gets an empty list []. If a note has no parameters, both the parameter and configuration sections should be None. I want to achieve the results in the Expected Output below.
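One detail the rules above imply: the contents of [] cannot be split with a plain `split(",")`, because the configurations inside () contain commas too. The sketch below is plain Python with no regex; `split_top_level` and `parse_parameter` are hypothetical helper names, and it assumes parentheses are never nested. It splits only on commas outside parentheses and then separates each parameter name from its configuration list.

```python
from __future__ import annotations


def split_top_level(text: str) -> list[str]:
    """Split `text` on commas that are not inside parentheses."""
    parts: list[str] = []
    current: list[str] = []
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            # Top-level comma: the current parameter is complete
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_parameter(item: str) -> tuple[str, list[str]]:
    """Turn 'mfc(s,expand,rr)' into ('mfc', ['s', 'expand', 'rr'])."""
    name, _, rest = item.partition("(")
    inner = rest.rstrip(")").strip()
    configs = [c.strip() for c in inner.split(",")] if inner else []
    return name.strip(), configs


contents = "t(a1), mfc(s,expand,rr), np(), k"
print(contents.split(","))
# ['t(a1)', ' mfc(s', 'expand', 'rr)', ' np()', ' k']
print([parse_parameter(p) for p in split_top_level(contents)])
# [('t', ['a1']), ('mfc', ['s', 'expand', 'rr']), ('np', []), ('k', [])]
```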
Code:
```python
import re
import pandas as pd

lines = ['yes hello there', 'move on to the next command if the previous command was successful.',
         "$$n:describes the '&&' character in the RUN command.",
         'k',
         '$$n[t(a1), mfc(s,expand,rr), np(), k]: description']
notes = []
parameters = []
configurations = []
for i, line in enumerate(lines):
    if re.search(r'\$\$.*\:', line):
        notes.append(re.sub(r'\$\$.*\:', '', line).strip())
# parameters and configurations are not filled yet, so the three lists
# have different lengths and pd.DataFrame raises a ValueError here
df = pd.DataFrame({
    'Note': notes,
    'Parameters': parameters,
    'configurations': configurations
})
```
Expected Output:
|   | Note | Parameters | Configurations |
|---|------|------------|----------------|
| 0 | describes the && character in the RUN command. | None | None |
| 1 | description | t,mfc,np,k | [a1],[s,expand,rr],[],[] |
Answer:
This stores the parameters and configurations of each note as sublists:
```python
import re

# `lines` is the list from the question
notes = []
parameters = []
configurations = []
for line in lines:
    # Group 1 holds the text between [ and ], or None if the note has no brackets
    expr = re.search(r'\$\$[^:[]*?(?:\[([^:\]]*)\])?\:', line)
    if expr:
        notes.append(re.sub(r'\$\$.*?\:', '', line).strip())
        if expr[1]:
            names = []
            confs = []
            # One (name, text inside parentheses or '') tuple per parameter
            for part in re.findall(r'([^\s(,]+)(?:\(([^)]*)\))?', expr[1]):
                names.append(part[0])
                confs.append(part[1].split(",") if part[1] else [])
            parameters.append(names)
            configurations.append(confs)
        else:
            parameters.append(None)
            configurations.append(None)
```
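To see what the two regular expressions capture, here they are applied to the sample note from the question. Group 1 of the first search is the text between [ and ], or None when the note has no brackets; `findall` then returns one (name, configuration text) tuple per parameter, with an empty string when there are no parentheses. The name pattern `[^\s(,]+` is what keeps the spaces after each comma out of the names.

```python
import re

NOTE_PATTERN = r'\$\$[^:[]*?(?:\[([^:\]]*)\])?\:'
PARAM_PATTERN = r'([^\s(,]+)(?:\(([^)]*)\))?'

# Group 1 captures everything between [ and ] before the colon
expr = re.search(NOTE_PATTERN, '$$n[t(a1), mfc(s,expand,rr), np(), k]: description')
print(expr[1])
# t(a1), mfc(s,expand,rr), np(), k

# Each tuple is (parameter name, text inside its parentheses or '')
print(re.findall(PARAM_PATTERN, expr[1]))
# [('t', 'a1'), ('mfc', 's,expand,rr'), ('np', ''), ('k', '')]

# A note without [...] still matches, but group 1 is None
plain = re.search(NOTE_PATTERN, "$$n:describes the '&&' character")
print(plain[1])
# None
```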
If you need those values to be strings instead of sublists, then:
```python
notes = []
parameters = []
configurations = []
for line in lines:
    expr = re.search(r'\$\$[^:[]*?(?:\[([^:\]]*)\])?\:', line)
    if expr:
        notes.append(re.sub(r'\$\$.*?\:', '', line).strip())
        if expr[1]:
            names = []
            confs = []
            for part in re.findall(r'([^\s(,]+)(?:\(([^)]*)\))?', expr[1]):
                names.append(part[0])
                # Wrap each configuration in brackets; '' becomes "[]"
                confs.append(f"[{part[1]}]")
            parameters.append(",".join(names))
            configurations.append(",".join(confs))
        else:
            parameters.append(None)
            configurations.append(None)
```
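Neither version builds the DataFrame itself. Here is a minimal sketch that wraps the string version in a function (the name `parse_notes` is hypothetical) and builds the table from the question's sample `lines`. It assumes pandas is installed and uses the column names from the Expected Output; the Parameters and Configurations columns come out as shown there.

```python
import re

import pandas as pd

NOTE_RE = re.compile(r'\$\$[^:[]*?(?:\[([^:\]]*)\])?\:')
PARAM_RE = re.compile(r'([^\s(,]+)(?:\(([^)]*)\))?')


def parse_notes(lines):
    """Return a DataFrame with one row per line that contains a $$...: note."""
    rows = []
    for line in lines:
        expr = NOTE_RE.search(line)
        if not expr:
            continue  # not a note line
        note = re.sub(r'\$\$.*?\:', '', line).strip()
        if expr[1]:
            parts = PARAM_RE.findall(expr[1])
            params = ",".join(name for name, _ in parts)
            confs = ",".join(f"[{conf}]" for _, conf in parts)
        else:
            params = confs = None  # no [...] section
        rows.append({"Note": note, "Parameters": params, "Configurations": confs})
    return pd.DataFrame(rows, columns=["Note", "Parameters", "Configurations"])


lines = ['yes hello there', 'move on to the next command if the previous command was successful.',
         "$$n:describes the '&&' character in the RUN command.",
         'k',
         '$$n[t(a1), mfc(s,expand,rr), np(), k]: description']
print(parse_notes(lines))
```

Collecting one dict per row keeps each note's three values together, so the columns cannot end up with different lengths, which is why the question's original `pd.DataFrame` call fails.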