Manipulating and grouping strings within a for loop

Time:09-04

Hi, I would like to extend the code so that it splits each note into two parts: parameters and configurations. The parameters reside inside the [] of the note, to the left of the parentheses (); the configurations reside inside the parentheses. For notes that have parameters, I want to separate the parameters with commas. If a parameter has one or more configurations, they should go in a list of comma-separated elements, [element 1, element 2]. For a parameter without any configurations, create an empty list []. If a note has no parameters, both the Parameters and Configurations columns should be None. I want to achieve the results in the Expected Output below.

Code:

import re
import pandas as pd

lines = ['yes hello there', 'move on to the next command if the previous command was successful.',
         "$$n:describes the '&&' character in the RUN command.",
         'k', 
         '$$n[t(a1), mfc(s,expand,rr), np(), k]: description']

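notes = []
parameters = []
configurations = []
for line in lines:
    # Keep only the "$$..." notes and drop everything up to the colon.
    if re.search(r'\$\$.*\:', line):
        notes.append(re.sub(r'\$\$.*\:', '', line).strip())
        # Still to do: fill parameters and configurations here. Until every
        # list has one entry per note, pd.DataFrame raises ValueError.

df = pd.DataFrame({
    'Note': notes,
    'Parameters': parameters,
    'Configurations': configurations
})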

Expected Output:

+----+------------------------------------------------+--------------+--------------------------+
|    | Note                                           | Parameters   | Configurations           |
|----+------------------------------------------------+--------------+--------------------------|
|  0 | describes the && character in the RUN command. | None         | None                     |
|  1 | description                                    | t,mfc,np,k   | [a1],[s,expand,rr],[],[] |
+----+------------------------------------------------+--------------+--------------------------+
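Why the question's snippet fails as posted: its loop only appends to `notes`, so `parameters` and `configurations` stay empty and `pd.DataFrame` rejects a dict whose columns have different lengths. The minimal sketch below reproduces that error and then shows that appending exactly one value per note (here `None` as a placeholder, an assumption standing in for the real parsing) keeps the columns aligned.

```python
import re

import pandas as pd

lines = ['yes hello there', 'move on to the next command if the previous command was successful.',
         "$$n:describes the '&&' character in the RUN command.",
         'k',
         '$$n[t(a1), mfc(s,expand,rr), np(), k]: description']

# Same note extraction as the question's loop.
notes = [re.sub(r'\$\$.*:', '', line).strip()
         for line in lines if re.search(r'\$\$.*:', line)]

# With the question's still-empty lists the columns have lengths 2, 0 and 0,
# so pandas refuses to build the frame.
try:
    pd.DataFrame({'Note': notes, 'Parameters': [], 'Configurations': []})
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
print("ValueError:", mismatch_error)

# Placeholder values, one per note, keep all three columns the same length.
parameters = [None] * len(notes)
configurations = [None] * len(notes)
df = pd.DataFrame({'Note': notes,
                   'Parameters': parameters,
                   'Configurations': configurations})
print(df)
```

The real fix is therefore to append to `parameters` and `configurations` inside the same `if` branch that appends to `notes`, so the three lists grow together.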

Answer:

This stores each note's parameters and configurations as sublists (one list per note):

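import re

# `lines` is the list from the question.
notes = []
parameters = []
configurations = []
for line in lines:
    # Group 1 captures the text inside [...] between "$$name" and ":", if any.
    expr = re.search(r'\$\$[^:[]*?(?:\[([^:\]]*)\])?\:', line)
    if expr:
        notes.append(re.sub(r'\$\$.*?\:', '', line).strip())
        if expr[1]:
            names = []
            confs = []
            # Each item is a parameter name, optionally followed by "(conf,...)".
            for part in re.findall(r'([^\s(,]+)(?:\(([^)]*)\))?', expr[1]):
                names.append(part[0])
                # Split the configs into a list; missing or empty "()" gives [].
                confs.append(part[1].split(",") if part[1] else [])
            parameters.append(names)
            configurations.append(confs)
        else:
            # No [...] section: both columns are None for this note.
            parameters.append(None)
            configurations.append(None)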
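The two regular expressions in that loop are dense. The sketch below restates them with `re.VERBOSE` so each piece can carry a comment, and shows what they capture on the sample notes. The names `HEADER` and `ITEM` are illustrative only; the item pattern is written as `[^\s(,]+` so that parameter names come back without leading spaces.

```python
import re

# "$$", a short note name, an optional "[...]" section, then the colon.
HEADER = re.compile(r"""
    \$\$            # literal dollar signs that start a note
    [^:[]*?         # note name, lazily, stopping before a colon or bracket
    (?:             # optional bracket section
        \[
        ([^:\]]*)   # group 1: everything between the brackets
        \]
    )?
    :               # colon separating the header from the description
""", re.VERBOSE)

# A parameter name, optionally followed by parentheses holding its configs.
ITEM = re.compile(r"""
    ([^\s(,]+)          # group 1: parameter name without spaces
    (?:\(([^)]*)\))?    # group 2: text inside the parentheses, if present
""", re.VERBOSE)

header = HEADER.search("$$n[t(a1), mfc(s,expand,rr), np(), k]: description")
print(header[1])    # t(a1), mfc(s,expand,rr), np(), k

# findall returns '' for group 2 both for "np()" and for a bare "k".
items = ITEM.findall(header[1])
print(items)        # [('t', 'a1'), ('mfc', 's,expand,rr'), ('np', ''), ('k', '')]

no_brackets = HEADER.search("$$n:describes the '&&' character in the RUN command.")
print(no_brackets[1])   # None, so both columns become None
```

Because `findall` reports a non-participating group as `''`, the loop's `part[1].split(",") if part[1] else []` maps both `np()` and `k` to an empty list.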

If you need those values as comma-separated strings (as in the expected output) instead of sublists, use:

import re

# `lines` is the list from the question.
notes = []
parameters = []
configurations = []
for line in lines:
    # Group 1 captures the text inside [...] between "$$name" and ":", if any.
    expr = re.search(r'\$\$[^:[]*?(?:\[([^:\]]*)\])?\:', line)
    if expr:
        notes.append(re.sub(r'\$\$.*?\:', '', line).strip())
        if expr[1]:
            names = []
            confs = []
            for part in re.findall(r'([^\s(,]+)(?:\(([^)]*)\))?', expr[1]):
                names.append(part[0])
                # Wrap each parameter's configs in brackets, e.g. "[a1]" or "[]".
                confs.append(f"[{part[1]}]")
            parameters.append(",".join(names))
            configurations.append(",".join(confs))
        else:
            parameters.append(None)
            configurations.append(None)
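To tie this back to the DataFrame in the question, here is a self-contained sketch that wraps the string-building loop in a helper and builds the table in one go. `parse_note` is a hypothetical name introduced for illustration; the regular expressions and the joining logic follow the answer.

```python
import re

import pandas as pd

HEADER = re.compile(r'\$\$[^:[]*?(?:\[([^:\]]*)\])?:')
ITEM = re.compile(r'([^\s(,]+)(?:\(([^)]*)\))?')


def parse_note(line):
    """Hypothetical helper: return (note, parameters, configurations) or None."""
    expr = HEADER.search(line)
    if not expr:
        return None                      # not a "$$" note, skip it
    note = re.sub(r'\$\$.*?:', '', line).strip()
    if not expr[1]:
        return note, None, None          # no [...] section
    items = ITEM.findall(expr[1])
    params = ",".join(name for name, _ in items)
    confs = ",".join(f"[{conf}]" for _, conf in items)
    return note, params, confs


lines = ['yes hello there', 'move on to the next command if the previous command was successful.',
         "$$n:describes the '&&' character in the RUN command.",
         'k',
         '$$n[t(a1), mfc(s,expand,rr), np(), k]: description']

rows = [parsed for parsed in map(parse_note, lines) if parsed is not None]
df = pd.DataFrame(rows, columns=['Note', 'Parameters', 'Configurations'])
print(df)
```

Unlike the Expected Output table, the first note keeps the single quotes around `'&&'`; adding `.replace("'", "")` to the note string would drop them if that is the intent.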