How to replace spaces within square brackets

I have been looking for a solution to this for days. I want to replace spaces only when they occur inside square brackets, and replace each one with a double quote, comma, double quote, i.e. ","

The solution can be anything as long as it works, but I should mention that it is a huge file.

An example:

"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.3"],

Will be:

"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],

Preferred solutions, since I am working with these tools: Visual Studio or sed.

Thank you

CodePudding user response：

Python Solution

Replace sample.txt with your own file. The script replaces the spaces within the [] with "," and writes the result back to the same file.

process.py

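import re
import os

# Open for reading and writing ("r+") so the result can be written back in place.
with open('sample.txt', "r+") as f:
    contents = f.read()
    # Capture everything between [" and "] on each line ('.' does not cross newlines).
    reg1 = re.compile(r'(?<=\[")(.*)(?="\])', re.MULTILINE)
    lines = reg1.findall(contents)
    for line in lines:
        # Turn every whitespace character inside the brackets into ","
        newLine = re.sub(r'\s', '","', line)
        contents = contents.replace(line, newLine)
    # Rewind, overwrite the file with the updated contents, and drop any leftover bytes.
    f.seek(0, os.SEEK_SET)
    f.write(contents)
    f.truncate()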

sample.txt

"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.3"],
"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.1"],

"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.2"],

"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.4"],

"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.5"],

"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.6"],

Output

"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],
"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.1"],

"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.2"],

"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.4"],

"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.5"],

"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.6"],

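Since the question mentions a huge file, here is a minimal sketch of the same idea that streams the file line by line instead of reading it all into memory. It assumes each bracketed list sits on a single line, as in the sample. The names fix_line and fix_file, and the use of a separate output file, are hypothetical choices for illustration and not part of the answer above.

```python
import re

# Text between [" and "] on one line; greedy, so inner brackets like [a] stay inside the match.
INNER = re.compile(r'(?<=\[")(.*)(?="\])')


def fix_line(line: str) -> str:
    """Replace each space inside [" ... "] with "," and leave the rest of the line alone."""
    return INNER.sub(lambda m: m.group(1).replace(' ', '","'), line)


def fix_file(src: str, dst: str) -> None:
    """Stream src to dst one line at a time (assumption: writing to a new file is acceptable)."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(fix_line(line))
```

Calling fix_file('sample.txt', 'sample.fixed.txt') would leave the original untouched, which may be preferable for a large file that is expensive to regenerate.
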
CodePudding user response：

Using sed

$ sed ':a;s/\(\["[^ ]*\) /\1","/;ta' input_file
"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],

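If the loop does not work for your use case but the pattern will always be the same, you can also try a single pass with the g flag. Note that this only replaces a space that follows a token containing [, so a plain token such as 3.11.2 keeps the space after it, as the output shows: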

$ sed 's/\(\[[^ ]*\) /\1","/g' input_file
"NIST 800-171A": ["3.11.2[a]","3.11.2 3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],

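To see how the two sed commands differ, here is a rough Python sketch of both. It is only an emulation under the assumption that the input looks like the sample line; the function names are hypothetical. The first function mimics the `:a; s/.../.../; ta` loop by substituting once and repeating until nothing changes, the second mimics the single pass with the g flag.

```python
import re

# Pattern of the sed loop: '["' followed by non-space characters, then one space.
LOOP_PATTERN = re.compile(r'(\["[^ ]*) ')
# Pattern of the single-pass variant: any '[' followed by non-space characters, then one space.
GLOBAL_PATTERN = re.compile(r'(\[[^ ]*) ')


def replace_with_loop(line):
    """Substitute the first match, then retry from the start until no match is left."""
    while True:
        line, count = LOOP_PATTERN.subn(r'\1","', line, count=1)
        if count == 0:
            return line


def replace_single_pass(line):
    """Substitute all non-overlapping matches in one left-to-right pass."""
    return GLOBAL_PATTERN.sub(r'\1","', line)


sample = '"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c]"],'
print(replace_with_loop(sample))    # "NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]"],
print(replace_single_pass(sample))  # "NIST 800-171A": ["3.11.2[a]","3.11.2 3.11.2[c]"],
```

The loop works because each retry starts again at [" and the growing run of non-space characters reaches one space further every time.
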
CodePudding user response：

To replace any spaces after the first [ and before the last ]... search for:

(^[^\[]+|\][^\]]+$)|\s

and replace with $1 (what's captured inside the first capture group). See this demo at regex101.

The idea is to use a variation of The Trick. The pattern either captures the first and last part of the line into group 1 or matches a single whitespace character. Because the whitespace is matched but not captured, group 1 is empty for those matches.
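
The capture-or-match idea can be sketched in Python with a replacement function. Two things here are assumptions rather than part of the answer: the pattern is written with + quantifiers, and uncaptured whitespace is turned into "," (the separator the question asks for). The name join_items is hypothetical, and the function is meant to be applied to one line at a time.

```python
import re

# Group 1 captures the text before the first '[' or from the last ']' to the end of the line.
# The bare \s alternative matches any other whitespace character without capturing it.
TRICK = re.compile(r'(^[^\[]+|\][^\]]+$)|\s')


def join_items(line):
    """Keep captured text unchanged; replace uncaptured whitespace with "," (assumed)."""
    return TRICK.sub(
        lambda m: m.group(1) if m.group(1) is not None else '","',
        line,
    )


print(join_items('"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.3[a] 3.11.3"],'))
# "NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.3[a]","3.11.3"],
```

The space in "NIST 800-171A" survives because it is consumed as part of group 1 before the \s alternative ever sees it.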

CodePudding user response：

With GNU awk for the 3rd arg to match() and gensub() we can get the output you show from the input you show with:

$ awk 'match($0,/([^[]*)(.*)/,a){ $0=a[1] gensub(/ /,"\",\"","g",a[2]) } 1' file
"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],

or with any awk:

$ awk 's=index($0,"["){ tgt=substr($0,s); gsub(/ /,"\",\"",tgt); $0=substr($0,1,s-1) tgt } 1' file
"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],

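For readers who do not use awk, the index()/substr() version above corresponds roughly to the following Python sketch. The function name is hypothetical. Like the awk command, it replaces every space from the first [ to the end of the line, so it assumes there are no spaces after the closing bracket that should be kept.

```python
def replace_after_first_bracket(line):
    """Replace spaces with "," from the first '[' onward; text before it is untouched."""
    start = line.find('[')
    if start == -1:
        # No bracket on this line: leave it as-is, like the awk condition being false.
        return line
    head, tail = line[:start], line[start:]
    return head + tail.replace(' ', '","')


print(replace_after_first_bracket('"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.3"],'))
# "NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.3"],
```
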
CodePudding user response：

No regex capture group or gensub() is needed at all:

{m,g}awk '$!NF = substr($!(NF=NF),!_, __=index($_, "[")) substr(_,
            __ = substr($_,  __), gsub(" ","\",\"", __))__' OFS=']\",\"' FS='[]][ \t]+'

"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],