How to replace spaces within square brackets

Time:07-06

I have been looking for a solution to this for days. I want to replace all spaces, but only when they occur inside square brackets, with a double quote, comma, double quote sequence: ","

Any tool is fine as long as it works, but note that the file is huge.

An example:

"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.3"],

Will be:

"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],

Preferred solutions, since these are the tools I am working with: Visual Studio or sed.

Thank you

CodePudding user response:

Python Solution

Replace sample.txt with the name of your file. The script replaces the spaces within the [] with "," and writes the result back to the same file.

process.py

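import re
import os

# "r+" opens the file for both reading and writing without truncating it.
with open('sample.txt', "r+") as f:
    contents = f.read()
    # Capture everything between [" and "] on each line.
    reg1 = re.compile(r'(?<=\[")(.*)(?="\])', re.MULTILINE)
    lines = reg1.findall(contents)
    for line in lines:
        # Replace every whitespace character in the captured span with ",".
        newLine = re.sub(r'\s', '","', line)
        contents = contents.replace(line, newLine)
    # Rewind and overwrite the file with the converted text.
    f.seek(0, os.SEEK_SET)
    f.write(contents)
    # Safeguard only: the output here is never shorter than the input.
    f.truncate()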

sample.txt

"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.3"],
"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.1"],

"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.2"],

"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.4"],

"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.5"],

"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.6"],

Output

"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],
"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.1"],

"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.2"],

"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.4"],

"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.5"],

"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.6"],

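Since the question mentions a huge file, a line-by-line variant may be easier on memory than reading everything at once. The following is only a sketch under some assumptions: every list sits on a single line, the function names convert_line and convert_file are hypothetical, and runs of whitespace are collapsed into a single "," (the script above replaces each whitespace character individually).

```python
import os
import re
import tempfile

# Same span the script above targets: the text between [" and "] on one line.
SPAN = re.compile(r'(?<=\[")(.*)(?="\])')


def convert_line(line):
    """Replace each run of whitespace inside ["..."] with "," on a single line."""
    return SPAN.sub(lambda m: re.sub(r'\s+', '","', m.group(1)), line)


def convert_file(path):
    """Rewrite `path` one line at a time via a temporary file in the same directory."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "r") as src, tempfile.NamedTemporaryFile(
            "w", dir=directory, delete=False) as dst:
        for line in src:
            dst.write(convert_line(line))
    # Swap the converted copy into place once both files are closed.
    os.replace(dst.name, path)
```

Calling convert_file('sample.txt') would then rewrite the file in place while holding only one line in memory at a time.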
CodePudding user response:

Using sed

$ sed ':a;s/\(\["[^ ]*\) /\1","/;ta' input_file
"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],

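If the loop does not work for your use case, and every item always ends in a bracketed suffix such as [a], you can also try a single pass. Note that the space after an item without brackets is left in place, as in the output below: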

$ sed 's/\(\[[^ ]*\) /\1","/g' input_file
"NIST 800-171A": ["3.11.2[a]","3.11.2 3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],

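For readers more comfortable with Python, the difference between the two sed commands can be sketched as follows. This is an illustration rather than a drop-in replacement; the function names replace_looped and replace_single_pass are hypothetical.

```python
import re

# Mirrors the looped sed command: anchor on [" and consume non-space characters.
LOOPED = re.compile(r'(\["[^ ]*) ')
# Mirrors the single-pass /g variant: anchor on any [ instead.
SINGLE_PASS = re.compile(r'(\[[^ ]*) ')


def replace_looped(line):
    """Apply one substitution at a time until nothing matches (sed's `ta` loop)."""
    while True:
        line, count = LOOPED.subn(r'\1","', line, count=1)
        if count == 0:
            return line


def replace_single_pass(line):
    """Replace every non-overlapping match in one pass (sed's `g` flag)."""
    return SINGLE_PASS.sub(r'\1","', line)
```

The looped version rescans from the opening [" after every substitution, so it eventually reaches every space in the list. The single-pass version only replaces a space that directly follows a token containing a [, which is why an item without brackets keeps its trailing space.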
CodePudding user response:

To replace any spaces after the first [ and before the last ]... search for:

(^[^\[]+|\][^\]]+$)|\s+

and replace with $1 (what's captured inside the first capture group). See this demo at regex101.

The idea is to use a variation of "The Trick": the pattern either captures the first and last parts of the line into group 1, or it matches whitespace. Because the whitespace is matched but not captured, $1 is empty for those matches.
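A minimal Python sketch of this capture-or-match idea, assuming the pattern is (^[^\[]+|\][^\]]+$)|\s+ with + quantifiers and that each record is a single line. Replacing with $1 alone would drop the matched whitespace, so this sketch uses a replacement callback that puts back group 1 when it was captured and otherwise inserts ","; the function name quote_split is hypothetical.

```python
import re

# Group 1 protects the text before the first [ and from the last ] to the end
# of the line; the uncaptured alternative matches whitespace everywhere else.
PATTERN = re.compile(r'(^[^\[]+|\][^\]]+$)|\s+')


def quote_split(line):
    """Keep protected parts unchanged and turn other whitespace runs into ","."""
    def repl(match):
        if match.group(1) is not None:
            return match.group(1)  # protected prefix or suffix, keep as-is
        return '","'               # whitespace between the brackets
    return PATTERN.sub(repl, line)
```

The space inside "NIST 800-171A": survives because it falls inside the captured prefix, while the spaces between list items match only the \s+ branch.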

CodePudding user response:

With GNU awk for the 3rd arg to match() and gensub() we can get the output you show from the input you show with:

$ awk 'match($0,/([^[]*)(.*)/,a){ $0=a[1] gensub(/ /,"\",\"","g",a[2]) } 1' file
"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],

or with any awk:

$ awk 's=index($0,"["){ tgt=substr($0,s); gsub(/ /,"\",\"",tgt); $0=substr($0,1,s-1) tgt } 1' file
"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],

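The portable awk version above boils down to "find the first [, then replace every space from there on". A rough Python equivalent, given only as a sketch with a hypothetical function name, could look like this:

```python
def quote_after_first_bracket(line):
    """Replace every space at or after the first '[' with ","."""
    start = line.find("[")
    if start == -1:
        return line  # no bracket on this line, leave it untouched
    head, tail = line[:start], line[start:]
    return head + tail.replace(" ", '","')
```

Like the awk command, this also replaces any space that follows the closing ], so it assumes a line has no trailing spaces or further content after the list.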
CodePudding user response:

No regex capture group or gensub() is needed at all:

{m,g}awk '$!NF = substr($!(NF=NF),!_, __=index($_, "[")) substr(_,
            __ = substr($_,  __), gsub(" ","\",\"", __))__' OFS=']\",\"' FS='[]][ \t]+'

"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],