I have been looking for a solution to this for days. I want to replace every space that occurs inside square brackets with a double quote, comma, double quote. ---> ","
The solution can use any tool as long as it works, but note that it's a huge file.
An example:
"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.3"],
Will be:
"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],
Preferred tools, since I am already working with them: Visual Studio or sed.
Thank you
CodePudding user response:
Python Solution
Replace sample.txt with your file. The script replaces the spaces within the [] with "," and writes the result back to the file.
process.py
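import re
import os

# "r+" opens the file for both reading and writing
with open('sample.txt', 'r+') as f:
    contents = f.read()
    # Capture the text between [" and "] on each line
    reg1 = re.compile(r'(?<=\[")(.*)(?="\])', re.MULTILINE)
    lines = reg1.findall(contents)
    for line in lines:
        # Turn every whitespace character in the match into ","
        newLine = re.sub(r'\s', '","', line)
        contents = contents.replace(line, newLine)
    # Rewind and overwrite the file with the updated text
    f.seek(0, os.SEEK_SET)
    f.write(contents)
    f.truncate()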
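Since the question mentions a huge file, reading everything into memory may not be ideal. The following is a minimal sketch of a line-by-line variant, assuming each bracketed list sits on a single line as in the sample. The names fix_line and fix_file are hypothetical, and the temporary-file approach is one option among several.

```python
import os
import re
import tempfile

# Text between [" and "] on a single line (same pattern as process.py).
BRACKETED = re.compile(r'(?<=\[")(.*)(?="\])')

def fix_line(line: str) -> str:
    """Replace each space inside [" ... "] with ","."""
    return BRACKETED.sub(lambda m: m.group(1).replace(' ', '","'), line)

def fix_file(path: str) -> None:
    """Rewrite the file one line at a time via a temporary file next to it."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "r") as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as dst:
        for line in src:
            dst.write(fix_line(line))
    # Swap the finished temporary file into place.
    os.replace(dst.name, path)
```

Calling fix_file('sample.txt') would rewrite the file in place while holding only one line in memory at a time.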
sample.txt
"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.3"],
"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.1"],
"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.2"],
"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.4"],
"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.5"],
"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.6"],
Output
"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],
"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.1"],
"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.2"],
"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.4"],
"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.5"],
"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.6"],
CodePudding user response:
Using sed
$ sed ':a;s/\(\["[^ ]*\) /\1","/;ta' input_file
"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],
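If the loop does not work for your use case but the pattern will always be the same, you can also try the command below. It only replaces a space that directly follows a token containing [, so the space after the plain 3.11.2 stays in place: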
$ sed 's/\(\[[^ ]*\) /\1","/g' input_file
"NIST 800-171A": ["3.11.2[a]","3.11.2 3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],
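To make the difference between the two commands easier to see, here is a rough Python sketch of both. It assumes the same single-line input as the sample; loop_replace and single_pass are hypothetical names, and the patterns are direct translations of the sed expressions rather than anything sed-specific.

```python
import re

# Mirrors \(\["[^ ]*\) followed by a space: text from [" up to the next space.
STEP = re.compile(r'(\["[^ ]*) ')

def loop_replace(line: str) -> str:
    """Substitute one space at a time until nothing matches (sed's ':a ... ta')."""
    while True:
        line, count = STEP.subn(r'\1","', line, count=1)
        if count == 0:
            return line

def single_pass(line: str) -> str:
    """Mirror of the second command: one global pass anchored on any '['."""
    return re.sub(r'(\[[^ ]*) ', r'\1","', line)
```

The loop version keeps extending the match from the opening [" across text it has already rewritten, which is why it reaches every space. The single pass resumes after each match, so a token without a [ is skipped.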
CodePudding user response:
To replace any whitespace after the first [ and before the last ], search for:
(^[^\[]+|\][^\]]+$)|\s
and replace with $1 (what's captured inside the first capture group). See this demo at regex101.
The idea is to use a variation of The Trick. The pattern either captures the first and last part of the line in group 1 or matches a single whitespace character. Because the whitespace is not captured, group 1 is empty for those matches.
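The capture-or-match idea can be sketched in Python as follows. This is an adaptation, not the answer's exact replacement: a callback returns group 1 when it took part in the match and "," otherwise, so the whitespace is turned into the delimiter from the question. The + quantifiers in the pattern and the name replace_inner_spaces are assumptions.

```python
import re

# Group 1 captures the parts to keep (everything before the first "[" and
# from the last "]" to the end of the line); the bare \s matches the rest.
# The "+" quantifiers are an assumption about the intended pattern.
TRICK = re.compile(r'(^[^\[]+|\][^\]]+$)|\s')

def replace_inner_spaces(line: str) -> str:
    """Expects a single line without its trailing newline."""
    # Keep captured text unchanged; turn uncaptured whitespace into ",".
    return TRICK.sub(
        lambda m: m.group(1) if m.group(1) is not None else '","', line
    )
```

The trailing newline has to be stripped first, because \s would otherwise match it and replace it as well.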
CodePudding user response:
With GNU awk for the 3rd arg to match() and gensub() we can get the output you show from the input you show with:
$ awk 'match($0,/([^[]*)(.*)/,a){ $0=a[1] gensub(/ /,"\",\"","g",a[2]) } 1' file
"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],
or with any awk:
$ awk 's=index($0,"["){ tgt=substr($0,s); gsub(/ /,"\",\"",tgt); $0=substr($0,1,s-1) tgt } 1' file
"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],
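For comparison, the index()/substr() approach translates almost directly to Python. This is only a sketch under the same assumption as the awk version, namely that everything after the first [ on the line belongs to the list; replace_after_first_bracket is a hypothetical name.

```python
def replace_after_first_bracket(line: str) -> str:
    # Split at the first "[" (like awk's index/substr), then replace spaces
    # only in the tail, so the space in "NIST 800-171A" is left untouched.
    head, bracket, tail = line.partition("[")
    return head + bracket + tail.replace(" ", '","')
```

A line without any [ is returned unchanged. Any space after the closing ] would be replaced too, which does not matter for the sample input.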
CodePudding user response:
No regex capture group or gensub() needed at all:
{m,g}awk '$!NF = substr($!(NF=NF),!_, __=index($_, "[")) substr(_,
__ = substr($_, __), gsub(" ","\",\"", __))__' OFS=']\",\"' FS='[]][ \t]+'
"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],