I have some text like this:
0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 [-0.01357987 0.99989218 -0.00558794] [ 0.50810066 0.81535196 -0.27755161] -18017079.1047535 10307726.258588774 -23524317.110919423 22838.17515528947 36648.54674929567 -12475.426190771163 20757272.462656207 5 5 0.005 0 5 2 0.0005 3 -98.28031520537542 29.516134353642414 19998.73252382984 -0.0055879708379507065 -0.06085533474930652 359.2218946628823 818.306543386653 2.7826806154507513 8.100054108045068 -0.7584403000503389 -0.02994106840115437 -0.0628111825224635 0.058338781314879004 242.42818865783832 0.0 0.8063781178568004 -2.6036838124274486 0.0 -0.734866228020307 -0.062 -0.062 0.008726849073962957 0.0 123.16666557966661 1.2292998660484957e-09 0.0 0.0 25619.0 Allies [-0.06088792 0.00475117 0.9981333 ] 0.5064766089927465 0.3371159370714128 0.6267890628740791 1.6191164404478644 -4.404605641298986 1.0085164509526248 0.9403428264947271 1.002228406356249 0.5911076156375097 0.04943091153402836 -0.12347543075231103 -0.031096345163790243 -0.1049357617938111 0.024866980145114622 0.04861966645392242
I would like to split it on spaces while preserving the bracketed lists. I am considering first replacing the spaces inside the lists with commas (trimming any leading and trailing spaces inside the brackets), after which I could simply split on spaces. Is this doable with a regex in VSCode? One could write a verbose Python script to handle this logic, but I think that would be clumsy and inelegant.
CodePudding user response:
There is no need for your roundabout approach of replacing some spaces with commas. You could write the following in Python. I expect it would be similar in VSCode.
import re
text = "1 0 [-0.01 0.99]\n6.34 [0.501 -0.27] -18.10\n10.25"
rgx = r'\[.*?\]|\S+'
re.findall(rgx, text)
#=> ['1', '0', '[-0.01 0.99]', '6.34', '[0.501 -0.27]', '-18.10', '10.25']
The regular expression has the following elements.
\[ # match left bracket
.*? # match zero or more characters lazily
\] # match right bracket
| # or
\S+ # match one or more non-whitespace characters, as many as possible
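The pattern above can be wrapped in a small function and, if nested lists are wanted, followed by a second pass over the bracketed tokens. This is only a sketch: the names `split_preserving_lists` and `expand_lists` are made up for illustration, and it assumes brackets are never nested and that a list never spans a line break (`.` does not match a newline by default).

```python
import re

# Pattern from the answer above: a bracketed group, or a run of non-whitespace.
TOKEN_RE = re.compile(r'\[.*?\]|\S+')

def split_preserving_lists(line):
    """Split on whitespace, keeping each [...] group as a single token."""
    return TOKEN_RE.findall(line)

def expand_lists(tokens):
    """Turn each '[a b c]' token into a list of its items (hypothetical helper)."""
    result = []
    for tok in tokens:
        if tok.startswith('[') and tok.endswith(']'):
            # str.split() with no argument also discards padding spaces
            result.append(tok[1:-1].split())
        else:
            result.append(tok)
    return result

sample = ("0 1 [-0.01357987 0.99989218 -0.00558794] "
          "[ 0.50810066 0.81535196 -0.27755161] 25619.0 Allies "
          "[-0.06088792 0.00475117 0.9981333 ] 0.5")

tokens = split_preserving_lists(sample)
nested = expand_lists(tokens)
print(tokens)
print(nested)
```

The first pass keeps the bracketed groups as strings, which is what the question asked for. The second pass is optional and only needed if the list items should be separated as well.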
As an aside, when contemplating hackish solutions to a problem, such as replacing one character with another, performing some operations and then switching the replacement character back to the original one, look for a cleaner solution. There will always be one. Hackish solutions will not enhance your reputation among other coders and will cause you embarrassment if you must revise your code at a later date, after having gained more experience.
CodePudding user response:
You can identify the spaces outside square brackets with the following regex:
" (?![^[\]]*\])"
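Explanation:
- a literal space is matched first
- the negative lookahead then ensures that, reading forward from the space, a closing square bracket is not reached before any other bracket, i.e. the space is not inside a bracketed list
- if both conditions are met, the space is matched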
You can also identify the spaces inside brackets with the following regex:
" (?=[^[\]]*\])"
- a literal space is matched first
- the positive lookahead then ensures that, reading forward from the space, a closing square bracket is reached before any other bracket, i.e. the space is inside a bracketed list
- if both conditions are met, the space inside the brackets is matched
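To see the two lookahead patterns side by side, here is a minimal sketch that uses the first to split directly and the second to do the comma replacement described in the question. The sample string and variable names are invented for illustration. It assumes brackets are never nested, and it trims padding spaces just inside the brackets first, since the inside-space pattern would otherwise turn them into commas too.

```python
import re

OUTSIDE_SPACE = r" (?![^[\]]*\])"  # space not inside [...]
INSIDE_SPACE = r" (?=[^[\]]*\])"   # space inside [...]

sample = "1 0 [ -0.01 0.99 ] 6.34 [0.501 -0.27] -18.10"

# Remove padding spaces directly after '[' and directly before ']'
trimmed = re.sub(r"\[ +| +\]", lambda m: m.group().strip(), sample)

# Option 1: split on the outside spaces only
outside_split = re.split(OUTSIDE_SPACE, trimmed)

# Option 2: replace the inside spaces with commas, then split on whitespace
with_commas = re.sub(INSIDE_SPACE, ",", trimmed)
comma_split = with_commas.split()

print(outside_split)
print(with_commas)
print(comma_split)
```

Both patterns match a single space at a time, so a run of several consecutive spaces would produce empty strings with option 1.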
Update:
My understanding of the original post is that the goal is a list of lists, obtained by splitting on the spaces both outside and inside the brackets. Combined Python code that produces this output:
import re
regex = r" (?![^[\]]*\])"
test_str = "0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 [-0.01357987 0.99989218 -0.00558794] [ 0.50810066 0.81535196 -0.27755161] -18017079.1047535 10307726.258588774 -23524317.110919423 22838.17515528947 36648.54674929567 -12475.426190771163 20757272.462656207 5 5 0.005 0 5 2 0.0005 3 -98.28031520537542 29.516134353642414 19998.73252382984 -0.0055879708379507065 -0.06085533474930652 359.2218946628823 818.306543386653 2.7826806154507513 8.100054108045068 -0.7584403000503389 -0.02994106840115437 -0.0628111825224635 0.058338781314879004 242.42818865783832 0.0 0.8063781178568004 -2.6036838124274486 0.0 -0.734866228020307 -0.062 -0.062 0.008726849073962957 0.0 123.16666557966661 1.2292998660484957e-09 0.0 0.0 25619.0 Allies [-0.06088792 0.00475117 0.9981333 ] 0.5064766089927465 0.3371159370714128 0.6267890628740791 1.6191164404478644 -4.404605641298986 1.0085164509526248 0.9403428264947271 1.002228406356249 0.5911076156375097 0.04943091153402836 -0.12347543075231103 -0.031096345163790243 -0.1049357617938111 0.024866980145114622 0.04861966645392242"
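tokens = re.split(regex, test_str)
for i, token in enumerate(tokens):
    # A token that starts with '[' is a bracketed list
    if token.startswith('['):
        # Strip the brackets and any padding spaces next to them
        inner = re.sub(r'\[ *| *\]', '', token)
        # No brackets remain, so the same regex now splits on every space
        tokens[i] = re.split(regex, inner)
print(tokens)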
output:
['0', '0', '0', '0', '0', '0', '1', '0', '0', '0', '1', '0', '0', '0', '0', ['-0.01357987', '0.99989218', '-0.00558794'], ['0.50810066', '0.81535196', '-0.27755161'], '-18017079.1047535', '10307726.258588774', '-23524317.110919423', '22838.17515528947', '36648.54674929567', '-12475.426190771163', '20757272.462656207', '5', '5', '0.005', '0', '5', '2', '0.0005', '3', '-98.28031520537542', '29.516134353642414', '19998.73252382984', '-0.0055879708379507065', '-0.06085533474930652', '359.2218946628823', '818.306543386653', '2.7826806154507513', '8.100054108045068', '-0.7584403000503389', '-0.02994106840115437', '-0.0628111825224635', '0.058338781314879004', '242.42818865783832', '0.0', '0.8063781178568004', '-2.6036838124274486', '0.0', '-0.734866228020307', '-0.062', '-0.062', '0.008726849073962957', '0.0', '123.16666557966661', '1.2292998660484957e-09', '0.0', '0.0', '25619.0', 'Allies', ['-0.06088792', '0.00475117', '0.9981333'], '0.5064766089927465', '0.3371159370714128', '0.6267890628740791', '1.6191164404478644', '-4.404605641298986', '1.0085164509526248', '0.9403428264947271', '1.002228406356249', '0.5911076156375097', '0.04943091153402836', '-0.12347543075231103', '-0.031096345163790243', '-0.1049357617938111', '0.024866980145114622', '0.04861966645392242']