I'm trying to get the following regex right: I need to split a whole shell command, represented as a String, on (multiple) spaces. However, the splitting should happen everywhere except inside double quotes.
The closest I have come to a solution is this: \s+(?![^\"]*[\"])
But this doesn't match the first two spaces in an example such as:
cao ciao "sad asd" cai
which should be split as:
- cao
- ciao
- sad asd
- cai
What am I missing?
PS: I'm writing a Kotlin wrapper for a shell API, and luckily the range of constructs and options is quite limited, so I think a regex is a nice fit.
PPS: I asked for a regex, while the suggested duplicate answer involves a StringTokenizer and a Matcher, which are different things.
Found! \s+(?=[^"]*(?:"[^"]*"[^"]*)*$)
Kudos to the regex101 community.
CodePudding user response:
It might be easier to match the tokens instead of the spaces between them. Instead of using split, extract all matches of
("[^"]*"?|\S)+
I used ? after the closing quote so that a single " without a closing " causes everything up to the end to be read as one token.
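Since the question mentions Kotlin, here is a minimal sketch of that token-matching approach. The names token and tokenize are made up for illustration, and the group is written as non-capturing, which does not change what is matched. Note that the quotes stay part of each token.

```kotlin
// Either a quoted section (the closing quote is optional) or a single
// non-whitespace character, repeated as often as possible.
val token = Regex("""(?:"[^"]*"?|\S)+""")

// Hypothetical helper, not part of any library.
fun tokenize(command: String): List<String> =
    token.findAll(command).map { it.value }.toList()

fun main() {
    println(tokenize("cao ciao \"sad asd\" cai"))  // prints [cao, ciao, "sad asd", cai]
    println(tokenize("echo \"a b"))                // prints [echo, "a b]
}
```

If the surrounding quotes are not wanted, something like removeSurrounding("\"") on each token should strip them.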
Warning: You mentioned shell scripts. If you want to parse a shell script using a regex, you will have a very hard time, to say the least. For example, consider the following constructs:
echo 'a b'
echo "a \" b"
echo $'a b'
echo a\ b
echo "$(echo "a ") b"
echo \"
cat << EOF
a b
EOF
You need an actual parser to safely process shell scripts.
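To make the warning concrete, here is a small Kotlin sketch (the names quotedOrBare and naiveTokens are made up for illustration) that runs a quote-aware token regex over two of the constructs above. It only shows where a regex like this goes wrong; it is not a shell parser.

```kotlin
// Same idea as a quote-aware token regex: a quoted section or non-whitespace characters.
val quotedOrBare = Regex("""(?:"[^"]*"?|\S)+""")

// Hypothetical helper, not part of any library.
fun naiveTokens(line: String): List<String> =
    quotedOrBare.findAll(line).map { it.value }.toList()

fun main() {
    // Shell source: echo "a \" b"   (a shell passes one argument: a " b)
    println(naiveTokens("echo \"a \\\" b\""))  // prints [echo, "a \", b"]

    // Shell source: echo a\ b       (a shell passes one argument: a b)
    println(naiveTokens("echo a\\ b"))         // prints [echo, a\, b]
}
```

In both cases the regex knows nothing about backslash escapes, so it produces two argument tokens where the shell sees one.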
CodePudding user response:
Found:
\s+(?=[^"]*(?:"[^"]*"[^"]*)*$)
You can try it here
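For completeness, a minimal Kotlin sketch of how this could be used with split. The lookahead only accepts whitespace that is followed by an even number of double quotes up to the end of the input, which means the whitespace sits outside any quoted section. The names separator and splitOutsideQuotes are made up for illustration, and trimming the input and stripping the enclosing quotes are my own additions to get the output asked for in the question.

```kotlin
// Whitespace is a separator only if an even number of double quotes
// follows it up to the end of the input.
val separator = Regex("""\s+(?=[^"]*(?:"[^"]*"[^"]*)*$)""")

// Hypothetical helper, not part of any library.
fun splitOutsideQuotes(command: String): List<String> =
    command.trim()
        .split(separator)
        .filter { it.isNotEmpty() }          // blank input would yield one empty string
        .map { it.removeSurrounding("\"") }  // drop the enclosing quotes, if present

fun main() {
    println(splitOutsideQuotes("cao ciao \"sad asd\" cai"))
    // prints [cao, ciao, sad asd, cai]
}
```

This assumes the quotes are balanced. With an unclosed quote, no whitespace to the left of that quote is followed by an even number of quotes, so nothing before it gets split.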