I'm trying to create an overly simplified version of bash, I've tried split the program into "lexer expander, parser, executor".
In the lexer i store my data (commands, flags, files) and create tokens out of them , my procedure is simply to loop through given input char by char and use a state machine to handle states, states are either a special character, an alphanumeric character or space.
Now when i'm at an alphanumeric state i'm at a command, the way i know where the next flag is when i encounter again alphanumeric state or if input[i] == '-'
, now the problem is with multi-flag commands.
For example:
$ ls -la | grep "*.c"
I successfully get the command ls, grep
and the flag -la, *.c
.
However with multi-flag commands like.
$ sed -i "*.bak" "s/a/b/g" file1 file2
It seems to me very difficult, and i can't figure out yet, how can i know where the flags to a specific command ends, so my question is how bash parse these multi-flags commands ? any suggestions regarding my problem, would be appreciated !
CodePudding user response:
The shell does not attempt to parse command arguments; that's the responsibility of the utility. The range of possible command argument syntaxes, both in use and potentially useful, is far too great to attempt that.
On Unix-like systems, the shell identifies individual arguments from the command line, mostly by splitting at whitespace but also taking into account the use of quotes and a variety of other transformations, such as "glob expansion". It then makes a vector of these arguments ("argv") and passes the vector to execve
, which hands them to the newly created process.
On Windows systems, the shell doesn't even do that. It just hands over the command-line as a string, and leaves it to the command-line tool to do everything. (In order to provide a modicum of compatibility, there's an intermediate layer which is called by the application initialization code, which eventually calls main()
. This does some basic argument-splitting, although its quoting algorithm is quite a bit simplified from that used by a Unix shell.)
No command-line shell that I know of attempts to identify command-line flags. And neither should you.
For a bit of extracurricular reading, here's the description of shell parsing from the Posix standard: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html. Trying to implement all that goes far beyond the requirements given to you for this assignment, and I'm certainly not recommending that you do that. But it might still be interesting, and understanding it will help you immensely if you start using a shell.
Alternatively, you could try reading the Bash manual, which might be easier to understand. Note that Bash implements a lot of extensions to the Posix standard.