I'm using the notation L(a)
to denote a language - a set of strings, described by a regular expression s
. For example L(a)
is the set {"a"}
.
The same way, using derivation rules to find the language for a regular expression I can do the following:
L(a(b|c)) => {ab, ac}
because
a(b|c) => a(b) = ab
and
a(b|c) => a(c) = ac
which I understand. But I can't come up with the derivations in the following example:
L((a|b)*) = ab
because
(a|b)*
=> (a|b)(a|b)*
=> a(a|b)*
=> a(a|b)(a|b)*
=> ab(a|b)*
=> ab
How's that the first step is (a|b)(a|b)*
? Having defined *
as:
A concatenation of any number (including 0) of strings derived from s. The number depends on how many times the second rule is used:
s* equals s* =>
s* => s(s*)
Taken from Springer, Introduction to Compiler Design, 2nd - 1.1 Regular expressions
CodePudding user response:
I don't think that your example is correct (maybe there is a typo). The derivation rules of Brzozowski that you need are:
(1) L(a*) = L(a)*
(2) L(a|b) = L(a) U L(b)
(3) L(a) = {a}
Then you get:
L((a|b)*) = L(a|b)* ; applying rule #1, note the Kleene star location
= (L(a) U L(b))* ; rule #2
= ({a} U {b})* ; rule #3, two times
= ({a, b})* ; combining into a single set
= {a, b}* ; removing the useless parentheses