This homework set consists of 10 problems.
You will be creating a document that contains numbered solutions to
the problems, using LaTeX, OpenOffice, Word, or other document preparation
system. If a particular problem asks for source code, you may embed the source
in place in the document, or give the answer as "See <filename>".
Generate a PDF of this document, and
You must submit a hard copy of the pdf and any source
files that you did not embed in the document in one packet of stapled
together 8.5" × 11" sheets of paper. This
assignment is due Wednesday, November 5 for graduates and Thursday,
November 6 for undergrads.
Readings: [Scott], Chapter 1 and Chapter 2,
up through the section on specifying syntax.
Skim the section on scanning, enough so that you can get a good
feel for what a finite state machine is. Consider the
"Check Your Understanding" questions as part of your reading assignment:
work them out but don't turn in your answers.
- Give examples (in any language you like) of each of the following:
- A lexical error
- A syntax error
- A static semantic error
- A dynamic semantic error detected by code generated by the compiler
- An error that the compiler cannot catch nor easily generate
code to catch (a real error, not a program bug)
- Consider these two C functions.
unsigned gcd1(unsigned i, unsigned j) {
while (i != j) {
if (i > j) i = i - j; else j = j - i;
}
return i;
}
unsigned gcd2(unsigned i, unsigned j) {
while (i != j) {
if (i > j) i = i % j; else j = j % i;
}
return i;
}
Is the second one correct? If not, show how to fix it. Under what
circumstances would you expect one or the other to be faster?
- Write, from scratch, Ruby and C versions of
map
and reduce, together with a few interesting unit tests.
- For the following four sets of strings
- Strings of "8-bit" characters
beginning and ending with a double quote character, that do
not contain control characters, and for which the backslash
is used to escape the next character. (These are similar
to, but not exactly the same as, string literals in C.)
- The paren-star form of comments in Pascal: strings beginning
with (* and
ending with *) that do not contain *). Note
comments "don't nest".
- Floating-point constants in Ada (you'll have to look this
one up yourself)
- All non-empty sequences of letters other than "read", "red", and "real"
do the following
- Write a Perl or Ruby subroutine to determine if a string is in the
set using regular expression matching.
- Write a Java method to determine if a string is in the
set using regular expression matching.
- Write a finite state machine that recognizes the set.
Unit tests are required.
- Based on your work in the previous problem on strings that
are not "red", "real", or "read", comment on the practicality
of using regular expressions for defining the set of allowed
identifiers in a real programming language. (Identifiers are
generally alphanumeric strings other than the keywords.)
- Write an unambiguous grammar for the language of all well-formed
classic regular expressions (those involving only symbols, alternation,
Kleene star and catenation). Assume the existence of a token called
SYMBOL. Make all operators left-associative. Give
Kleene closure the highest precedence and alternation
the lowest precedence.
- The language Pascal has an ambiguous grammar; a fragment of which
follows:
STMT -> ASSIGNMENTSTMT | CALL | IFSTMT | WHILESTMT | 'begin' STMT* 'end'
IFSTMT -> 'if' EXP 'then' STMT ('else' STMT)?
- Show two parse trees for "if e1 then if e2 then s1 else s2"
- Rewrite this grammar fragment to remove the ambiguity by disallowing
if statements immediately after a "then" (as in Algol 60).
Make sure you allow "if e1 then begin if e2 then s1 end else s2"
as well as "if e1 then begin if e2 then s1 else s2 end"
- The syntax for type casts in C and its descendants introduces
potential ambiguity: is (x)-y a subtraction, or the unary
negation of y cast to type x? How do
C, C++, and Java deal with this potential ambiguity?
- Design the syntax (in EBNF) of a small language consisting (only) of
- Literals: integer literals only.
- Variables: globals only, of type integer only.
- Declarations: Variables must be declared, but since variables can
only have type integer, no type specification appears in the declaration.
- Expressions: Operators are those of Java except increment,
decrement, cast, instanceof, and assignment; operands are integer
literals and variables and the boolean literals
true and false. Note that while variables can only have the type
integer, expressions can have either boolean or integer type.
Typing is implicit in the syntax (not in the semantics like in
most languages).
- Statements: declarations, assignment, if, and while. If and
while conditions
must have boolean type. Ensure your syntax has no dangling else.
(Note how in this language assignment is a statement, not an
expression.)
- A program is a statement sequence.