CMSI 386/585
Homework #4

This homework set consists of 10 problems. You will be creating a document that contains numbered solutions to the problems, using LaTeX, OpenOffice, Word, or other document preparation system. If a particular problem asks for source code, you may embed the source in place in the document, or give the answer as "See <filename>". Generate a PDF of this document, and You must submit a hard copy of the pdf and any source files that you did not embed in the document in one packet of stapled together 8.5" × 11" sheets of paper. This assignment is due Wednesday, November 5 for graduates and Thursday, November 6 for undergrads.

Readings: [Scott], Chapter 1 and Chapter 2, up through the section on specifying syntax. Skim the section on scanning, enough so that you can get a good feel for what a finite state machine is. Consider the "Check Your Understanding" questions as part of your reading assignment: work them out but don't turn in your answers.

  1. Give examples (in any language you like) of each of the following:
    1. A lexical error
    2. A syntax error
    3. A static semantic error
    4. A dynamic semantic error detected by code generated by the compiler
    5. An error that the compiler cannot catch nor easily generate code to catch (a real error, not a program bug)
  2. Consider these two C functions.
    unsigned gcd1(unsigned i, unsigned j) {
        while (i != j) {
            if (i > j) i = i - j; else j = j - i;
        }
        return i;
    }
    
    unsigned gcd2(unsigned i, unsigned j) {
        while (i != j) {
            if (i > j) i = i % j; else j = j % i;
        }
        return i;
    }
    
    Is the second one correct? If not, show how to fix it. Under what circumstances would you expect one or the other to be faster?
  3. Write, from scratch, Ruby and C versions of map and reduce, together with a few interesting unit tests.
  4. For the following four sets of strings
    1. Strings of "8-bit" characters beginning and ending with a double quote character, that do not contain control characters, and for which the backslash is used to escape the next character. (These are similar to, but not exactly the same as, string literals in C.)
    2. The paren-star form of comments in Pascal: strings beginning with (* and ending with *) that do not contain *). Note comments "don't nest".
    3. Floating-point constants in Ada (you'll have to look this one up yourself)
    4. All non-empty sequences of letters other than "read", "red", and "real"
    do the following Unit tests are required.
  5. Based on your work in the previous problem on strings that are not "red", "real", or "read", comment on the practicality of using regular expressions for defining the set of allowed identifiers in a real programming language. (Identifiers are generally alphanumeric strings other than the keywords.)
  6. Write an unambiguous grammar for the language of all well-formed classic regular expressions (those involving only symbols, alternation, Kleene star and catenation). Assume the existence of a token called SYMBOL. Make all operators left-associative. Give Kleene closure the highest precedence and alternation the lowest precedence.
  7. The language Pascal has an ambiguous grammar; a fragment of which follows:
        STMT -> ASSIGNMENTSTMT | CALL | IFSTMT | WHILESTMT | 'begin' STMT* 'end'
        IFSTMT -> 'if' EXP 'then' STMT ('else' STMT)?
    
    1. Show two parse trees for "if e1 then if e2 then s1 else s2"
    2. Rewrite this grammar fragment to remove the ambiguity by disallowing if statements immediately after a "then" (as in Algol 60). Make sure you allow "if e1 then begin if e2 then s1 end else s2" as well as "if e1 then begin if e2 then s1 else s2 end"
  8. The syntax for type casts in C and its descendants introduces potential ambiguity: is (x)-y a subtraction, or the unary negation of y cast to type x? How do C, C++, and Java deal with this potential ambiguity?
  9. Design the syntax (in EBNF) of a small language consisting (only) of