Information and Computation

The study of computer systems generally begins with a look at how information is represented in a simplest form, and how we model information processing (namely, computation).

Symbols and Alphabets

A piece of (human-usable) information can be represented as a string of symbols; some examples are:

The symbols that make up pieces of data are chosen from some finite alphabet, such as {0,1,2,3,4,5,6,7,8,9} for the language of natural numbers, or {0,1,2,3,4,5,6,7,8,9,e,E,+,-} for the language of floating-point numbers. In reality, the alphabet you choose is irrelevant because it is always possible to recode any alphabet into a two-symbol alphabet. For example, let's say you have an alphabet {a,b,c,d,e}. You can recode this into the alphabet {0,1} as follows:

a → 000
b → 001
c → 010
d → 011
e → 100

Then the word "cab" becomes "010000001" in the new alphabet.

Exercise: One of the freshman thinks that a more "efficient" re-encoding would be a→0, b→1, c→10, d→11, e→010. Why is this no good?

Encoding

In fact, all information, whether it be numbers, characters, programming instructions, or complex data, can be encoded in strings from {0,1}, which we call bit strings. (The term bit is short for binary digit.) Often for ease of presentation we look at the bit strings in chunks of eight bits; these units are called octets, which is a more correct term than byte.

Interpretations

Since everything can be encoded into bit strings, decoding a bit string into the thing it encodes depends on its interpretation. For example, the bit string 1101010010110010 has the following meanings:

InterpretationMeaning
Unsigned Short Integer 54450
Signed Short Integer -11086
UTF-16 String The character HANGUL SYLLABLE PHIEUPH WE SSANGKIYEOK: 풲
UTF-8 String The character ARMENIAN CAPITAL LETTER BEN: Բ
ISO 8859-1 String The two-character string LATIN CAPITAL LETTER O WITH CIRCUMFLEX followed by SUPERSCRIPT TWO: Ô²
IA-32 Machine Instruction An instruction that will divide the value in the AL register by 178, placing the quotient in the AH register and leaving the remainder in AL

Functions

Information Processing is the transformation of one piece of data into another; the mathematical abstraction of this transformation is that of a function. A function associates with each member of a set (called the domain) a member of another set (called the codomain).

For example the successor function has domain N and codomain N, and associates with each member x of N the value x+1. We write this function as

λx.x+1

If we want to name the function we can write

succ: NN

to indicate its type and

succ(x) = x + 1

for its value. Similarly, the function

odd: Z → bool

is such that

odd(x) = not(equal(mod(x, 2), 0))

Thus odd (5) = true and odd(457349856340) = false.

odd.png

Machines

For centuries humans have entertained the notion of building machines to evaluate functions. Since a machine is a finite object, there are only a countable number of functions that can be (mechanically) computed, but the number of functions that can exist is uncountable. Church's Thesis says that the computable functions are precisely those for which a Turing Machine can be given, i.e.

A function f is computable if and only if a there exists a Turing Machine that outputs f(x) for any input x.

Now it turns out that there exists a Turing Machine that computes a function g such that

g(F,x) = f(x)

where F is a Turing machine that computes f.

In other words, there can exist a machine which can take as input (1) a description of another program and (2) input to that program, and produce the result of executing the function on that input! So programmable computers that can compute all computable functions can theoretically exist! We take this for granted, but the result is profound.