Goals
The main goals for Week 8 are to:
-
Understand the proof that Regular Expressions are equivalent in power to Deterministic Finite Automata. This is a long proof, split over 4 videos, but is something we hope everyone understands in detail. The methods used in the proof are similar to ones we have seen before in showing equivalence of NAND-Circ and Boolean circuits, but is a fair bit more complex. This general method of showing equivalence is something we will see again in this class, and you will encounter many more times in other classes and work that you will do.
-
Become familiar with the power and strangeness of nondeterminism. For finite automata, we prove this week that despite is seeming super-power, nondeterministic finite automata cannot compute any functions (or recognize any languages) that cannot be computed by deterministic finite automata. The main topic for Weeks 8-11 will be considering this question for more powerul computing models (whi
Schedule
Day | “Monday” Cohort | “Tuesday” Cohort | “Wednesday” Cohort | “Thursday” Cohort | “Friday” Cohort |
---|---|---|---|---|---|
Wed 13 Oct | Preparation | (Week 7) | (Week 7) | (Week 7) | (Week 7) |
Thurs 14 Oct | Preparation | Preparation | (Week 7) | (Week 7) | (Week 7) |
Fri 15 Oct | Preparation | Preparation | Preparation | (Week 7) | (Week 7) |
Sat 16 Oct | Prep Cohort Meeting | Preparation | Preparation | Preparation | (Week 7) |
Sun 17 Oct | Revision | Prep Cohort Meeting | Preparation | Preparation | Preparation |
Mon 18 Oct | Assessed Meeting | Revision | Prep Cohort Meeting | Preparation | Preparation |
Tues 19 Oct | Writeup Due | Assessed Meeting | Revision | Prep Cohort Meeting | Preparation |
Wed 20 Oct | (Week 9) | Writeup Due | Assessed Meeting | Revision | Prep Cohort Meeting |
Thurs 21 Oct | (Week 9) | (Week 9) | Writeup Due | Assessed Meeting | Revision |
Fri 22 Oct | (Week 9) | (Week 9) | (Week 9) | Writeup Due | Assessed Meeting |
Sat 23 Oct | (Week 9) | (Week 9) | (Week 9) | (Week 9) | Writeup Due |
Cohort Problems
These are the problems you should discuss in your Cohort Meeting, and everyone in your cohort should be prepared to present and discuss solutions to at the Assessed Cohort Meeting:
The problems are posted here and we think its a good idea to look at them early, but you’re not expected to be able to solve them until after doing the readings and watching the videos below.
Programming
Before you begin on this assignment, you will need to download three python files and place them in the same project:
The NFA
program implements nfa simulation and various closure constructions. The regex
program defines a class that reads regular expressions and converts them to NFAs. The week8
program contains example uses of the NFA
and regex
functionality, and also contains the stub for the huntingtons
function you must implement.
You’ll need to use the following functions (example usage of all of them are in week8.py
):
regex.regex_to_NFA(the_regex)
: Ifthe_regex
is a string, the functionregex_to_NFA
found inregex.py
returns and NFA that computes the language indicated bythe_regex
the_nfa.execute(the_string)
: Ifthe_nfa
is an instance of theNFA
class defined inNFA.py
, then invoking the class methodexecute
on stringthe_string
will return a Boolean value indicating whether or notthe_string
belonged to the language of the NFA.the_nfa.toDot()
: Ifthe_nfa
is an instance of theNFA
class defined inNFA.py
, then invoking the class methodtoDot()
will result in a dot (which is a graph specification language) description of the NFA being printed to standard out. You can then copy-paste that description here to see the actual NFA: https://dreampuf.github.io/GraphvizOnline/
What to do
First, read through all of week8.py
and follow the instructions therein to make sure you understand what’s going on. After this you’ll implement the huntingtons
function.
For the huntingtons
function you’re asked to implement, the idea is for you to use regular expressions to diagnose whether a person has Huntington’s disease by looking at their genome.
A DNA sequence is a string of the characters a, t, g, c, representing the sequence of nucleotides that comprise an individual’s DNA molecules. Very often, the human genome has regions where the same nucleotide sequences repeat many times. Sometimes, certain numbers of these repeats will result in a genetic disorder.
Huntington’s Disease is caused by having too many consecutive copies of the sequence “cag”. By looking at a DNA sequence, we can categorize how a person might be affected by Huntington’s in the following way:
- An individual with less than 26 sequential repeats of “cag” in their genome is considered to be “normal”.
- An individual with between 26 and 35 repeats of “cag” is considered to be a “carrier”, and may give the disease to their children.
- An individual with between 36 and 39 repeats is said to be “at risk”, and may or may not ever show symptoms.
- An individual with 40 or more repeats is said to be “affected”, and will eventually show symptoms of the disease.
Your huntingtons
function should take a DNA sequence (a string from the alphabet a,g,t,c) and determine the classification for the individual (i.e. normal, carrier, at risk, or affected). To do this, define a regex for each category, convert each to a nfa, then check which category the given DNA sequence falls into. Once you have determined the category, the function should return the appropriate string of: “normal”, “carrier”, “at risk”, or “affected”
Note that the DNA sequence may have characters before or after the “cag” repeats.
Source describing Huntingtons.
Reading
This week continues with the material in Chapter 6: Functions with Infinite domains, Automata, and Regular expressions that we started on last week.
Section 6.4.2 of the book includes a proof that regular expressions are equivalent in power to deterministic finite automata, but it is quite different from the proof we do in the videos. Ambitious students are encouraged to understand both proofs, and to consider the advantages and disadvantages of the approach taken in the book compared to our approach (which is the one done more traditionally in theory of computation courses, at least going back to Mike Sipser’s courses in the 1980s).
There are many books (and other videos) that do present material similar to what is in the videos this week, but we are not assigning any additional readings.
Videos
You can play all the videos using this playlist, but don’t forget to take breaks: Week 8 Playlist (Note that this content was week 7 in prior semesters).
These videos are edited from these cs3102 classes (we don’t generally recommend watching the unedited versions, but they are available if you want to):
-
(Spring 2020) Nondeterminism: Full Video, Slides
-
(Spring 2020) FSA Equivalent to Regular Expressions: Full Video, Slides
Nondeterministic Nathans (1:23)
Warm-Up: Complementing the XOR Language (5:54)
Equivalence of Regular Expressions and FSAs (6:12)
Proving FSAs are as Powerful as Regular Expressions (Part 1: Proof Strategy) (4:35)
Proving FSAs are as Powerful as Regular Expressions (Part 2: Bases, Union) (22:55)
Nondeterminism (11:14)
Nondeterministic Finite Automata (NFAs) (13:48)
Equivalence of NFAs and DFAs (10:01)
Proving FSAs are as Powerful as Regular Expressions (Part 3: Concatenation) (14:58)
Proving FSAs are as Powerful as Regular Expressions (Part 4: Kleene Star) (8:21)