Lecture 18: Nonregular Languages

Date

DFAs model computers with finite amount of memory. Thus:

Regular languages correspond to problems that can be solved with finite memory.

So, if you have a regular language, then it can be solved with some finite memory computer (i.e. the problem is solvable).

On the contrary, we have non-regular languages:

Nonregular languages, in a sense, correspond to problems that cannot be solved with finite memory.

(i.e. problems NOT solvable by a computer).

A side note: when, in the real, solving problems, think whether something can be modeled as a regular language which can help you determine if a problem is solvable (i.e. worth spending your time) or not.

Finding nonregular languages

For regular languages, it was theoretically easier to prove since it was existential: all we needed to do was find one NFA/DFA/regex.

However for NRL's, it is a universal statement. You might think that we'd need to prove that there exists no NFA or DFA or regex. But actually, we can leverage some intuition and realize that we only need to prove that there exist no DFA which subsequently covers all the other cases.

Keith defines three "traits" of DFAs which he lists on the board:
- Order
- Number
- Bound

He then walks through a few examples and shows that some languages simply don't meet all 3 and are thus irregular.

Drawing DFAs and proof routes

One intuition behind showing the universally-quantified statement that is "nonregular" is to attempt to draw a DFA, then somewhere in the state matching, you reach a contradiction.

In general, there are two proof routes:

Contradiction would be where an accept and reject string by the language end up in the same state.

Direct is a little more abstract: it means that, say you have some sequence which should read out the same but the subsequence starts at different places/ends at different places (just see lecture at like 25:00).
- In the context of the problem given, realize that once you reach $a^2$ and $a^4$ , you should be going down different sequences after (per the language constraints). Unfortunately, in that particular example, both $a^2$ and $a^4$ had the same ending sequence which is where the problem is since $a^2$ and $a^4$ are separate states. More definitively, we are saying we have the strings $a^2b^2$ and $a^4b^2$ which means our language is nonregular since they both have the same ending, but different beginnings. Intuition:
Now we will aim to generalize this intuition by formalizing it.

Distinguishability

Formalizing the above intuition essentially.

This setting guideline for what is accepted (exactly one) and not (if both are in the language, that would be impossible given our intuition).

Even more generalizable, we say that any two input strings that are distinguishable, must end up in different states (meaning one is rejected by the language and the other is accepted):

Proof by distinguishability (contradiction)

This is the "gold-standard" proof example for nonregular languages:

Another proof

Note on automaton construction: an underlying question you should keep in your mind (for all machine constructions for that matter) is "what do you need to remember" regarding the language.

Use the distinguishability property as a Lemma: