
Measure theory
Given a Markov transition matrix and an invariant distribution on the states, we can impose a probability measure on the subshift. For example, consider the Markov chain shown in the figure, on the states \(A,B_{1},B_{2}\), with invariant distribution \(\pi =(2/7,4/7,1/7)\). If we "forget" the distinction between \(B_{1}\) and \(B_{2}\), we project the subshift on \(A,B_{1},B_{2}\) onto a subshift on \(A,B\), and this projection also pushes the probability measure forward to a probability measure on the subshift on \(A,B\).
The hidden part of a hidden Markov model whose observable part is non-Markovian.
The curious thing is that the probability measure on the subshift on \(A,B\) is not generated by a Markov chain on \(A,B\), not even one of higher order. Intuitively, this is because as one observes a longer and longer run \(B^{n}\), the conditional probability \(\Pr(A|B^{n})\) keeps changing with \(n\), approaching \({\frac {2}{3}}\) only in the limit \(n\to \infty \), meaning that the observable part of the system can be affected by something arbitrarily far in the past.
Conversely, there exists a subshift on 6 symbols, projected to a subshift on 2 symbols, such that any Markov measure on the smaller subshift has a preimage measure that is not Markov of any order (Example 2.6).
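The dependence of \(\Pr(A|B^{n})\) on the run length can be checked numerically. The following is a minimal sketch in Python using exact rational arithmetic. The transition matrix `P`, its invariant distribution `PI`, and the helper names `is_invariant` and `prob_A_after_B_run` are all hypothetical, chosen for illustration only; this is not the chain from the figure, so the numerical values (including the limit) differ from those quoted above.

```python
from fractions import Fraction as F

# Hypothetical transition matrix on the hidden states (A, B1, B2).
# This is an illustrative assumption, NOT the chain shown in the figure.
P = [
    [F(1, 2), F(1, 2), F(0)],     # from A
    [F(1, 4), F(1, 2), F(1, 4)],  # from B1
    [F(1, 2), F(0),    F(1, 2)],  # from B2
]

# Invariant distribution of the hypothetical matrix above (checked below).
PI = [F(2, 5), F(2, 5), F(1, 5)]


def is_invariant(pi, P):
    """Return True if pi * P == pi, i.e. pi is invariant under P."""
    k = len(pi)
    return all(sum(pi[i] * P[i][j] for i in range(k)) == pi[j] for j in range(k))


def prob_A_after_B_run(n):
    """Pr(next symbol is A | last n observed symbols are all B), for n >= 1,
    under the stationary measure, after merging B1 and B2 into one symbol B."""
    # w[0], w[1]: probability that the n observed symbols are all B and the
    # hidden state at the end of the run is B1, respectively B2.
    w = [PI[1], PI[2]]
    for _ in range(n - 1):
        w = [
            w[0] * P[1][1] + w[1] * P[2][1],
            w[0] * P[1][2] + w[1] * P[2][2],
        ]
    p_run = w[0] + w[1]                            # Pr(B^n)
    p_run_then_A = w[0] * P[1][0] + w[1] * P[2][0]  # Pr(B^n A)
    return p_run_then_A / p_run


assert is_invariant(PI, P)
for n in range(1, 7):
    print(n, prob_A_after_B_run(n))
```

For this particular matrix the printed values are 1/3, 3/8, 2/5, 5/12, 3/7, 7/16, which follow the pattern \((n+1)/(2n+4)\): they approach \(1/2\) but never become constant. If the observed process were Markov of some order \(k\), the value would have to stop changing once \(n\geq k\), so for this hypothetical chain no finite order suffices.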