Data Scientist - Benjamin Tovar

If the typing monkeys have met Mr Markov: probabilities of spelling “omglolbbq” after the digital monkeys have read Dracula

21 Feb

Introduction

The infinite monkey theorem states that a monkey hitting keys at random on a typewriter keyboard for an infinite amount of time will almost surely type a given text, such as the complete works of William Shakespeare.

There is a straightforward proof of this theorem. As an introduction, recall that if two events are statistically independent, then the probability of both happening equals the product of the probabilities of each one happening independently. For example, if the chance of rain in Moscow on a particular day in the future is 0.4 and the chance of an earthquake in San Francisco on that same day is 0.00003, then the chance of both happening on that day is 0.4 * 0.00003 = 0.000012, assuming that they are indeed independent.

Suppose the typewriter has 50 keys, and the word to be typed is banana. If the keys are pressed randomly and independently, each key has an equal chance of being pressed. Then the chance that the first letter typed is ‘b’ is 1/50, the chance that the second letter typed is ‘a’ is also 1/50, and so on. Therefore, the chance of the first six letters spelling banana is

1/50 × 1/50 × 1/50 × 1/50 × 1/50 × 1/50 = (1/50)^6 = 1/15,625,000,000,

less than one in 15 billion, but not zero; hence a possible outcome.
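The arithmetic above is easy to check in a couple of lines (a minimal sketch; the 50-key typewriter is the theorem's setup, not part of my simulation):

```python
# Probability that 6 independent, uniformly random keystrokes
# on a 50-key typewriter spell out "banana".
p_banana = (1 / 50) ** 6

print(p_banana)      # 6.4e-11, i.e. 1 in 15,625,000,000
print(1 / p_banana)  # expected number of 6-keystroke attempts per success
```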

Methods

In my implementation, I only consider the 26 characters of the English alphabet (a to z, excluding whitespace). The real question I would like to ask is the following:

Given a target word, say banana, how many monkeys would be needed to have at least one success (a monkey typing the target) after each monkey has typed 6 characters?

To solve this, first calculate the probability of typing the word banana:

P(banana) = (1/26)^6 = 1/308,915,776

Now, just compute the number of monkeys that might be needed:

typing_monkeys_3.png
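The two steps above can be sketched in code. Since the original equation image is not reproduced here, the formula below is my reading of it: with p = (1/26)^6, the smallest number of monkeys n giving at least one success satisfies 1 − (1 − p)^n ≥ 0.5 (the 50% confidence threshold is my assumption):

```python
import math

# Probability that 6 uniform keystrokes over a 26-letter alphabet
# spell the target word "banana".
p = (1 / 26) ** 6  # 1 / 308,915,776

# Smallest n such that P(at least one monkey succeeds) = 1 - (1 - p)**n >= 0.5
n = math.ceil(math.log(0.5) / math.log(1 - p))
print(n)  # a bit over 200 million monkeys
```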

The model that assigns the same probability to each character is labelled the uniform model in my simulation.

My goal is to optimize n (minimize the number of monkeys needed, because I am on a tight budget), so I decided to use a Markov Chain model of order 1. If you are unfamiliar with Markov Chains, there is a very nice explanation of the models here.

The training set for the emission probability matrix consists of a parsed version of Dracula (chapters 1 to 3, no punctuation marks, lower-case characters only).

The emission probability matrix of the Markov Chain ensures that the transition from one character to another is constrained by the previous character, and this relation is weighted by the frequencies observed in the training text.

It is like having a keyboard with a light on each key: after a is pressed, the intensity of each light is proportional to how likely that character is to appear after an a. For example, b would glow brighter than a, because it is more common to find words containing a-b than a-a.
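A first-order transition matrix like the one described above can be estimated from bigram counts. Here is a minimal sketch over a toy training string (the actual training set is the parsed Dracula text, which is not included here; unseen transitions simply get probability zero in this simple version):

```python
from collections import Counter
from itertools import product

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def transition_matrix(text):
    """Estimate P(next_char | current_char) from bigram frequencies."""
    text = "".join(c for c in text.lower() if c in ALPHABET)
    bigrams = Counter(zip(text, text[1:]))
    totals = Counter(text[:-1])
    return {
        (a, b): bigrams[(a, b)] / totals[a] if totals[a] else 0.0
        for a, b in product(ALPHABET, repeat=2)
    }

# Toy corpus standing in for the parsed Dracula chapters
matrix = transition_matrix("banana bandana cabana")
print(matrix[("a", "n")])  # 'n' often follows 'a' in this toy text
print(matrix[("a", "a")])  # 'a' never follows 'a' here, so 0.0
```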

Results

1) Plot the distribution of characters in the uniform model

typing_monkeys_barplot_freq.png — Distribution of characters after 10,000 iterations using the uniform model.

2) Plot the emission matrices

typing_monkeys_em_m_network — A) As expected, the transition from one character to another is constrained by the previous character, and this relation is weighted by the frequencies obtained in the training text. B) In the uniform model, each character has the same probability of being typed, independent of the previous character.

3) Compare the performance of the two models

typing_monkeys_ratio_n_monkeys — In this plot I compare the number of monkeys (on a log10 scale) required to type the target words (shown in red) under the Markov Chain model and the uniform model. In general, the Markov Chain model requires fewer monkeys for words likely to appear in the training set, such as “by”, “the”, “what”, “where” and “Dracula”. For one-character words such as “a”, there is no prior information, so the two models perform equally. Another interesting example is “linux”, which is unlikely to appear in the training set, so the models perform about equally. The extreme case is “omglolbbq”: here the Markov Chain model performs worse than the uniform model, because the probability of this word under the trained chain is very low, so it is penalized and more monkeys are needed to reach the target.
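The comparison in the plot can be reproduced for any single word once both probabilities are in hand. A minimal sketch with toy numbers (the real emission matrix comes from the Dracula training text, the transition probabilities below are invented for illustration, and the 50% success threshold is my assumption):

```python
import math

def n_monkeys(p, confidence=0.5):
    """Smallest n with P(at least one success) = 1 - (1 - p)**n >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))

word = "the"
p_uniform = (1 / 26) ** len(word)

# Hypothetical Markov probability: P(first char) times trained transition
# probabilities P(h|t) and P(e|h) -- toy values, not estimated from Dracula.
p_markov = (1 / 26) * 0.30 * 0.45

ratio = n_monkeys(p_uniform) / n_monkeys(p_markov)
print(ratio)  # > 1: the Markov model needs fewer monkeys for "the"
```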

Code

You can download the code by clicking here.


