Data Dependencies

10 min readApr 28, 2020

Onion Peeling the very first Algorithm you have run in your head as a kid.

Which was the first algorithm you have ever run as a kid?…

If no cookie, cry? That has an if, and so depends on external input, so yes.

Something with a loop? While no cookie, keep crying? Ok, that counts.

Open box of toys, select toy, play and put back toy? That’s a multi-step algorithm, also toy-in-the-box dependent, be it without a condition or loop.

So now think of one with a loop, a condition, and multiple steps …

For me it was scoring my older sister’s guesses in Master Mind (more specifically the Dutch Clipper version “Kleur bekennen” below).

Kleur bekennen, where a secret combination chosen by the CodeMaker, out of 3 (wooden!) pegs, each having 1 out of 6 colors had to be guessed by the opponent, the CodeBreaker. The CodeMaker would have to score by putting a yellow person-like peg for each peg in the right place and a red person-like-peg for each peg with a right color on the wrong place. A kids life was tough in the 70s! ;^)

Was I 5 or 6? Hard to remember. What I do remember is that it is easy to count how many of my sister’s colored pegs where in the right place, but that counting the ones which had the right color but were in the wrong place was a bit harder, essentially because one had to exclude the ones counted as being in the right place already, to avoid double counting. So one had to keep some bookmarking in ones head.

Little did I realise at the time, but this was my very first acquaintance with algorithms and in this case, even an algorithm with internal data dependencies. There was a vague realisation that one could do things in the wrong order, and count pegs twice if not watching out.

In fact, there are indeed many possible ways to come to the correct answer and also many to come to the wrong answer if one breaks data dependencies.

Surprisingly, this basic algorithm, the very first you may have run in your head ever, serves as a very good example indeed of the basic concepts of optimising, code restructuring sequential and even parallellising compilers, when they are making code more compact via (affine) loop (index) transformations. I explain below.

1 Reintroducing Master Mind

We consider that the game of Master Mind consists of a CodeMaker player
and a CodeBreaker player.

The CodeMaker first comes up with a number ‘secret’ of N digits
(we restrict each digit to be between 1 and N).

Then the CodeBreaker comes up with guesses ‘guess’ for that number,
where the CodeMaker then, for each guess by the CodeBreaker,
judges how many digits are on the right place between ‘guess’ and ‘secret’.
From this, the CodeBreaker learns and improves his guess until the guess equals the secret.

2 Coding the CodeMaker Scoring Algorithm

We first wrote a Notebook with some tests, in a high level language (Python3), with the code to solve the CodeMaker’s part of Master Mind. We do this in two iterations, an obvious approach (where we need two sequential loops each containing guess-digit to secret-digit comparisons, to derived the two numbers), and then an optimised approach, leading to more compact code by combining the loops in one loop.

We start with some initialisations.

2.1 Initialisations

Running the above gives

N = 4
  >>> HP-15C displayI_: [4,         ] <<<
secret = 4323

Then comes the code for the CodeMaker checking functions.

2.2 Straightforward Approach with Two Sequential Loops

In the code below, the functions calculate_n_correctly_positioned_digits and
calculate_n_displaced_digits who are doing what they say
since they are aptly named. :)

Note that it is important to execute calculate_n_displaced_digits
first, before calculate_n_displaced_digits is executed.

The reason is that the correctly positioned pegs/digits need to be marked as already considered, (in the matched_secret, matched_guess arrays that track of that information) to avoid that digits (both in secret and in guess) which have been matched already will be compared and potentially counted again. Each digit can only be counted at most once (when a match is found).
It’s not just that there are two categories, correctly and wrongly positioned digits, but the correctly positioned ones really need to be treated first.

If in calculate_n_correctly_positioned_digits, a digit on position j in the secret
is the same as the digit in position k of the guess, neither digit is again used in comparisons in the second check function calculate_n_displaced_digits, but if they are unequal, both will still be used in the second function.

This is a data dependency that imposes hard constraints on the order between some of the comparisons to perform here. This dependency is reflected in the code by the first function potentially writing to matched_secret and matched_guess 1D arrays, which are being read by the second function.

One can argue that when the functions are swapped in execution order, that is also the case, which is true, but one would get a different, and not intended result, in a lot of cases. In other words such an order swapped algorithm would not implement the desired behaviour.

This gives

turn = 1 ------------------------------
  guess = 2443
  >>> HP-15C displayI_: [2443,      ] <<<
  >>> HP-15C displayO1: [2443,12    ] <<<
  >>> HP-15C displayO2: [1,         ] <<<

  turn = 2 ------------------------------
  guess = 3432
  >>> HP-15C displayI_: [3432,      ] <<<
  >>> HP-15C displayO1: [3432,04    ] <<<
  >>> HP-15C displayO2: [2,         ] <<<

  turn = 3 ------------------------------
  guess = 4313
  >>> HP-15C displayI_: [4313,      ] <<<
  >>> HP-15C displayO1: [4313,30    ] <<<
  >>> HP-15C displayO2: [3,         ] <<<

  turn = 4 ------------------------------
  guess = 4323
  >>> HP-15C displayI_: [4323,      ] <<<
  >>> HP-15C displayO1: [4323,40    ] <<<
  >>> HP-15C displayO2: [4,         ] <<<

This output is intended to be close to what an implementation on a HP-15C calculator with a 10 digit only display would be. The first guess would be 2443 (displayI_ for display input) , and the first answer would be 2443,12 (displayO1 for display output 1) where the ,12 indicates ,cd so the correct and displaced number of digits and the second answer would be 1, (displayO2 for display output 2) indicating ,t where t is the number of turns the CodeBReaker already used up.

The functions

calculate_n_correctly_positioned_digits

and

calculate_n_displaced_digits

are doing what they say since they are aptly named. :)

2.3 Less Straightforward Approach with One Sequential Loop

What still feels suboptimal to the program version above is that we have one loop to deduce the digits in the right place and one, after that, to deduce the digits on the wrong place.

2.3.1 Combining Loop Bodies

There is some commonality of body code in those loops. So a nicer program, reusing common code, could result from trying to merge both loops into one. In particular:

occurs in the second function, which one can consider to also add to the first function loop body, since they always evaluate to true, since ‘0’ is the initialisation value for both the matched_secret and matched_guess array.

However, merging into one loop will require to check that we maintain the data dependency discussed above to ensure algorithm correctness.

2.3.2 Analysing Data Dependencies.

Let’s analyse what are the data dependencies of the algorithm as a whole, so for both functions together. To make the comparisons of each secret digit with each guess digit a bit more visual, consider the two layouts below.

Data dependencies (left) and one of the many possible evaluation orders (right)

Both left and right, the vertical axis (of rows) represents the incremental values (0,1,2) of the index into the secret peg row / number, while the horizontal axis represent the incremental values (0,1,2) of the index into the guess peg row / number.

Essentially, the equal positioned pegs (or digits) need to be matched first. This is represented in the left half of the figure by the colored arcs, all meaning do not execute ‘this’ before ‘that’. On a single process machine, this means effectively the same as: execute ‘this’ before ‘that’. On a machine with parallel processing units, this means: execute ‘this’ before or at the same time as ‘that’, which is effectively that same as ‘this’ not after ‘that’. Hence the more complex wording, which is valid for both types of machines.

Given these order restrictions from the colored arrows on the left, there are many possible sequential implementations. One of them is given in the right. Following the a, c, d, e, … h to i, alphabetical order. This order does not go across any data dependency arc on the left side indeed.

How do we easily enforce this order in code? Via a transformation of the loop index-couples.

2.3.3 Ensuring Correct Order of Index-Couples

More specifically, the lines:

ensure this index-couple order.

As for loop transformations like this, I read about it in 1991 from a book of Utpal Banerjee [1],[2], I obtained from the IMEC library as a student. They are very useful for compilers, first in case you want to allow the compiler to restructure the code for efficiency in terms of reducing the number of lines. For this, dependency analysis in terms of data flow is important. But, also in the case of a parallellising compiler, targeting not one but multiple processing units, it can, when it understands all data dependencies, derive what operations can be executed in parallel (when two operations are not interdependent) and which ones cannot (when two operations have a data dependency and so should be executed sequentially). I remember having this epiphany while reading Utpal Banerjee’s book on this and especially liked the automatic procedure in finding these optimising transformations. Later, on my MSc in Computation at Oxford University in 1995, I took a course in Bulk Synchronous Parallellism (BSP), co-invented/discovered by Oxford’s Bill McColl in 1992 [3], where it was again one of the major techniques in obtaining efficient parallellisation. Essentially auto-discovering data-dependencies as well as an automatic index-reorganising ‘loop transformation’ lead to following the data flow with a ‘barrier of parallel processing units’.

In our case of Master Mind, say we consider the j (4 secret digits) as the vertical axis and the k (4 guess digits) as the horizontal one.

Then in step 1, the barrier is a line that starts on the diagonal (where in principle the first function could be executed in parallel on the 4 digits, so say with 4 processors at the same time).

In step 2, that line, with the same diagonal slope, is moved right one position and the second function could be executed in parallel again by 4 parallel processors simultaneously. Then in step 3, as well as 4, the line moves right again.

In fact, steps 2,3,4 over the 3 * 4 = 12 matrix elements could be executed entirely in parallel, since there are no data dependencies. That is not to say that the same index-pairs would be visited then (in some schemes more index pairs would be skipped than in others), but the outcome answer: the number of displaced digits (d) would be the same.

2.3.4 Optimised Implementation

The code below implements both the above requirements, merging into one loop and retaining data dependencies, together. Note that the code under the check_index_order branch is just for the purpose of demonstration of the correct order of execution and can be left out in an actual implementation. If left out, it surely result in fewer lines than the code of the straightforward approach under section 2.2.

We now run the second, optimised, approach and check that for just one guess comparison to the secret, all index couples are visited and in the right order and that the result is the same as for the above initial code.

which gives

calculate_all_digits
  (j,k) = (0,0)
    [j,k],[secret[j],guess[k]] = [0, 0], [4, 4]
      matched_secret = ['1', '0', '0', '0']
      matched_guess  = ['1', '0', '0', '0']
  (j,k) = (1,1)
    [j,k],[secret[j],guess[k]] = [1, 1], [3, 3]
      matched_secret = ['1', '1', '0', '0']
      matched_guess  = ['1', '1', '0', '0']
  (j,k) = (2,2)
  (j,k) = (3,3)
    [j,k],[secret[j],guess[k]] = [3, 3], [3, 3]
      matched_secret = ['1', '1', '0', '1']
      matched_guess  = ['1', '1', '0', '1']
  (j,k) = (0,1)
  (j,k) = (1,2)
  (j,k) = (2,3)
  (j,k) = (3,0)
  (j,k) = (0,2)
  (j,k) = (1,3)
  (j,k) = (2,0)
  (j,k) = (3,1)
  (j,k) = (0,3)
  (j,k) = (1,0)
  (j,k) = (2,1)
  (j,k) = (3,2)
--(c,d) = (3,0)-

So this is all correct indeed.

We now test the same function on the 4 guesses above, this time non verbose, as we know the index-couple traversal order is checked already and not changing here.

which gives

secret = 4323

  guess = 2443
--(c,d) = (1,2)-

  guess = 3432
--(c,d) = (0,4)-

  guess = 4313
--(c,d) = (3,0)-

  guess = 4323
--(c,d) = (4,0)-

So, those are the correct answers. :)

3 Conclusions

What have we learned?

We have taken the guess scoring algorithm of Master Mind, which may very well be the first algorithm any kid has run in their head ever.

To defend this first claim, say the algorithm needs to contain at least an if, a loop and an algorithm-internal data dependency (i.e. from some data produced in the algorithm itself to some data consumed in the algorithm later).

This little play-algorithm proved to be a very good candidate to illustrate data flow and data dependency theory and loop transformation theory, with which we could optimise it so that loop body code was reused. This resulted in fewer code lines in a machine implementation. Index-couples are compared in an order that respects the data dependencies inherent to the algorithm.

It’s altogether surprising that 6 year old kids can run a two step algorithm combining an O(n) loop and then an O(n²) 2 nested-loop phase containing data-dependencies from the first to the second in their heads, requiring and bookkeeping of index-pairs, even though the word algorithm means nothing to them. Then again, a machine executing the same also does not understand that concept either and does equally well.

4 References

[1] Utpal Banerjee. Dependence Analysis (Loop Transformation for Restructuring Compilers). Springer; 1 edition (October 31, 1996).
You can find an index of the book at https://link.springer.com/content/pdf/bfm%3A978-0-585-28122-3%2F1.pdf

[2] Utpal Banerjee. Loop parallelization. Kluwer Academic Publisher, 1994.

[3] Bulk-Synchronous Parallellism https://en.wikipedia.org/wiki/Bulk_synchronous_parallel

Footnotes

For an implementation of this exact algorithm on a very simple but beautiful machine, see my article on Medium, called Calculator Coding, where the machine is a the HP-15C calculator and the programming language is its machine language.

Peter Sels, April 28th 2020. Dedicated to Utpal.

The code appearing in this article is available as a runnable notebook on Github at https://github.com/PeterSels/OnComputation/tree/master/DataDependencies or as a gist already including all results at https://gist.github.com/PeterSels/85bab52dd1ecd77545af0b6b213a6915.