Mark Dow

Geek art

Minimal arrays containing all sub-array combinations of symbols: DeBruijn sequences and tori

    For n symbols in m groups, what are the smallest arrays that contain all combinations of adjacent elements?

    This is a symbolic combinatorics question. It shares some characteristics with the necklace problem. I know very little about this topic, and have not attempted to formalize the problem in standard mathematical language. I don't even understand the standard mathematical language used. The observations and algorithms below were derived from tinkering and intuition.

    Since I wrote most of this, I found that these problems have been worked on, and many of the simpler cases fully characterized. The one-dimensional cases are called De Bruijn sequences. and the two-dimensional cases De Bruijn tori:

De Bruijn sequence, Wikipedia
De Bruijn graph, Wikipedia
de Bruijn Sequence, MathWorld

On the de Bruijn Torus Problem, Glenn Hulbert, Garth Isaak

1-D 2-symbol, every triplet, link to 2-D 2-symbol, every 2x2, link to Corners tilings, link to 
1-D 2-symbol, every pair
1-D 2-symbol, every triplet
1-D 2-symbol, every quadruplet
1-D 2-symbol, every quintuplet
2-D 2-symbol, every 2x2
2-D 4-symbol, every 2x2
2-D 2-symbol, every 3x3
Corners tilings


    This problem came up when designing motifs for aperiodic tesselations, for example Thue-Morse patterns. In a 1-D 2-symbol Thue-Morse sequence all pairs of symbols occur ( aa, ab, ba, bb ) but no triplets (of symbols or sets of sequential symbols). Extended to 2-D and more symbols, aperiodic sequences can have sub-arrays of symbols that occur infrequently. [To Do: 4-symbol cyclic system example.]  It would be nice to be able to draw all possible neighborhoods (sets of adjacent motifs) with no repetitions.

    It is convenient to consider the symbols as sequential integers starting at 0 because the possible pairs can be enumerated as all possible integers with a base of the number of symbols.

[To Do: Draw corresponding directed graphs for all 1-D cases.]

1-D 2-symbol, every pair

    There are four unique pairs of two symbols. Using {0,1} as symbols they correspond to the binary (base 2) numbers between 0 and 3:

{ 00, 01, 10, 11 }          


 A bit of trial and error yields a string with all adjacent pairs and no duplications:
[ 0 0 1 1 0 ]

The mirror and complements of this string also are solutions:
[ 0 1 1 0 0 ]
[ 1 0 0 1 1 ]
[ 1 1 0 0 1 ]

Note that the first symbol is the same as the last, so they can wrap around as a length four ring. The solutions are all cyclic permutation of a single ring:
[ ... 0 0 1 1 ... ] 2-symbol doublet necklace
The graphical representation (right) uses black and white to represent 0 and 1 respectively, and is read clock-wise from the top.

1-D 2-symbol, every triplet

     There are eight unique triplets of two symbols. Using {0,1} as symbols they correspond to the binary (base 2) numbers between 0 and 7:
       
{ 000, 001, 010, 011, 100, 101, 110, 111 }

     A solution can be found by starting with [ 0 0 0 ], and iteratively add a symbol to the right with these conditions:

(a) Choose the symbol that will form a triple that is not yet in the string
(b) If both symbols can be added to result in a new triplet, use the one that will most closely balance the number of each symbol.
(c) If the number of symbols is balanced, arbitrarily select the next symbol as 0.

Here's the sequence of adding to the string, with the appropriate conditions:
[ 0 0 0                ]  
[ 0 0 0 1              ]  (a)
[ 0 0 0 1 1            ]  (b)
[ 0 0 0 1 1 1          ]  (b)
[ 0 0 0 1 1 1 0        ]    (a), because 111 is already in the string
[ 0 0 0 1 1 1 0 1      ]    (b), because both 100 and 101 aren't in the string, but there are more 0's than 1's
[ 0 0 0 1 1 1 0 1 0    ]  (a) Note that the last symbol is the first symbol wrapped around.
[ 0 0 0 1 1 1 0 1 0 0  ]  (a)Note that the last two symbols are the first symbols wrapped around.

Condition (c) is not used here, but will be for longer sub-string solutions.

    Does this algorithm result in a solution for all 2-symbol strings, of any length? Is there a generalization for more than 2 symbol strings? Does it matter what triplet is used as the initial state?

    Similar to the "every pair" solution, the first two symbols are the same as the last, so all triplets occur in the eight term ring:
[ ... 0 0 0 1 1 1 0 1 ...  ] 2-symbol triplet necklace

and in any cyclic permutation. This and the mirror/complement are the only solutions.

    The mirror or complement (cyclic permutation of symbols) of this ring is another unique solution -- the mirror solution is the same as the complement. Considered as a 3-D necklace (allowing a rotation about an axis in the plane), there is only a single solution.

    Ted demonstrated a nice method of finding solution to some of these problems by construction of a bilaterally symmetric directed graph. Solutions are found by finding a Hamiltonian cycle on the graph. The general problem of finding Hamiltonian cycles on graphs is well studied, with a complexity class of NP-complete in FNP. While this particular graph is planar, the graphs for longer sub-strings or more symbols are not. But any graph  for similar problems has symmetries that might reduce the problem to polynomial time. There is a reflection symmetry about the vertical centerline, both in the graph layout and in the sub-string triplets, and there is an anti-symmetric reflection symmetry about the horizontal centerline.

2-symbol triplet graph 2-symbol triplet Hamiltonian cycle 1 2-symbol triplet Hamiltonian cycle 2


    This directed graph can be mapped onto the corners of a cube, and a solution will visits all corners of the cube on a restricted set of directed edges:
directed graph on a cube, 2-symbol every triplet Hamiltonian cycle, 2-symbol every triplet Hamiltonian cycle, 2-symbol every triplet
    Diagram of all triplets as corner coordinates (3-tuples) of a unit cube, and allowed neighbors as a graph on the cube (left). There are two solutions, corresponding to two Hamiltonian cycles on this graph (center and right)..
    Treated as 3-dimensional graphs, the two solutions are related by a parity transformation, equivalent to a mirroring and a rotation by pi radians. Thus they are achiral. This is equivalent to the complement, or cyclic permutation of the symbols 0 and 1.

    There is no Eulerian directed cycle on the full graph (left), because not all edges can be traversed in a cycle. In particular (0,1,1 to  1,1,0) and (1,0,0 to 0,0,1) can't be an edge of a cycle (blue crosses in left figure). These correspond to the length 4 substrings (0 1 1 0) and (1 0 0 1) that don't (can't) appear in the solutions.

    This graph on a cube is related to Gray codes, in that they also form a Hamiltonian cycle on the corners of an n-cube (where each bit is seen as one dimension). But in the case of Gray codes the graph is undirected and the edges are all edges of the cube (no diagonals).

1-D 2-symbol, every quadruplet

     Using the algorithm described in "every triplet", all sixteen possible quadruplets:
       
{ 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111 }

appear in this string:

[ 0 0 0 0 1 1 1 1 0 1 0 1 1 0 0 1 0 0 0 ]

The sixteen term ring is:

[ ... 0 0 0 0 1 1 1 1 0 1 0 1 1 0 0 1 ...  ]

    Is this is a unique minimal solution, up to cyclic permutations and mirroring? No! Ted Bell found another:

[ ... 0 0 0 0 1 0 0 1 1 1 1 0 1 0 1 1 ...  ]

Ted found this as a Hamiltonian cycle on a bilaterally symmetric directed graph. [To Do: Draw directed graph.]

    By brute force search, I found there are 16 solution, 4 unique solutions up to mirroring and cyclic permutation of the two symbols. [To Do: List and post program.]

1-D 2-symbol, every quintuplet

     Using the same procedures as for the "every triplet" string:
       
[ 0 0 0 0 0 1 1 1 1 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 0 1 1 1 0 0 0 1 0 0 0 0 ]

The 32-term ring is:

[ ... 0 0 0 0 0 1 1 1 1 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 0 1 1 1 0 0 0 1 ...  ]

Is this is a unique minimal solution, up to cyclic permutations and mirroring? This is even more difficult to prove, but still tractable by brute force.
No, there are 2048 solutions, 512 solutions up to mirroring and cyclic permutation of the two symbols.


1-D 4-symbol, every triplet


Ted Bell sent me this solution, which he found by visually searching it's graph for a Hamiltonian cycle:
... 0001110100202221211220030333232233131133013203210310231201230213 ...

This is way too long for my brute force (with principled lowest estimate) programming approach.

    The mother of one of my programming kids taught me an interesting trick: once you've found one solution there are ways of finding more solutions, and perhaps all solutions. The basic transform for this one goes like:
- all pairs of symbols happen several times. For example 00
0001110100202221211220030333232233131133013203210310231201230213
- the sequences between these pairs (for example 11101 and 20222121122) can be swapped giving
0002022212112200111010030333232233131133013203210310231201230213
I think this works with any 1-D solution, and I think you can count the number of solutions this way. But I don't know how.

2-D 2-symbol, every 2x2

     There are sixteen possible 2x2 sub-arrays with two symbols. Using {0, 1} as symbols they correspond to the binary (base 2) numbers between 0 and 15:
{ [ 0 0    [ 0 0    [ 0 0    [ 0 0    [ 0 1    [ 0 1    [ 0 1    [ 0 1
    0 0 ]
,   0 1 ],   1 0 ],   1 1 ],   0 0 ],   0 1 ],   1 0 ],   1 1 ],

  [ 1 0    [ 1 0    [ 1 0    [ 1 0    [ 1 1    [ 1 1    [ 1 1    [ 1 1
    0 0 ],   0 1 ],   1 0 ],   1 1 ],   0 0 ],   0 1 ],   1 0 ],   1 1 ], }


I gave up on using a tesselation on a torus (wrap-around in both directions), because it appeared to be ugly, requiring many duplicates.
7/1/08 But Ted Bell didn't, and showed that there aren't duplicates:
Brigid's cross (octomino) is a torroidal solution.  I was fussing with the big hierarchical set of solutions last night, I got too frustrated and decided to count the number of 2x2 squares in a torroidal matrix of size 4.
It's 16, so I thought I'd try to fit the 16 possible 2x2 2 symbol squares into it. Only a symmetrical solution would do and Brigid's cross is one of them, I think the others are all row and column swaps.
0 1 0 0
0 1 1 1
1 1 1 0
0 0 1 0

Brigid's octomino array
Put the right column and bottom row on top of the originals (to show all sub-arrays when tesselated on a torus):

0| 0 1 0 0
___________
0| 0 0 1 0
0| 1 1 1 0
1| 0 1 1 1
0| 0 1 0 0
Brigids octomino tesselation

This array's tiling is symmetric and periodic with respect the symbols 0/1, or black/white.


      


    The tesseract's adjacency diagram (which faces are mutually adjacent) can be mapped onto a 4x4 toroidal grid. So the tesseract faces can be colored using this pattern such that each of the 16 vertices (which are all at intersections of four faces) has a uniquely colored neighborhood (counting each of four orthogonal orientations as different).
[To Do: A stereo projection of a tesseract (similar to this) with labelled vertices and a corresponding 4x4 adjacency grid (see bottom of this). Try illustrating the coloring of faces.]
[To Do: Construct a single projected "corner" of a tesseract, composed of four interpenetrating parallelpipeds. This corresponds to a cube's single corner projected onto a plane, composed of three diamonds.]

    To construct a "non-wrapping" array that contains all sub-arrays, a 5x5 array seems natural because it has exactly 16 2x2 sub-arrays. It has 25 elements so must include an unequal number of the two symbols -- it didn't seem possible since the set of 2x2's is equivalent with a swap of symbols so there must be an equal number of symbols. But there is another solution:

[ 0 0 0 0 1
  0 0 1 0 1
  1 1 1 1 0
  1 1 0 1 0
  0 0 0 0 1
]

There are 14-0's and 11-1's, but the central elements (with more 1's) appear in more sub-arrays that the side and corner elements (with more 0's). Are there other 5x5 solutions?
 
    All columns, but no rows, are 1-D 2-symbol solutions. The top and bottoms rows are the same, so the 5x4 array (without the top or bottom row) tesselates a cylinder. The left and right columns are complements, so it and its complement forms an 8x4 tesselation on a torus that has a uniform distribution of all 2x2 elements:
 0 0 0 0 1 1 1 1
 0 0 1 0 1 1 0 1
 1 1 1 1 0 0 0 0
 1 1 0 1 0 0 1 0

765 octomino tile
The 8x4 tile.
765 octomino tesselation
Tesselation of the tile.
 


    Any 5x5 subset of these tesselations will be a minimal solution (contain all 2x2 sub-arrays), and any 4x8 subset of the latter will tesselate in the same way.

    The periodicity of the tesselations is curious. Both the rows and the columns of the first are periodic on the string 0 1 1 1 or its complement 1 0 0 0 (and cyclic permutations). The columns are all periodic on 0 0 1 1 and its complement, but the rows are periodic on either 0 0 0 0 1 1 1 1 or 0 0 1 0 1 1 0 1 or their complements (and cyclic permutations).

    It turns out that there are offset tesselations of  these 4x8 arrays that also have an equal distribution of 2x2 sub-arrays. For example:

 1 0 0 1
 0 0 1 1
 0 0 1 1
 1 1 0 0
 0 1 1 0
 1 1 0 0
 1 1 0 0
 0 0 1 1

4x8 tile pair

A subset of the tesselation (above) rotated by pi/2 radians, and its pi rotation.
4x8 offset tesselation
An offset tesselation of the two tiles, with horizontal rows of the same tile. By matching the 2x2 sub-arrays that occur on horizontal boundaries, the equal frequency of sub-arrays is preserved.
[To Do: Make a rectangular region that periodically tiles the plane.]
 motif outline
A tile constructed by symbol replacement with a double diagonal motif:
symbol replacement with double-diagonal motif

(Rotated extension not shown.)
Kinky frieze, link to
A frieze that based on this tiling and symbol replacement, followed by coloring contiguous regions.
Pattern and process, link to
A fractal elaboration of the frieze. [Broken link: synthetic ripples - ripple distortion description and code.]


[To Do: L_system_tiling( 'test', 1, 2, 1, 0,'zig_zag_eel_motif_5.png', '2x2_2s_a.png', 0, 1 ); ]

Of course it works with a swap of the symbols, or with rotations/mirrors of any 5x5 subset of the tesselations. Are there other solutions? There are only so many 5x5 arrays and I tried just about every one that contained [ 0 0; 0 0 ] and [ 1 1; 1 1 ].

2-D 4-symbol, every 2x2

    There are 256 (43) possible 2x2's, and a minimal toroidal solution would fit on a 16x16 grid. Ted and I couldn't do this one. Can a brute force search find a solution?

    We can imagine a simple brute force search algorithm that requires somewhat less than 231 checks to find (or not find) a solution. This is too large to be computationally tractable? How much less to find a solution? What is a good algorithm, and will it find all solutions?

    There are several good reasons for having a positive solution. The most basic is to make figures like this Truchet-like tiling that show all combinations of 2x2 neighbors with no repeats.

2-D 2-symbol, every 3x3

    There are 512 (29) 2-symbol 3x3 arrays. There is no minimal square solution that tesselates a torus (because 512 is not the square of an integer), but there may be 24 x 25 (16 x 32), 23 x 26 (8 x 64), or 22 x 27 (4 x 128) solutions.

       I don't know any solutions for this, or have any ideas for methods of constructing one -- except for brute force search of all combinations. But the brute force search space is far too large.

2-D 2-symbol, Thue-Morse 2x2s

    Is there a toroidal array that contains all 2x2's that can occur in a Thue-Morse pattern, with no repeats? All L-arrangements don't occur, all others do.

Corners tilings

 [To Do:   These [will be] examples of a combinatorial system with a second neighborhood constraint. ]

Brick wall corners, link to

The pattern is formed from these rotations and inversions of a single "corner motif":
corner motifs, labelled

The left half of the pattern is this arrangement:

corner 5x5 array

    This pattern has a nice set of symmetries and anti-symmetries [ To Do: Describe in B/W matrix.]. But it doesn't include the fifth set [ To Do: Show ].
Is there an square, or nearly square matrix of motifs that contains all 2x2 patterns in an arrangement where the white and black (a's and b's) have the same relationship?


References

Automorphisms of subword-posets

P. Ligeti and P. Sziklai
Department of Computer Science, Eotvos University, Budapest, Pazmany P.s.1c, H1117 Budapest, Hungary  

Abstract

In this paper the automorphism group of two posets, Dk,n and Bm,n is determined. Dk,n is the poset of DNA strands of length at most n, built up with k complement pairs of letters, and partially ordered by the subsequence relation. Bm,n is the set of all subsequences of the word um,n=a1an defined over the alphabet {0,1,…,(m-1)}, where Click to view the MathML source. The automorphism group of Bm,n was known already (see G. Burosch, H.-D.O.F. Gronau, J.-M. Laborde, The automorphism group of the subsequence poset Bm,n, Order 16 (2) (1999) 179–194 (2000)), here a short proof is presented as an illustration of the method used in the first part.

Keywords: Subword; Poset; Automorphism


Discrete Mathematics
Volume 305, Issues 1-3, 6 December 2005, Pages 372-378




A038219     A maximally unpredictable sequence.
The sequence starts 0,1,0 and continues according to the following rule: find the longest sequence at the end that has occurred at least once previously. If there are more than one previous occurrences select the last one. The next digit of the sequence is the opposite of the one following the previous occurrence.

A079101       A repetition-resistant sequence.

a(n) = 0 or 1, chosen so as to maximize the number of different subsequences that are formed.

a(n+1)=1 if and only if (a(1),a(2),...,a(n),0), but not (a(1),a(2),...,a(n),1), has greater length of longest repeated segment than (a(1),a(2),...,a(n)) has.

In Feb, 2003, Alejandro Dau solved Problem 3 on the Unsolved Problems and Rewards website, thus establishing that every binary word occurs infinitely many times in this sequence.

Klaus Sutmer remarks (Jun 26 2006) that this sequence is very similar to the Ehrenfeucht-Mycielski sequence A007061. Both sequences have every finite binary word as a factor; in fact, essentially the same proof works for both sequences.