Mark
Dow
Geek
art
Minimal
arrays
containing all sub-array combinations of symbols: DeBruijn sequences
and tori
For
n symbols in
m groups, what are the smallest
arrays that contain all combinations of adjacent elements?
This is a
symbolic
combinatorics question. It shares some characteristics with the
necklace
problem.
I know very little about this topic, and have not attempted to
formalize the problem in standard mathematical language. I don't even
understand the standard mathematical language used. The
observations and algorithms below were derived from tinkering and
intuition.
Since I wrote most of this, I found that these
problems have been worked on, and many of the simpler cases fully
characterized. The one-dimensional cases are called De Bruijn
sequences. and the two-dimensional cases De Bruijn tori:
This problem came up
when designing motifs for aperiodic tesselations,
for example
Thue-Morse
patterns. In a 1-D 2-symbol Thue-Morse sequence
all pairs of symbols occur (
aa,
ab,
ba,
bb
) but no triplets (of symbols or sets of sequential symbols). Extended
to 2-D and more symbols, aperiodic sequences can have sub-arrays of
symbols that occur infrequently. [To Do: 4-symbol cyclic system
example.] It would be nice to be able to draw all possible
neighborhoods (sets of adjacent motifs) with no repetitions.
It is convenient to consider the symbols as
sequential integers starting at
0
because the possible pairs can be
enumerated as all possible integers with a base of the number of
symbols.
[To Do: Draw corresponding directed graphs for all 1-D cases.]
1-D 2-symbol, every pair
There are four unique pairs of two symbols.
Using {0,1}
as symbols they correspond to the binary (base 2) numbers between 0 and
3:
{ 00, 01, 10, 11 }
A bit of trial and error yields a string with all adjacent pairs
and no duplications:
[ 0 0 1 1 0 ]
The mirror and complements of this string also are solutions:
[ 0 1 1 0 0 ]
[ 1 0 0 1 1 ]
[ 1 1 0 0 1 ]
Note that the first symbol is the same as the last, so they can wrap
around as a length four ring. The solutions
are all cyclic permutation of a single ring:
| [ ... 0 0 1 1 ... ] |
 |
The graphical representation (right) uses black and white to represent
0 and 1 respectively, and is read clock-wise from the top.
1-D 2-symbol, every triplet
There are eight unique triplets of two
symbols. Using {0,1}
as symbols they correspond to the binary (base 2) numbers between 0 and
7:
{ 000, 001, 010, 011, 100,
101, 110, 111 }
A solution can be found by starting with [ 0 0 0
], and iteratively add a symbol to the right with these conditions:
(a) Choose the symbol that will form a triple that is not yet in the
string
(b) If both symbols can be added to result in a new triplet, use the
one that will most closely balance the number of each symbol.
(c) If the number of symbols is
balanced, arbitrarily select the next symbol as 0.
Here's the sequence of adding to the string, with the appropriate
conditions:
[ 0 0 0
]
[ 0 0 0 1
] (a)
[ 0 0 0 1 1
] (b)
[ 0 0 0 1 1
1
] (b)
[ 0 0 0 1 1 1
0 ] (a), because 111 is
already in the string
[ 0 0 0 1 1 1 0
1 ] (b), because both
100 and
101
aren't
in the string, but there are more 0's than 1's
[ 0 0 0 1 1 1 0 1 0
] (a) Note that the last symbol is the
first symbol wrapped around.
[ 0 0 0 1 1 1 0 1 0
0 ] (a)
Note that
the last two symbols are the first symbols wrapped around.
Condition (c) is not used here, but will be for longer sub-string
solutions.
Does this algorithm result in a solution for all
2-symbol strings,
of any length? Is there a generalization for more than 2 symbol
strings? Does it matter what triplet is used as the initial state?
Similar to the "every
pair" solution, the first two
symbols are the
same as the last, so all triplets occur in the eight term
ring:
| [ ... 0 0 0 1 1 1 0
1 ... ] |
 |
and in any cyclic permutation. This and the mirror/complement are the
only solutions.
The mirror or
complement
(cyclic permutation of symbols) of this
ring is another unique solution -- the mirror solution is the same as
the
complement. Considered as a 3-D necklace (allowing a rotation about an
axis in the plane), there is only a single solution.
Ted demonstrated a nice method of finding solution
to some of these problems by
construction of a bilaterally symmetric directed graph. Solutions are
found by finding a
Hamiltonian cycle
on the graph. The general problem of finding Hamiltonian cycles on
graphs is well studied, with a complexity class
of NP-complete in FNP.
While this particular graph is planar, the
graphs for longer sub-strings or more symbols are not. But any
graph for similar problems has symmetries that might reduce the
problem to polynomial
time. There is a reflection symmetry about the
vertical centerline, both in the graph layout and in the sub-string
triplets, and there is an anti-symmetric reflection symmetry about the
horizontal centerline.
This directed graph can be mapped onto the corners
of a cube, and a solution will visits all
corners of the cube on a restricted set of directed edges:
Diagram of all
triplets as corner coordinates (3-tuples) of
a unit cube, and allowed neighbors as a graph on the cube (left). There
are
two
solutions, corresponding to two
Hamiltonian cycles
on this graph (center and right)..
Treated as 3-dimensional graphs, the two
solutions
are related by a
parity
transformation, equivalent to a mirroring and a rotation by pi
radians. Thus they are
achiral.
This is equivalent to the complement, or cyclic permutation of the
symbols 0 and 1.
There is no Eulerian directed
cycle on the full graph (left), because not all edges can be
traversed in a
cycle. In particular (0,1,1 to 1,1,0) and (1,0,0 to 0,0,1) can't
be an edge of a cycle (blue crosses in left figure). These correspond
to the length 4 substrings (0 1
1 0) and (1 0 0 1) that don't (can't) appear in the solutions.
This graph on a cube is related to Gray codes, in
that they also form a Hamiltonian cycle on the
corners of an n-cube (where each bit is seen as one dimension). But in
the case of Gray codes the graph is
undirected and the edges are all edges of the cube (no diagonals).
1-D 2-symbol, every
quadruplet
Using the algorithm described in "every
triplet", all sixteen possible quadruplets:
{ 0000, 0001, 0010, 0011,
0100, 0101, 0110, 0111, 1000,
1001, 1010, 1011, 1100, 1101, 1110, 1111 }
appear in this string:
[ 0 0 0 0 1 1 1 1 0
1 0 1 1 0 0 1 0 0 0 ]
The sixteen term ring is:
[ ... 0 0 0 0 1 1 1 1 0 1
0 1 1 0 0 1 ... ]
Is this is a unique minimal solution, up to cyclic
permutations and
mirroring? No! Ted Bell found another:
[ ... 0 0 0 0 1 0 0 1 1 1
1 0 1 0 1 1 ... ]
Ted found this as a Hamiltonian cycle
on a bilaterally symmetric directed graph. [To
Do: Draw directed graph.]
By brute force search, I found there are 16
solution, 4 unique solutions up to mirroring and cyclic permutation of
the two symbols. [To Do: List and post program.]
1-D 2-symbol, every
quintuplet
Using the same procedures as for the "every
triplet" string:
[ 0 0 0 0 0 1 1 1 1
1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 0 1 1 1 0 0 0 1 0 0 0 0 ]
The 32-term ring is:
[ ... 0 0 0 0 0 1 1 1 1 1
0 1 0 1 0 0 1 1 0 1 1 0 0 1 0 1 1 1 0 0 0 1 ... ]
Is
this is a unique minimal solution, up to cyclic permutations and
mirroring? This is even more difficult to prove, but still tractable by
brute force.
No, there are 2048 solutions, 512 solutions up to mirroring and cyclic
permutation of the two symbols.
1-D 4-symbol, every triplet
Ted Bell sent me this solution, which he found by visually searching
it's graph for a Hamiltonian cycle:
... 0001110100202221211220030333232233131133013203210310231201230213 ...
This is way too long for my brute force (with principled lowest
estimate) programming approach.
The mother of one of my programming kids
taught me an interesting
trick: once you've found one solution there are ways of finding more
solutions, and perhaps all solutions. The basic transform for this one
goes like:
- all pairs of symbols happen several times. For example 00
0001110100202221211220030333232233131133013203210310231201230213
- the sequences between these pairs (for example 11101 and 20222121122)
can be swapped giving
0002022212112200111010030333232233131133013203210310231201230213
I think this works with any 1-D solution, and I think you can count the
number of solutions this way. But I don't
know how.
2-D 2-symbol, every 2x2
There are sixteen possible 2x2 sub-arrays with
two symbols. Using {0, 1} as
symbols they correspond to the binary (base 2) numbers between 0 and 15:
{ [ 0 0 [
0 0 [
0 0 [
0 0 [
0 1 [
0 1 [
0 1 [
0 1
0 0 ],
0 1 ],
1 0 ],
1 1 ],
0 0 ],
0 1 ],
1 0 ],
1 1 ],
[ 1 0 [
1 0 [
1 0 [
1 0 [
1 1 [
1 1 [
1 1 [
1 1
0 0 ],
0 1 ],
1 0 ],
1 1 ],
0 0 ],
0 1 ],
1 0 ],
1 1 ], }
I gave up on using a tesselation on a torus (wrap-around in both
directions), because it appeared to be ugly, requiring many duplicates.
7/1/08 But Ted Bell didn't, and showed that there aren't duplicates:
Brigid's cross (octomino) is a torroidal solution. I was fussing with the big hierarchical set of solutions last night, I got too frustrated and decided to count the number of 2x2 squares in a torroidal matrix of size 4.
It's 16, so I thought I'd try to fit the 16 possible 2x2 2 symbol squares into it. Only a symmetrical solution would do and Brigid's cross is one of them, I think the others are all row and column swaps.
0 1 0 0 0 1 1 1 1 1 1 0 0 0 1 0

|
Put the right column and bottom row on top of the originals (to show all sub-arrays when tesselated on a torus):
0| 0 1 0 0 ___________ 0| 0 0 1 0 0| 1 1 1 0 1| 0 1 1 1 0| 0 1 0 0
|

This array's tiling is symmetric and periodic with respect the
symbols 0/1, or black/white.
|
The tesseract's
adjacency diagram (which faces are mutually
adjacent) can be mapped onto a 4x4 toroidal grid. So the tesseract
faces can be colored using this pattern such that
each of the 16 vertices (which are all at intersections of four faces)
has
a uniquely colored neighborhood (counting each of four orthogonal
orientations as different).
[To Do: A stereo projection of a tesseract (similar to this) with
labelled vertices and a corresponding 4x4 adjacency grid (see bottom of
this). Try
illustrating the coloring of faces.]
[To Do: Construct a single projected "corner" of a tesseract, composed
of four interpenetrating parallelpipeds. This corresponds to a cube's
single corner projected onto a plane, composed of three diamonds.]
To construct a "non-wrapping" array that contains
all sub-arrays, a 5x5
array seems natural because it has exactly 16 2x2 sub-arrays. It
has 25 elements so must include an unequal number of the two symbols --
it didn't seem possible since the set of 2x2's is equivalent with a
swap of symbols so there must be an equal number of symbols. But there
is another solution:
[ 0
0 0 0 1
0
0 1 0 1
1 1 1 1
0
1
1 0 1 0
0 0 0 0 1 ]
There are 14-0's and 11-1's, but the
central elements (with more 1's) appear
in more sub-arrays that the side and corner elements (with more 0's). Are
there other 5x5 solutions?
All columns, but no rows, are 1-D 2-symbol solutions. The top
and bottoms rows are the same, so the 5x4 array (without the top or
bottom row) tesselates a
cylinder. The left and right columns are complements, so it and its
complement forms an 8x4 tesselation on a torus that has a uniform
distribution of all 2x2 elements:
0
0 0 0 1 1 1 1
0
0 1 0 1 1 0 1
1
1 1 1 0 0 0 0
1
1 0 1 0 0 1 0

The
8x4 tile.
|

Tesselation of the tile. |
|
|
Any 5x5 subset of these tesselations will be a
minimal solution (contain all 2x2 sub-arrays), and any 4x8 subset of
the latter will tesselate in the same way.
The periodicity of the tesselations is curious. Both
the rows and the columns of the first are periodic on the string 0 1 1
1 or its complement 1 0 0 0 (and cyclic permutations). The columns are
all periodic on 0 0 1 1 and its complement, but the rows
are periodic on either 0 0 0 0 1 1 1 1 or 0 0 1 0 1 1 0 1 or their
complements (and cyclic permutations).
It turns out that there are offset tesselations of
these 4x8 arrays that also have an equal distribution of 2x2
sub-arrays. For example:
1
0 0 1
0
0 1 1
0
0 1 1
1
1 0 0
0 1 1 0
1
1 0 0
1
1 0 0
0
0 1 1

A subset of the tesselation (above) rotated by pi/2 radians, and
its pi rotation.
|

An offset tesselation of the two tiles, with horizontal rows of the
same tile. By matching the 2x2 sub-arrays that occur on horizontal
boundaries, the equal frequency of sub-arrays is preserved.
[To Do: Make a rectangular region that periodically tiles the plane.]
|

A tile constructed by symbol replacement with a double diagonal motif:

(Rotated extension not shown.)
|

A frieze that based on this tiling and symbol replacement, followed by
coloring contiguous regions. |

A fractal elaboration of the frieze. [Broken link: synthetic ripples -
ripple distortion description and code.] |
[To Do: L_system_tiling( 'test', 1, 2, 1, 0,'zig_zag_eel_motif_5.png',
'2x2_2s_a.png', 0, 1 ); ]
Of course it works with a swap of the symbols, or with
rotations/mirrors of any 5x5 subset of the tesselations. Are there
other solutions? There are only so many 5x5 arrays and I tried just
about
every one that contained [ 0 0; 0 0 ] and
[ 1 1; 1 1 ].
2-D 4-symbol, every 2x2
There are 256 (43) possible 2x2's, and a
minimal toroidal solution
would fit on a 16x16 grid. Ted and I couldn't do this one. Can a brute
force
search find a solution?
We can imagine a simple brute force search algorithm
that requires somewhat less than 231 checks to find (or not
find) a solution. This is too large to be computationally tractable?
How much less to find a solution? What is a good algorithm, and will it
find all solutions?
There are several good reasons for having a positive
solution. The most basic is to make figures like this Truchet-like
tiling that show all combinations of 2x2 neighbors with no repeats.
2-D 2-symbol, every 3x3
There are 512 (29) 2-symbol 3x3 arrays.
There is no minimal square solution that tesselates a torus (because
512 is not the square of an integer), but there may be 24 x 25 (16
x 32), 23 x 26 (8 x 64), or 22 x 27
(4 x 128) solutions.
I don't know any solutions for this, or
have any ideas for methods of constructing one -- except for brute
force search of all combinations. But the brute force search space is
far too large.
2-D 2-symbol, Thue-Morse 2x2s
Is there a toroidal array that contains all 2x2's
that can occur in a Thue-Morse
pattern, with no repeats? All L-arrangements don't occur, all
others do.
Corners tilings
[To Do: These [will be] examples of a combinatorial system
with a second neighborhood constraint. ]
The pattern is formed from these rotations and inversions of a single
"corner motif":
The left half of the pattern is this arrangement:
This pattern has a nice set of symmetries and
anti-symmetries [ To Do:
Describe in B/W matrix.]. But it doesn't include the fifth set [ To Do:
Show ].
Is there an square, or nearly square matrix of motifs that contains all
2x2 patterns in an arrangement where the white and black (a's and b's)
have the same relationship?
References
P. Ligeti and P. Sziklai
Department of Computer Science, Eotvos University, Budapest, Pazmany
P.s.1c, H1117 Budapest, Hungary
Abstract
In this paper the automorphism group of two posets, Dk,n
and Bm,n
is determined. Dk,n
is the poset of DNA strands of length at most n, built up with k
complement pairs of letters, and partially ordered by the subsequence
relation. Bm,n
is the set of all subsequences of the word um,n=a1…an
defined over the alphabet {0,1,…,(m-1)},
where
. The automorphism group of Bm,n
was known already (see G. Burosch, H.-D.O.F. Gronau, J.-M. Laborde, The
automorphism group of the subsequence poset Bm,n,
Order 16 (2) (1999) 179–194 (2000)), here a short proof is
presented as an illustration of the method used in the first part.
Keywords: Subword; Poset; Automorphism
A038219
A maximally unpredictable sequence.
The sequence starts 0,1,0 and
continues according to the following
rule: find the longest sequence at the end that has occurred at least
once previously. If there are more than one previous occurrences select
the last one. The next digit of the sequence is the opposite of the one
following the previous occurrence.
A079101
A repetition-resistant sequence.
a(n)
= 0 or 1, chosen so as to maximize the number of different
subsequences that are formed.
a(n+1)=1
if and only if (a(1),a(2),...,a(n),0), but not
(a(1),a(2),...,a(n),1), has greater length of longest repeated segment
than (a(1),a(2),...,a(n)) has.
In
Feb, 2003, Alejandro Dau solved Problem 3 on the Unsolved Problems
and Rewards website, thus establishing that every binary word occurs
infinitely many times in this sequence.
Klaus Sutmer remarks (Jun 26 2006) that this sequence is very similar
to the Ehrenfeucht-Mycielski sequence A007061. Both
sequences have every finite binary word as a factor; in fact,
essentially the same proof works for both sequences.