OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: DNA codes

Conference
OSTI ID:975129

We have begun to characterize a variety of codes, motivated by potential implementation as (quaternary) DNA n-sequences, with letters denoted A, C, G, and T. The first codes we studied are the most reminiscent of conventional group codes. For these codes, Hamming similarity was generalized so that the score for matched letters takes more than one value, depending upon which letters are matched [2]. These codes consist of n-sequences satisfying an upper bound on the similarities, summed over the letter positions, of distinct codewords. We chose similarity 2 for a matched A or T and 3 for a matched C or G, providing a rough approximation to double-strand bond energies in DNA.

An inherent novelty of DNA codes is 'reverse complementation'. It may be defined not only for alphabets of size four but, more generally, for any even-size alphabet. All that is required is a matching of the letters of the alphabet: a partition into pairs. The reverse complement of a codeword is then obtained by reversing the order of its letters and replacing each letter by its match. For DNA, the matching is AT/CG because these are the Watson-Crick bonding pairs. Reversal arises because two DNA sequences form a double strand with opposite relative orientations. Thus, as will be described in detail, because in vitro decoding involves the formation of double-stranded DNA from two codewords, it is reasonable to assume, for universal applicability, that the reverse complement of any codeword is also a codeword. In particular, self-reverse-complementary codewords, which would pair with copies of themselves, are expressly forbidden in reverse-complement codes. Consequently, an appropriate distance between every pair of codewords must, when large, effectively prevent the two codewords from binding to form a double strand: only reverse-complement pairs of codewords should be able to bind.
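The weighted similarity and the reverse-complement conditions described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `reverse_complement`, `similarity`, and `is_reverse_complement_code` and the bound parameter `max_similarity` are hypothetical, while the scores 2 (matched A or T) and 3 (matched C or G) follow the choice stated in the text.

```python
# Watson-Crick matching AT/CG, as stated in the text.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Score for a position where both words carry the same letter:
# 2 for A or T, 3 for C or G (the weights chosen in the text).
MATCH_SCORE = {"A": 2, "T": 2, "C": 3, "G": 3}


def reverse_complement(word: str) -> str:
    """Reverse the word and replace each letter by its Watson-Crick match."""
    return "".join(COMPLEMENT[letter] for letter in reversed(word))


def similarity(u: str, v: str) -> int:
    """Generalized Hamming similarity: sum of match scores over aligned positions."""
    if len(u) != len(v):
        raise ValueError("codewords must have equal length")
    return sum(MATCH_SCORE[a] for a, b in zip(u, v) if a == b)


def is_reverse_complement_code(code: set[str], max_similarity: int) -> bool:
    """Hypothetical checker: the code must be closed under reverse
    complementation, contain no self-reverse-complementary word, and keep
    the similarity of every pair of distinct codewords at or below the bound."""
    for word in code:
        rc = reverse_complement(word)
        if rc == word or rc not in code:
            return False
    words = sorted(code)
    for i, u in enumerate(words):
        for v in words[i + 1:]:
            if similarity(u, v) > max_similarity:
                return False
    return True
```

For example, `reverse_complement("ACGG")` gives `"CCGT"`, whereas `"ACGT"` is its own reverse complement and is therefore rejected, and `is_reverse_complement_code({"AACC", "GGTT"}, 0)` holds. Under this aligned model, a codeword u would bind another codeword w when u is similar to the reverse complement of w; because the code is closed under reverse complementation, bounding the similarity of all distinct pairs also bounds that quantity for every w other than u's own partner.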
For most applications, a DNA code is to be bipartitioned such that the reverse-complementary pairs are separated across the two blocks. For the foregoing reasons, these two blocks of codewords can serve as the hooks and loops of a digital Velcro. We began our investigations of such codes by constructing quaternary BCH reverse-complement codes, using cyclic codes and the conventional Hamming distance [4]. We also obtained upper and lower bounds on the rate of reverse-complement codes with a metric function based on the foregoing similarities [3]. For most applications involving DNA, however, the reverse-complementary analogue of codes based on the insertion-deletion distance is more advantageous. This distance equals the codeword length minus the length of the longest common (not necessarily contiguous) subsequence. (The 'aligned' codes described above may be used under special experimental conditions.) The advantage arises because, under the assumption that DNA is very flexible, the sharing of a sufficiently long subsequence between two codewords would be tantamount to the ability of the reverse complement of one of them to form a double strand with the other codeword. Thus far, using the random coding method, we have derived an asymptotic lower bound on the rate of reverse-complement insertion-deletion codes as a function of the insertion-deletion distance fraction and of the alphabet size [1]. For the quaternary DNA alphabet of primary importance, this lower bound yields an asymptotically positive rate if the insertion-deletion distance fraction does not exceed a threshold of approximately 0.19. Extensions of the Varshamov-Tenengolts construction of insertion-deletion codes [5] to reverse-complement insertion-deletion codes will be described. Experiments involving some of our DNA codes have been performed.
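The insertion-deletion distance defined above (codeword length minus the length of the longest common subsequence) can be computed with the standard longest-common-subsequence dynamic program. The Python sketch below is illustrative only: the function names, the `hooks`/`loops` labels, and the rule of placing the lexicographically smaller word of each reverse-complement pair in the first block are hypothetical choices, not taken from the source. It does not implement the random-coding bound or the Varshamov-Tenengolts extensions.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(word: str) -> str:
    """Reverse the word and replace each letter by its Watson-Crick match."""
    return "".join(COMPLEMENT[letter] for letter in reversed(word))


def lcs_length(u: str, v: str) -> int:
    """Length of the longest common (not necessarily contiguous) subsequence,
    via the standard O(len(u) * len(v)) dynamic program, keeping one row."""
    prev = [0] * (len(v) + 1)
    for a in u:
        curr = [0]
        for j, b in enumerate(v, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_distance(u: str, v: str) -> int:
    """Insertion-deletion distance as defined in the text: n minus the LCS length."""
    if len(u) != len(v):
        raise ValueError("codewords must have equal length")
    return len(u) - lcs_length(u, v)


def min_indel_distance(code: set[str]) -> int:
    """Minimum insertion-deletion distance over all pairs of distinct codewords."""
    words = sorted(code)
    if len(words) < 2:
        raise ValueError("need at least two codewords")
    return min(
        indel_distance(u, v)
        for i, u in enumerate(words)
        for v in words[i + 1:]
    )


def bipartition(code: set[str]) -> tuple[list[str], list[str]]:
    """Hypothetical split of a reverse-complement code into two blocks
    ('hooks' and 'loops') so that every reverse-complement pair is separated."""
    hooks, loops = [], []
    for word in sorted(code):
        rc = reverse_complement(word)
        if rc == word or rc not in code:
            raise ValueError(f"{word} has no distinct reverse complement in the code")
        # Assumed tie-breaking rule: the smaller word of each pair goes to hooks.
        (hooks if word < rc else loops).append(word)
    return hooks, loops
```

For instance, `indel_distance("ACGT", "AGCT")` is 1, since the two words share the subsequence `"ACT"` of length 3, and `bipartition({"AACC", "GGTT"})` returns `(["AACC"], ["GGTT"])`.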

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE
OSTI ID:
975129
Report Number(s):
LA-UR-01-0626; LA-UR-01-626; TRN: US201008%%91
Resource Relation:
Conference: "Submitted to: SETA 2001 Conference, Bergen, Norway, May 13-17, 2001"
Country of Publication:
United States
Language:
English

Similar Records

Deep functional analysis of synII, a 770-kilobase synthetic yeast chromosome
Journal Article · Fri Mar 10 00:00:00 EST 2017 · Science · OSTI ID:975129

Final Technical Report
Technical Report · Mon Jul 07 00:00:00 EDT 2008 · OSTI ID:975129

Mutations in the PCCA gene encoding the {alpha} subunit of propionyl-CoA carboxylase in patients with propionic acidemia
Journal Article · Thu Sep 01 00:00:00 EDT 1994 · American Journal of Human Genetics · OSTI ID:975129