Thomas D. Schneider, Ph.D.

  • Center for Cancer Research
  • National Cancer Institute
Senior Research Assistant
RNA Biology Laboratory

RESEARCH SUMMARY

Dr. Schneider is interested in discovering and exploring the fundamental mathematics of biology: "Living things are too beautiful for there not to be a mathematics that describes them." He uses the mathematics of information theory, first developed by Claude Shannon in 1948.

Dr. Schneider first discovered that binding sites on nucleic acids usually contain just about the amount of information needed for molecules to find the sites in the genome. Information is measured in bits; one bit is the choice between two equally likely possibilities. The number of bits is the number of times one needs to divide the possibilities in half to reach a subset of objects. That is, the log base 2 of the number of possibilities is the number of bits. For example, ribosome binding sites in E. coli have about 10 bits of information per site on average. Finding the roughly 4000 gene starts in the 4 million base E. coli genome requires about log2(4,000,000/4,000) = 10 bits, close to the information measured in the ribosome binding sites.
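The arithmetic in this example can be checked with a short Python sketch. The function name `bits_to_locate` is a hypothetical label chosen here for illustration; the site count and genome size are the rounded figures quoted above.

```python
import math

def bits_to_locate(num_sites: int, genome_positions: int) -> float:
    """Bits needed to single out num_sites positions among genome_positions.

    Each bit halves the set of candidate positions, so the answer is
    log2 of the ratio of all positions to the positions that are sites.
    """
    return math.log2(genome_positions / num_sites)

# Rounded figures from the text: about 4000 gene starts in a
# genome of about 4 million bases.
bits_needed = bits_to_locate(4_000, 4_000_000)
print(f"{bits_needed:.2f} bits")  # log2(1000), just under 10 bits
```

The result is log2(1000), which the text rounds to 10 bits.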

Dr. Schneider and Mike Stephens, then a high-school student, invented sequence logos to understand the patterns at donor and acceptor human RNA splice junctions. Sequence logos are now widely used in molecular biology.

The relationship between information, measured in bits, and binding energy is a fundamental problem in biology. The Second Law of Thermodynamics gives the ideal relationship for converting the energy dissipated during molecular binding to bits. Using this conversion factor, Dr. Schneider discovered that binding sites are 70% efficient. It turns out that rhodopsin in the eye and muscle are also 70% efficient. Dr. Schneider has discovered the basic mathematics that gives this general result.
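A minimal sketch of this conversion follows. It assumes the Second Law bound takes the standard form of at least k_B T ln 2 joules dissipated per bit (R T ln 2 per mole), and that efficiency is the measured information divided by the maximum number of bits the dissipated energy could buy. The function names, the temperature, and the two input values are hypothetical placeholders for illustration, not measurements from Dr. Schneider's work.

```python
import math

GAS_CONSTANT = 8.314462618  # J / (mol K), exact SI value

def joules_per_bit_per_mole(temperature_kelvin: float) -> float:
    """Minimum energy dissipation per bit, per mole: R * T * ln(2)."""
    return GAS_CONSTANT * temperature_kelvin * math.log(2)

def efficiency(information_bits: float,
               energy_dissipated_j_per_mol: float,
               temperature_kelvin: float) -> float:
    """Information gained divided by the maximum bits the energy allows."""
    max_bits = energy_dissipated_j_per_mol / joules_per_bit_per_mole(temperature_kelvin)
    return information_bits / max_bits

# Placeholder inputs chosen only to exercise the functions (assumptions).
example_bits = 10.0          # information in a hypothetical binding site
example_energy = 25_000.0    # J/mol dissipated on binding, hypothetical
example_temperature = 298.15 # K, assumed room temperature

cost = joules_per_bit_per_mole(example_temperature)
eff = efficiency(example_bits, example_energy, example_temperature)
print(f"{cost:.0f} J/mol per bit, efficiency {eff:.2f}")
```

An efficiency of 1.0 would mean every joule dissipated was converted to information at the ideal rate; values below 1.0 mean more energy was dissipated than the bound requires.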

For more information, see https://alum.mit.edu/www/toms/

For current publications, see the Google Scholar list: https://scholar.google.com/citations?hl=en&user=1p-4Z14AAAAJ&sortby=pubdate

Areas of Expertise

Information Theory Applied To All Fields Of Biology And Bioinformatics

Publications

Selected Key Publications

Information content of binding sites on nucleotide sequences

T. D. Schneider, G. D. Stormo, L. Gold and A. Ehrenfeucht
J. Mol. Biol. 188: 415-431, 1986. [ Journal Article ]

Sequence Logos: A New Way to Display Consensus Sequences

T. D. Schneider and R. M. Stephens
Nucleic Acids Res. 18: 6097-6100, 1990. [ Journal Article ]

Sequence Walkers: a graphical method to display how binding proteins interact with DNA or RNA sequences

T. D. Schneider
Nucleic Acids Res. 25: 4408-4415, 1997. [ Journal Article ]

A brief review of molecular information theory

T. D. Schneider
Nano Commun Netw. 1: 173-180, 2010. [ Journal Article ]

Covers

cover of Nucleic Acids Research April 2006

Comparative analysis of tandem T7-like promoter containing regions in enterobacterial genomes reveals a novel group of genetic islands

Twelve prophage-like T7 islands have been discovered in pathogenic bacterial genomes. These islands contain two or three tandem T7-like promoters that should be activated when a bacterial cell is infected by bacteriophage T7 or a related phage. The illustration shows genetic maps for four of the islands, Ty2, BS512, E22 and ECA, which are found in the genomes of S. enterica Ty2, S. boydii BS512, E. coli E22 and E. carotovora SCRI1043 respectively. The T7-like promoters are represented by different colored bent arrows (red, T7; green, K1F; cyan, T3; magenta, unknown T7-like) and by corresponding sequence walkers. As in previously known mobile genetic elements, two of the islands, Ty2 and BS512, are adjacent to a tRNA-Gly gene (pink arrows) and have direct repeats of the 3' end of the tRNA gene (pink arrow tips). The other two islands, E22 and ECA, have different direct repeats on their ends (cyan chevron arrows). Each island encodes an integrase (blue arrows), several putative phage-related proteins (other arrows) and often several insertion sequence elements (white arrows).

Citation

Z. Chen and T. D. Schneider. Nucleic Acids Res. 34:1133-1147, 2006.

cover of Nucleic Acids Research December 2001

Strong minor groove base conservation in sequence logos implies DNA distortion or base flipping during replication and transcription initiation

Dubbed "Tom's T" by Dhruba Chattoraj, the unusually conserved thymine at position +7 in bacteriophage P1 plasmid RepA DNA binding sites rises above repressor and acceptor sequence logos. The T appears to represent base flipping prior to helix opening in this DNA replication initiation protein.

Citation

T. D. Schneider. Nucleic Acids Res. 29:4881-4891, 2001.

cover of Nucleic Acids Res. Nov 1997

Sequence walkers: a graphical method to display how binding proteins interact with DNA or RNA sequences

A graphical method is presented for displaying how binding proteins and other macromolecules interact with individual bases of nucleotide sequences. Characters representing the sequence are either oriented normally and placed above a line indicating favorable contact, or upside-down and placed below the line indicating unfavorable contact. The positive or negative height of each letter shows the contribution of that base to the average sequence conservation of the binding site, as represented by a sequence logo. These sequence 'walkers' can be stepped along raw sequence data to visually search for binding sites. Many walkers, for the same or different proteins, can be simultaneously placed next to a sequence to create a quantitative map of a complex genetic region. One can alter the sequence to quantitatively engineer binding sites. Database anomalies can be visualized by placing a walker at the recorded positions of a binding molecule and by comparing this to locations found by scanning the nearby sequences. The sequence can also be altered to predict whether a change is a polymorphism or a mutation for the recognizer being modeled.
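The letter heights described above can be sketched in a few lines of Python. This sketch assumes each height is 2 + log2(f) bits, where f is the frequency of that base at that position in a set of aligned binding sites. It adds a pseudocount to avoid taking the log of zero and omits any small-sample correction; both are simplifications made here, not details taken from the paper. The aligned sites are toy data, and all function names are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"

def weight_matrix(aligned_sites, pseudocount=0.5):
    """Per-position weights in bits: 2 + log2(frequency of each base)."""
    total = len(aligned_sites) + pseudocount * len(BASES)
    matrix = []
    for pos in range(len(aligned_sites[0])):
        counts = Counter(site[pos] for site in aligned_sites)
        matrix.append({
            base: 2.0 + math.log2((counts[base] + pseudocount) / total)
            for base in BASES
        })
    return matrix

def walker(matrix, sequence):
    """Letter heights for one sequence; a negative height is an
    unfavorable contact (letter drawn upside-down below the line)."""
    return [matrix[pos][base] for pos, base in enumerate(sequence)]

def scan(matrix, sequence):
    """Step the walker along a longer sequence; return (offset, total bits)."""
    width = len(matrix)
    return [
        (start, sum(walker(matrix, sequence[start:start + width])))
        for start in range(len(sequence) - width + 1)
    ]

# Toy aligned sites, invented for illustration only.
toy_sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT", "GATAAT", "TATACT"]
matrix = weight_matrix(toy_sites)

heights = walker(matrix, "TATAAT")        # resembles the toy sites
mismatch = walker(matrix, "CGCGCG")       # unlike the toy sites
best_offset, best_bits = max(scan(matrix, "GGTATAATGG"), key=lambda hit: hit[1])
print(best_offset, round(best_bits, 2))
```

Summing the heights gives a single score in bits for a candidate site, which is what lets the walker be stepped along raw sequence to search for sites or to compare an altered sequence against the original.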

Citation

T. D. Schneider. Nucleic Acids Res. 1997 Nov 1;25(21):4408-15.

cover of Journal of Molecular Biology, September 1993

Information analysis of sequences that bind the replication initiator RepA

The tall letters represent the highly conserved bases in DNA binding sites of several prokaryotic repressors and activators. Conservation is strongest where major grooves of the double helical DNA (represented by crests of a cosine wave) face the protein. This shows that conservation analysis alone can be used to predict the face of DNA that contacts the proteins.

Citation

P. P. Papp, D. K. Chattoraj, and T. D. Schneider. J. Mol. Biol. 233:219-230, 1993.