A (MOSTLY) NONTECHNICAL GLOSSARY
OF GENETICS, EPIGENETICS,
AND MOLECULAR BIOLOGY
With a Primary Focus on the Human Being


Prepared by Stephen L. Talbott


This is a preliminary and incomplete version of the glossary for a series of articles entitled "On Making the Genome Whole". Readers are invited to submit suggestions for improvement.

acetylation. Attachment of an acetyl group to a molecule, which is then said to be "acetylated". Certain enzymes (known as acetyltransferases) can do this. See also histone modifications. Deacetylation is the corresponding removal of an acetyl group, accomplished by histone deacetylases.

acetyl group. A small chemical group with the formula, -COCH3. See acetylation.

acetyltransferase. An enzyme that can attach an acetyl group to another molecule. In an epigenetic context, the term generally refers to the attachment of an acetyl group to a histone (by a histone acetyltransferase).

actin. A very common protein that forms filaments. It provides a kind of cellular "skeleton" in the cytoplasm of all cells, and also plays a major role in muscle contraction. It has been found to be present in the cell nucleus as well, and to be required for certain chromosomal movements.

activator. A protein transcription factor with a positive effect on gene expression. (Compare repressor.) An activator may work in conjunction with one or more co-activators. With or without the help of co-activators, the activator commonly plays a role in bringing RNA polymerase (the transcribing enzyme) to the gene promoter. The DNA sequence bound by the activator is called an enhancer, and it may lie in the promoter region immediately adjacent to the gene being activated, but it can also be far distant on the chromosome - or even on an altogether different chromosome. It must, however, be brought near to the promoter in order to recruit RNA polymerase. Note: the term "activate" can be used more generally to refer to any process that helps to bring a gene to expression.

adenine. See nucleotide base.

allele. Human genes occur in pairs, one on each chromosome of a chromosome pair. The paired genes are called "alleles" and can have a differing form and significance, as when (in the case of some flowers) the allele on one chromosome tends to produce, say, a red petal color and the other allele tends to produce a white petal color. (The actual color will depend on the relative effects of the two alleles, among other things.)

alternative splicing. Precursor mRNA can be spliced in different ways, so that a single gene may lead, via differently spliced RNAs, to different proteins. It is thought that a high percentage of human genes are subject to the alternative splicing of their RNA transcripts, and the transcripts of some genes can be spliced in hundreds of different ways. See also RNA splicing and trans-splicing.

amino acid. Amino acids are, among other things, constituent elements of protein. There are twenty different kinds of amino acids in human protein, and any number of amino acid molecules -- up to many thousands -- are arranged in sequence to form the main body of a protein.

antigen. A substance that stimulates the generation of antibodies as part of an immune response.

assortative mating. Nonrandom choice of mates in sexual reproduction. In particular, an organism might tend to favor a mate either like itself, or different from itself. In the former case, variation within the larger population will be increased, whereas in the latter case it will be decreased.

ATP. Adenosine triphosphate, a molecule playing a central role in the storage and transfer of energy within the cell. It is used, for example, by ATP-dependent chromatin remodeling complexes, which apply energy derived from ATP to the restructuring of nucleosomes.

basal transcription factor. See general transcription factor.

base pair. Two bonded nucleotide bases joined to opposite strands of the DNA double helix. These paired bases form the "rungs" on the spiraling DNA "ladder", and the bases in each pair are nearly always complementary to each other. See also nucleotide base.

base pair complementarity. The four DNA nucleotide bases are cytosine (indicated by the letter "C"), guanine (G), adenine (A), and thymine (T). It happens that these four bases normally form base pairs in only two ways: cytosine paired with guanine and adenine paired with thymine (C-G and A-T). The members of each pair are said to be "complementary" to each other. This means that a complete, double-stranded DNA molecule can always be formed from a single strand, because added nucleotide bases will pair up with the bases of the single strand in the "correct" way. Actually this is a great simplification, since there are other constituents of DNA besides the nucleotide bases. But because the nucleotide bases are thought of as containing the essential genetic code, one can picture the complementarity of bases as a means of preserving the fidelity of the code. RNA, while normally single-stranded, also preserves this code: in the formation of an RNA molecule from the template of a DNA strand, the nucleotide bases of the forming RNA are added sequentially in the same complementary fashion as when a DNA strand is replicated, except that the base uracil in RNA takes the place of thymine in DNA.
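
For readers who find a few lines of code helpful, the pairing rules just described can be sketched in Python. This is only an illustration under simplifying assumptions: the function names `complement_strand` and `transcribe` are invented for the example, strands are treated as plain strings of letters, and the directionality of real strands is ignored.

```python
# Pairing rules from the entry above: C-G and A-T in DNA.
DNA_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}
# In RNA, uracil (U) takes the place of thymine (T).
RNA_PAIRS = {"A": "U", "T": "A", "C": "G", "G": "C"}


def complement_strand(template: str) -> str:
    """Return the DNA bases that pair, position by position, with the template."""
    return "".join(DNA_PAIRS[base] for base in template)


def transcribe(template: str) -> str:
    """Return the RNA bases that pair, position by position, with a DNA template."""
    return "".join(RNA_PAIRS[base] for base in template)


template = "ATGCCGTA"  # an arbitrary made-up sequence
print(complement_strand(template))  # TACGGCAT
print(transcribe(template))         # UACGGCAU
```

Applying `complement_strand` twice returns the original letters, which is one way of picturing how a single strand suffices to rebuild the double-stranded molecule.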

bind. To attach chemically; form a chemical bond with. See also binding site.

binding site. Typically refers to the particular sequence of nucleotide bases on a DNA or RNA molecule that a protein or RNA molecule can "target" and bind to -- that is, can attach to. The affinity of a protein for such a binding site is given by the folded shape, distribution of electrical charges, and perhaps other characteristics of the protein molecule. The binding affinity of an RNA molecule for another RNA or DNA is a matter of sequence base pair complementarity.

bit. In information theory, a single bit represents the amount of information we gain when we learn the actual state of a device that can be in either of two equally likely states. For example, when we flip a coin and discover whether the result is "heads" or "tails", we have gained one bit of information. Putting it differently: our uncertainty about two equally possible outcomes is resolved. Since (simplifying a little) DNA can have any one of four possible nucleotide bases at a given position, we gain two bits of information by learning the actual base (or "letter", as many put it) at that position. It is also often said that the letter contains two bits of information. In any case, "two bits" is only roughly correct because, in the case of nucleotide bases, the choices are not equally likely to any high precision. Worse, the "same" letter is often not really the same, as when it can be either methylated or non-methylated.
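
The arithmetic behind "one bit" and "two bits" can be checked with a short Python sketch. It uses the standard Shannon formula for average information; the function name `information_bits` and the skewed base frequencies are assumptions made up for the illustration, not measured values.

```python
import math


def information_bits(probabilities):
    """Shannon entropy in bits: the average information gained on learning the outcome."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


coin = [0.5, 0.5]                    # two equally likely states
equal_bases = [0.25] * 4             # four equally likely bases
skewed_bases = [0.3, 0.3, 0.2, 0.2]  # made-up frequencies, for illustration only

print(information_bits(coin))          # 1.0
print(information_bits(equal_bases))   # 2.0
print(information_bits(skewed_bases))  # a little less than 2
```

The last line shows why "two bits" is only roughly correct: as soon as the four choices are not equally likely, the average information per letter drops below two bits.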

branching. The separation of the two strands of DNA. It occurs, for example, during the processes of transcription and replication, but can also occur locally as a result of various chromosome dynamics.

cell nucleus. A membrane-bound organelle in the cells of all higher organisms. It contains the cell's DNA.

chaperone. A molecule that assists in the folding or assembly of other molecules or complexes without itself becoming a part of the end product.

character. In the context of evolution, a "character" refers to a more or less discrete trait of an organism at any level of observation, from the molecular to the visible. Not only physical processes and aspects of form, but also behavioral traits qualify as characters.

chimeric gene. A gene formed from the fusion of two or more genes. Sometimes these are called "fusion genes", with the term "chimeric gene" referring to the joining of parts of different genes.

chromatin. The complex of DNA, proteins, and RNA that constitutes chromosomes. The histones that form nucleosome "spools" are the most abundant proteins in chromatin, but many other proteins -- transcription factors, activators, repressors, chromatin remodeling complexes, and other sorts of architectural proteins -- play a role. Many of these proteins transiently associate with, and dissociate from, chromatin, which is highly dynamic in form and structure.

chromatin remodeling. The architectural re-structuring of chromatin. This re-structuring can take a number of forms: compaction or opening-up of the chromatin fiber; sliding nucleosomes along the DNA; making histone modifications; contributing to the assembly or disassembly of nucleosomes; and loosening or tightening the binding of DNA to nucleosome spools. All of these changes play a substantial role in gene regulation.

chromatin remodeling complex. Various classes of protein that modify and restructure chromatin. For examples of their action, see under chromatin remodeling.

chromosomal crossover. A process during meiosis by which the two chromosomes of a chromosome pair exchange parts of themselves.

chromosome. A long, continuous length of DNA "packaged" by means of histones and other proteins and containing many chemical sequences known as "genes". Humans have 46 chromosomes, which come in 23 pairs - one member of each pair being inherited maternally and one paternally. The "same" genes generally occur in both members of a pair; any two such corresponding genes are known as alleles of a single gene.

chromosome territory. A particular region of the cell nucleus characteristically occupied by a chromosome in a given tissue type at a given stage of development and under a particular set of conditions. The spatial organization of chromosome territories within a nucleus has a bearing on gene regulation.

circadian. Of or relating to an approximate 24-hour rhythm.

co-activator. A protein or protein complex that, like an activator, encourages expression of a gene. But whereas the activator, like all transcription factors, recognizes and binds to a specific DNA sequence, a co-activator is not sequence-specific. Rather than binding directly to DNA, it binds to the activator. It may thereby aid, for example, in recruiting RNA polymerase to a gene promoter. See also co-repressor.

codon. "Words" of the genetic code consisting of three successive nucleotide bases, or "letters". Each codon of a protein-coding gene is supposed to correspond to one amino acid in the protein coded for by the gene. See also synonymous codon.

co-factor. A general term referring either to a co-activator or co-repressor.

complementarity. See base pair complementarity.

convergent evolution. The independent acquisition of the same or a similar trait by two separate, and perhaps very distantly related, evolutionary lineages. A common example of such a trait is the "camera-like" eye, which was acquired independently in cephalopods (octopus, squid) and in mammals.

co-repressor. A protein or protein complex that, like a repressor, discourages expression of a gene. But whereas the repressor, like all transcription factors, recognizes and binds to a specific DNA sequence, a co-repressor is not sequence-specific. Rather than binding directly to DNA, it binds to (and is often said to be recruited by) the repressor. It may thereby aid, for example, in blocking access of RNA polymerase to a gene promoter. See also co-activator.

cytoplasm. All the contents of a cell outside the nucleus.

cytosine. See nucleotide base.

demethylation. Removal of the marks of DNA methylation.

development. In an epigenetic context, "development" refers most narrowly to the process by which originally undifferentiated cells (for example, stem cells) progressively become specialized or differentiated, or else produce more specialized offspring through cell division. In a broader sense, the term refers to all the processes of growth and maturation in an organism.

differentiation. The movement from less specialized cellular forms to more specialized ones. We can also speak of "organ differentiation", referring to the way that organs, with their specialized cell types, develop from an earlier organism lacking those specializations. These developments typically occur without any changes in the genome - that is, in the genetic sequence of the cells' DNA. Understanding of differentiation therefore requires a reckoning with epigenetic processes.

digital. A sequence such as that of a DNA or RNA molecule is said to be digital if it consists of discrete, repeatable elements that can be discriminated from each other unambiguously. Due to this digital nature, the information in every stretch of the genome is supposed to be specifiable with perfect, yes-or-no definiteness.

diploid. Possessing two sets of chromosomes -- that is, possessing a pair of each type of chromosome, with one member of the pair inherited maternally and one inherited paternally. In mammals, all cells except the gametes are normally diploid. Compare haploid.

DNA. Deoxyribonucleic acid, a molecule that figures centrally in inheritance. Constituting part of the material of chromosomes, it is commonly double-stranded in the form of a double helix. Connecting the two strands are base pairs consisting of nucleotide bases. Here you will find a conventional animated stick figure of DNA; it schematically represents a few isolated features abstracted from whatever reality the actual material chromosome presents in the cell.

DNA breathing (1). The rhythmic unwrapping and rewrapping of DNA from nucleosomal spools. This takes place at the entry and exit sites -- that is, where the DNA meets or leaves the spool. The breathing takes place rapidly, on the order of milliseconds. Not to be confused with DNA breathing (2).

DNA breathing (2). The dynamic opening and closing of "bubbles" between the two strands of the DNA double helix. That is, for a certain length the two strands of the double helix become disconnected, and then later they reconnect. This is thought to be important for, among other things, the initiation of transcription, because RNA polymerase can only begin transcribing once the double helix has begun to be "unzipped".

DNA methylation. The attachment of a methyl chemical group to particular nucleotide bases (usually cytosine) of the DNA molecule. Methylation is recognized by various regulatory factors and therefore plays a major role in gene regulation. In general, DNA methylation tends to have a repressive effect on gene expression, but this generality is qualified by many subtleties.

DNA replication. The process by which both strands of a double-stranded DNA molecule serve as templates for strand reproduction. The result is two double-stranded DNA molecules, each containing one strand from the original molecule and one newly synthesized strand.

double helix. The form taken by DNA (and also by double-stranded RNA). Speaking very generally, it's the form you get when you take two cords and twist them together, so that each one spirals around the other.

double-stranded RNA (dsRNA). RNA that, like normal DNA, has two complementary strands joined by nucleotide base pairs. dsRNA can be brought into cells by viruses, and it can also be produced natively. This can happen, for example, when a length of RNA happens to contain two adjacent, complementary, and probably rather short sequences of nucleotide bases. That is, when the RNA folds sharply (into a hairpin shape) at the point between the two sequences, it brings a series of complementary bases together, allowing them to form base pairs that hold the two strands together.

downstream. See upstream/downstream.

enhancer. A DNA sequence that transcription factors known as activators can recognize and bind to and thereby aid in recruiting RNA polymerase to a gene's promoter to increase transcription.

epigenetic inheritance. Depending on context this can refer either to inheritance between generations of an organism, or between cell generations within an organism. In both cases the reference is to inherited traits that are mediated, not by the DNA sequence, but by epigenetic processes or conditions. Thus, something in the parents' activity or environment may lead to epigenetic changes in their cells -- and particularly in their germ cells -- that are passed on to their offspring, producing traits in the offspring that cannot be accounted for by the parents' DNA or any mutation of it.

epigenetics. Literally, that which is "added to" genetics. The term is most commonly taken to refer to heritable changes in gene expression that do not result from changes in actual gene sequences. ("Heritable" here can refer not only to inheritance between parents and offspring, but also between parent and daughter cells.) The changes result from the way the larger cellular context interacts with the genes. Nearly all the transformations involved in cellular differentiation fall under the heading of epigenetics.

epigenome. All the structures and processes of the cell bearing on gene expression. The term gains its main force from the (largely false) analogy with "genome". The latter was classically (and now rather disreputably) thought of as the sum total of the genetic code -- a digitally precise "database" containing all the "information" needed to fashion a human being. At least the genome does contain a more or less exact DNA sequence reliably passed (with various recombinations) from one generation to the next. The epigenome, by contrast, manifests nothing like the same sort of fixity.

euchromatin. Chromatin in its less condensed, more open and accessible, and (often) more actively transcribed state, typically richer in genes. Compare heterochromatin.

exon. A segment of the DNA sequence of a gene; more specifically: a segment whose corresponding segment in the gene's RNA transcript is retained until translation rather than deleted as part of the RNA splicing process. Or, in the case of noncoding DNA and its RNA transcripts: exons are those segments retained in the final functional form of the RNA. "Exon" can refer either to the DNA sequence or the corresponding transcribed RNA sequence. Compare intron.

expression. The production of RNA using a DNA sequence as a template. The DNA sequence is then said to have been "expressed" or "transcribed". The DNA sequence may represent a protein-coding gene (see also gene expression), or else it may be noncoding, in which case the expressed RNA is not translated into protein, but may have any of countless regulatory functions within the cell.

gamete. A haploid reproductive cell -- an egg cell in the female or sperm in the male.

gene. Sorry, but you won't pin me down on this one. "A gene is anything a competent biologist has chosen to call a gene" (philosopher of science Philip Kitcher, 1992). "Our knowledge of the structure and function of the genetic material has outgrown the terminology traditionally used to describe it. It is arguable that the old term gene, essential at an earlier stage of the analysis, is no longer useful, except as a handy and versatile expression, the meaning of which is determined by the context" (geneticist Peter Portin, 1993). For a brief overview of the history of the concept of the gene, see this article by biologist Craig Holdrege.

gene activation. Generally, this can refer to any process leading to increased expression of a gene. More specifically, it refers to the role of an activator in increasing expression.

gene expression. Most commonly this term is applied to protein-coding genes, where it refers to the production of messenger RNA (mRNA) using a DNA gene sequence as a template. The gene is then said to have been "expressed" or "transcribed". The mRNA will (after various sorts of processing, such as splicing) be translated into a protein. Expression also has a more general meaning.

general transcription factor. A protein that is like a transcription factor but without being specific to particular genes; rather it enables the actual process of transcription as such, regardless of the (protein-coding) genes being transcribed. Many of these factors are part of the pre-initiation complex. However, more recent research is showing that the word "general" is a misnomer; these factors can be more or less specific, playing different roles with different genes, or different classes of genes. General transcription factors are also known as "basal transcription factors".

gene regulation. The management (by the cell as a whole) of gene expression. This involves gene activation, gene silencing, the timing and extent of gene transcription, the "editing" of the resultant transcripts, the regulation of translation, and so on - everything that affects what the cell ultimately makes of the gene. Particular regions of DNA that participate in regulation - for example, by being targets for transcription factors - are known as "regulatory regions" or "regulatory sites". Gene regulation is sometimes more narrowly referred to as "transcription regulation".

gene repression. Generally, this can refer to the reduction of expression of a gene, from whatever cause. More specifically, it refers to the role of a repressor in blocking gene expression.

gene silencing. Blocking of the processes that lead from a gene to its possible protein end products. This blocking can occur at many different points. Most generally: it can involve the prevention of gene transcription ("transcriptional silencing"), or the modification or destruction of the gene's mRNA transcript so as to prevent translation ("post-transcriptional silencing"). More particularly, gene silencing can refer to the action of a DNA sequence known as a silencer.

genetic code. This term has many meanings both legitimate and illegitimate. Most basically, it refers to the sequence of nucleotide bases, or "letters", in DNA, and to the way that successive groups of three such bases in a protein-coding gene can (with various complications) correspond to the successive amino acids making up a protein. The gene is then said to "code for" that protein. Each three-letter group of a coding sequence is a codon.
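
The three-letters-per-amino-acid correspondence can be pictured with a minimal Python sketch. The table below is a small excerpt of the standard genetic code (the full table has 64 codons); the function name `translate` and the example sequence are invented for the illustration, and all the "various complications" mentioned above are ignored.

```python
# A small excerpt of the standard genetic code, written in DNA letters.
# Only the codons used in the example below are included.
CODON_TABLE = {
    "ATG": "Met",   # methionine; also the usual start codon
    "TTT": "Phe",   # phenylalanine
    "GGC": "Gly",   # glycine
    "TGG": "Trp",   # tryptophan
    "TAA": "STOP",  # one of the three stop codons
}


def translate(coding_sequence: str) -> list[str]:
    """Read successive three-letter codons and look up the amino acid for each."""
    amino_acids = []
    for i in range(0, len(coding_sequence) - 2, 3):
        residue = CODON_TABLE[coding_sequence[i:i + 3]]
        if residue == "STOP":
            break
        amino_acids.append(residue)
    return amino_acids


print(translate("ATGTTTGGCTGGTAA"))  # ['Met', 'Phe', 'Gly', 'Trp']
```

A lookup table of this kind captures only the most basic meaning of "genetic code"; it says nothing about when, where, or whether a given sequence is actually expressed.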

genetic drift. "Random" genetic change that becomes established in a population despite having no particular adaptive value.

genetic recombination. The exchange of material between chromosomes, of which chromosomal crossover is one example.

genome. All the DNA in an organism or cell, especially with reference to the total sequence of nucleotide bases, or "letters" of the genetic code.

genotype. Refers to the genetic constitution of a cell or organism, considered as being of a certain type or as corresponding to a certain character. Either the genome as a whole can be in view, or a particular gene (allele) or combinations of genes. Often contrasted with phenotype.

germ cells. The cells, belonging to the germline, that give rise (through mitosis and meiosis) to the gametes that combine in sexual reproduction. Often contrasted with somatic cells.

germline. The "line" or succession of cells that leads from one generation to the next through the germ cells.

guanine. See nucleotide base.

haploid. Possessing only a single set of unpaired chromosomes. Gametes are normally haploid. Compare diploid.

helical axis. If you imagine the two strands of a double helix wrapped around a wire core, this wire would represent the helical axis.

heterochromatin. Chromatin in its more tightly packed, less accessible, and less actively transcribed state, often containing fewer genes. Nucleosomes and various chromatin-associated proteins play a major role in the compact structuring of heterochromatin. Compare euchromatin.

heterozygous. An organism is said to be heterozygous with respect to a particular gene if the two alleles of the gene are different, as when a pea plant has one allele for a white flower color and one allele for violet-colored flowers. (The actual trait in such cases depends upon the dominance relations between the two alleles.) Compare homozygous.

histone. A family of simple proteins, abundant in the cell nucleus and constituting a substantial part of the (mostly) protein-and-DNA complex known as chromatin - the physical substance of chromosomes. A group of eight histones - two each of four different kinds - makes up the "spool" of a nucleosome. Linker histones also participate in chromatin.

histone code. The code presumed to be found in the collection of histone modifications. The idea is this: for any given nucleosome there are many possible (covalent) modifications of its constituent histones, leading to countless possible combinations of such modifications. It could be, then, that for each distinct combination -- or, at least, for many of them -- there is a specific gene-regulatory implication. For example, a particular combination might result in the binding of a specific chromatin remodeling complex. This mapping from specific combinations of histone modifications to specific effects would be the "code". However, the idea that these modifications not only have regulatory significance but have it in a fixed, precise, combinatorially encoded fashion now looks as if it is being increasingly discredited.

histone modification. Often referred to as "histone post-translational modification", because the changes occur after the translation that produces the histone protein. A histone modification consists of the addition or removal of any one of several chemical groups at an individual amino acid of a histone - especially a histone belonging to a nucleosome. The modified amino acid might be on either the histone tail or the main body of the histone. Depending on the chemical group involved, the modification is called methylation (addition of a methyl group), acetylation (addition of an acetyl group), phosphorylation, ubiquitination, sumoylation, and so on. These modifications can dramatically affect the electrical and other properties of nucleosomes, and they play a major role in gene regulation.

histone tail. A thin, filamentary "tail" typically extending from each of the eight histone proteins constituting the core particle, or spool, of a nucleosome.

homozygous. An organism is said to be homozygous with respect to a particular gene if the two alleles of the gene are essentially the same, as when a pea plant has two alleles specifying a white color for flowers. Compare heterozygous.

hormone. A substance produced in particular cells (for example, in a gland) that can travel to other parts of the body and (often in very small quantities) influence those other parts. The hormone, which may be recognized by receptor molecules, is often said to carry a signal.

inheritance of acquired characteristics (Lamarckism). The idea that traits can be passed from an organism to its offspring, not only as those traits are determined in a fixed way by genes, but also as they are altered by the activity of the parent organism during its life. The classic example for ridiculing this notion is the giraffe's neck: no matter how much a giraffe stretches its neck during its lifetime in order to browse on higher leaves, this will not affect the inherited neck length of the giraffe's offspring. But researchers today are exploring a rapidly increasing number of cases where acquired characteristics are passed on to offspring quite independently of genetic inheritance. This inheritance is achieved by epigenetic means. "Lamarckism" refers to Jean-Baptiste Pierre Antoine de Monet, Chevalier de la Marck (1744-1829), who argued for the inheritance of acquired characteristics.

initiator (Inr). One of the components of a gene promoter. In the absence of the TATA box -- or in conjunction with it or with other promoter elements -- Inr can provide a base for the constellation of the pre-initiation complex.

insulator. A DNA sequence that acts as a kind of boundary element, blocking the effects of certain regulatory elements. In particular, an insulator can block the role of an enhancer, or, more broadly, it can prevent the spread of highly condensed chromatin into neighboring regions, where the condensed chromatin might have the effect of suppressing gene expression. Insulators help make possible the independent regulation of nearby chromosome regions.

intron. A segment of the DNA sequence of a gene; more specifically: a segment whose corresponding segment in the gene's RNA transcript is deleted from the transcript before translation. The deletion occurs as a result of the RNA splicing process. Or, in the case of noncoding DNA and its RNA transcripts: introns are those segments deleted before the final functional form of the RNA is achieved. "Intron" can refer either to the DNA sequence or to the corresponding transcribed RNA sequence. Compare exon.

in vitro. "In glass" -- that is, in an artificial environment such as a test tube or laboratory dish.

in vivo. In a living context -- more specifically, in the living cell or organism.

ionizing radiation. Radiation consisting of high-energy particles that can strip electrons from atoms, thereby changing their chemical reactivity, which in turn can cause biological damage.

jumping gene. Another name for a transposon.

LCR. See locus control region.

linker DNA. The relatively short length of DNA extending between successive nucleosomes. It is typically a few tens of base pairs long.

linker histone. A histone (most often the histone known as "H1") that binds the DNA entering a nucleosome spool to the exiting DNA, thereby stabilizing the nucleosome and conducing to the formation of more regular arrays of compact chromatin.

locus control region (LCR). A DNA sequence that helps to regulate a cluster of related genes. These genes may be both nearby and far away on the same chromosome, or even on different chromosomes. The LCR plays a role in organizing the chromatin sections containing the genes and coordinating their expression.

major groove. If you wrap two cords around each other in the manner of a double helix, there will be two grooves between the cords. Each groove winds along with the cords. However, in this case, you would not see a difference in the "width" of the two grooves -- or any width at all. But if the cords are separated by bulky material attached to both of them, then -- depending on the shape of that material -- the distance in going from one cord to the other (passing around the bulky material in the middle) may be greater in one direction than in the other. This is the case when the "filler" material consists of asymmetric nucleotide bases (the "letters" of the genetic code). The wider of the two grooves is the major groove, and the narrower one is the minor groove.

MAR (matrix attachment region). A DNA sequence particularly well suited to serve as an anchoring site for tethering DNA to the nuclear matrix. The constellation of many such tetherings contributes to the looping structure of chromosomes.

mark. Geneticists commonly refer to the attached chemical groups resulting from DNA methylation or histone modification as "marks". Using the word verbally, one can say, for example, that an enzyme "marks" DNA with methyl groups, while another enzyme removes such marks -- that is, removes the methyl groups.

meiosis. A complex, multistage process of cell division in the reproductive organs by which a single cell duplicates its chromosomes and divides twice, yielding four cells, each with half as many chromosomes as the original parent cell. In animals, these resulting cells, which are haploid, are called gametes. Compare mitosis.

meiotic. Relating to meiosis.

membranome. A rather vague term referring to the collection of biological membranes in a cell or organism -- particularly with reference to their informational role. In all likelihood, this is simply to say: with reference to their biological significance and functioning. "Membranome" may (not by accident) include in its connotations something like "digital code" (à la the genome), but that is presumably only for a certain feel-good effect.

Mendel's Laws. The first law -- the Law of Segregation -- states that gametes receive only one member of each parental chromosome pair. So a single gamete contains only one allele of any particular gene. The second law -- the Law of Independent Assortment (also known as the Law of Inheritance) -- states that distribution of alleles to gametes occurs independently for each gene. That is, if gene X has alleles x' and x", and if gene Y has alleles y' and y", then the following combinations in gametes are equally possible: x'y', x'y", x"y', and x"y". This second law, as it happens, is not true in general; it is valid only where genes are not linked, as they often are when they reside on the same chromosome -- in which latter case the alleles of the two genes on that chromosome will commonly be passed along to the gametes together rather than independently.
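
The four equally possible combinations can be enumerated with a few lines of Python. This is a sketch of the idealized case only: it assumes the two genes are unlinked, and the variable names are chosen for the illustration.

```python
from itertools import product

# Alleles of two hypothetical genes, named as in the entry above.
gene_x = ["x'", 'x"']
gene_y = ["y'", 'y"']

# Segregation: each gamete receives exactly one allele of each gene.
# Independent assortment (unlinked genes assumed): every pairing of an
# X allele with a Y allele is equally likely.
gametes = ["".join(pair) for pair in product(gene_x, gene_y)]
probability = 1 / len(gametes)

for gamete in gametes:
    print(gamete, probability)
```

Each of the four combinations comes out with probability 0.25. Where the genes are linked on the same chromosome, this equal weighting no longer applies.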

messenger RNA (mRNA). A kind of RNA. Different mRNAs result from transcription of protein-coding genes and can lead, via translation, to the formation of proteins. See also RNA.

methylation. Attachment of a methyl group to a molecule, which is then said to be "methylated". Certain enzymes (known as "methylases" or methyltransferases) can do this. See also histone modifications and DNA methylation. "Demethylation" is the corresponding removal of a methyl group, accomplished by demethylases.

methyl group. A small chemical group with the formula, -CH3. See methylation.

methylome. The sum total or overall pattern of DNA methylation in a genome.

methyltransferase (or methylase). An enzyme that can attach a methyl group to another molecule. In an epigenetic context, the term generally refers to the attachment of a methyl group to DNA (by a DNA methyltransferase) or to a histone (by a histone methyltransferase).

micro-RNA (miRNA). A small RNA, 21-23 nucleotide bases in length. Like the siRNA involved in RNA interference, miRNA is derived from double-stranded RNA (although not double-stranded RNA that originates from viruses). And it, too, becomes associated with a protein complex known as a "RISC", in cooperation with which it disables messenger RNA molecules containing sequences complementary to its own. One difference between miRNA and the siRNA involved in RNA interference is that the complementarity between the miRNA and the target messenger RNA need not be exact, so that a single miRNA molecule can neutralize many different messenger RNA molecules. This effectively silences, or at least reduces the expression of, the genes producing those messenger RNAs. There are at least several hundred different miRNAs in humans.
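
The difference between exact and inexact complementarity can be sketched in Python. This is a toy model, not a description of how a RISC actually selects its targets: the sequences are invented, the mismatch tolerance is an arbitrary illustrative number, and the function names are hypothetical. The only biology assumed beyond the entry is the standard RNA pairing of A with U and G with C, with the two paired strands running in opposite directions.

```python
# Standard RNA base pairing (A-U, G-C).
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Sequence that would pair perfectly with `rna` (strands are antiparallel)."""
    return "".join(RNA_PAIRS[base] for base in reversed(rna))

def mismatches(small_rna, target_site):
    """Count positions where the target site fails to pair with the small RNA."""
    perfect_site = reverse_complement(small_rna)
    return sum(1 for a, b in zip(perfect_site, target_site) if a != b)

def can_target(small_rna, target_site, max_mismatches=0):
    """True if the site pairs with the small RNA within the allowed tolerance.
    max_mismatches=0 models exact complementarity; a larger value is a
    purely illustrative stand-in for the looser matching of miRNA."""
    if len(small_rna) != len(target_site):
        return False
    return mismatches(small_rna, target_site) <= max_mismatches

small_rna = "AUGGCUAC"        # invented 8-base example, far shorter than a real miRNA
perfect_site = "GUAGCCAU"     # pairs at every position
imperfect_site = "GUAGCGAU"   # one position does not pair

print(can_target(small_rna, perfect_site))                      # True
print(can_target(small_rna, imperfect_site))                    # False
print(can_target(small_rna, imperfect_site, max_mismatches=2))  # True
```

Allowing mismatches is what lets one small RNA in this model match several different target sites.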

minor groove. The narrower of the two grooves running the length of the double helix. For further explanation, see major groove.

mitosis. The complex process of chromosome duplication and separation (usually during cell division), resulting in two diploid daughter cells with the same number of chromosomes as the parent cell. The DNA sequences in parent and daughters are generally (although not always precisely) identical. Compare meiosis.

mitotic. Relating to mitosis.

morphogenesis. The physical structuring, or shaping, of an organism or part of an organism. This occurs most dramatically during the growth of the embryo as cells and tissues differentiate.

motif. A frequently occurring sequence of nucleotide bases in DNA (or RNA) that has functional significance, as when regulatory proteins “look” for it and bind to it. (The term “structural motif” is applied to the protein structures that recognize the nucleotide sequences.)

multipotent. Capable of developing into two or more closely related types of cell. For example, blood stem cells can develop into red cells, white cells, and platelets. Compare totipotent and pluripotent.

mutagenic. Tending to produce mutations.

mutation. A change in the DNA sequence of nucleotide bases -- which is to say (in the usual terminology), a change in the genetic code. An organism containing a mutation is (when the mutation is what one has in view) said to be a "mutant", and a substance tending to cause mutations is a "mutagen".

myosin. A contractile protein, or, rather, a family of proteins. It is the most common protein found in muscles, working together with actin to produce muscle contraction. Myosin consumes energy in driving movements along actin filaments.

naked DNA. DNA that is not wrapped around nucleosomes. Generally, this refers to longer stretches of DNA, not the short linker DNA leading directly from one nucleosome to another.

noncoding. RNA or DNA that does not code for a protein is said to be "noncoding". Noncoding DNA can have many regulatory functions and can even be transcribed into RNA, but the resultant RNA is also noncoding (will not be translated into a protein) and likewise can have many regulatory functions.

nuclear envelope. The membrane (technically, a double lipid bilayer) that encloses the cell nucleus, separating the genetic material and other contents from the rest of the cell. However, there is intimate communication across the envelope, and numerous "nuclear pores" offer passage between the nucleus and the larger cellular environment.

nuclear lamina. A fibrous network, together with associated proteins, located in the periphery of the cell nucleus, at the inner face of the nuclear envelope. At any given time some chromosome sites can be attached to the nuclear lamina, a situation that tends to correlate with reduced gene expression.

nuclear matrix. A poorly characterized and highly dynamic structural skeleton giving organizational structure to the cell nucleus.

nuclear pore. A narrow channel formed through the nuclear envelope by several hundred protein molecules. The channel allows carefully regulated molecular traffic between the cell nucleus and the cytoplasm. The pore is by no means a static structure, and some of its molecules also move through the nucleus performing other functions. The pore and its constituents play roles in gene expression.

nuclear receptor. A special class of receptor whose members reside in the nucleus and can bind directly to DNA, either activating or repressing gene expression. Nuclear receptors are therefore transcription factors.

nucleoplasm. A highly viscous liquid that, in the cell nucleus, corresponds to the cytoplasm of the cellular region outside the nucleus. It is the medium in which the great macromolecules — preeminently the chromosomes — reside.

nucleoprotein. Protein contained in a complex with (most commonly) DNA or RNA. For example, chromatin is a nucleoprotein.

nucleosome. A group of (usually) eight histone proteins that together form a kind of "spool" around which DNA is commonly wrapped somewhat less than two turns. (The length of DNA wrapped around a "standard" nucleosome is usually given as 147 base pairs. But many variations upon this standard length are currently being investigated.) There are millions of nucleosomes in the human genome, and they are key elements in the compaction, or condensation, of DNA. They are a focus of many different aspects of gene regulation.

nucleosome free region. A stretch of DNA that is free of nucleosomes, perhaps because they have been disassembled and removed, or else have shifted their position by sliding along the DNA.

nucleosome sliding. The process by which DNA slides around a nucleosome spool. The effect is to displace the spool linearly along the DNA. As a result, some DNA sequences that were wrapped around the nucleosome (and therefore less accessible to regulatory factors) are exposed as free or naked DNA, while other sequences, previously free, are bound to the nucleosome.

nucleotide base. A class of nitrogen-containing chemical groups that are constituents of DNA and RNA. The four main bases in DNA are adenine, guanine, cytosine, and thymine (A, G, C, and T, respectively - "letters" of the genetic code). In RNA, uracil (U) stands in the place of thymine. These bases combine in restricted ways to form complementary base pairs. This complementation is central to DNA replication and gene expression because of the way it allows the strands of DNA to be used as templates for replication or for production of RNA that preserves the sequential information employed by the cell in protein production.
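
The template idea can be illustrated with a minimal Python sketch. It assumes the standard pairing rules (A with T, G with C, and U taking the place of T in RNA), which the entry alludes to but does not spell out. The function names and the six-base example strand are hypothetical, and for simplicity the strands are read position by position, ignoring their opposite orientation.

```python
# Standard DNA base pairing: A-T and G-C.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Pairing used when RNA is built on a DNA template: U stands in for T.
DNA_TO_RNA_PAIRS = {"A": "U", "T": "A", "G": "C", "C": "G"}

def complementary_dna(strand):
    """Build the DNA strand that pairs with `strand`, base by base."""
    return "".join(DNA_PAIRS[base] for base in strand)

def transcribe(template):
    """Build the RNA that pairs with a DNA template strand."""
    return "".join(DNA_TO_RNA_PAIRS[base] for base in template)

template = "TACGGT"  # invented example
print(complementary_dna(template))  # ATGCCA
print(transcribe(template))         # AUGCCA
```

Note that the RNA matches the complementary DNA strand except that U appears wherever that strand has T, which is the sense in which the transcript preserves the sequential information.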

nucleus. See cell nucleus.

oncogenic. Tending to cause cancerous tumors.

open chromatin. See the more technical term, euchromatin.

parasitic genetic element. A genomic sequence that is thought to be essentially alien to the organism, "using" the organism to advance its own survival, or, rather, the survival of other elements of its kind.

phenotype. An organism's observable traits, considered as a whole or in part. It can include everything from biochemical characteristics to form and behavior. Often contrasted with genotype.

phosphorylation. Attachment of a phosphate group to a molecule, which is then said to be "phosphorylated". Certain enzymes (known as kinases or phosphotransferases) can do this. See also histone modifications. Dephosphorylation is the corresponding removal of a phosphate group, accomplished by phosphatases.

phosphate group. A small chemical group with the formula, PO4. See phosphorylation.

pluripotent. Capable of developing into a considerable range of different cell types. For example, embryonic stem cells can transform themselves into many, but not all, tissue types during fetal development. Compare totipotent and multipotent.

pre-initiation complex (PIC). A group of multi-subunit protein complexes, including RNA polymerase, that come together on a gene promoter as a preparatory step for gene transcription.

polymer. A large molecule (macromolecule) composed of multiple repeated units which are identical or similar.

precursor messenger RNA (pre-mRNA). RNA transcripts that have not yet been spliced.

promoter. A regulatory DNA sequence, usually close to, and upstream from, the gene or genes it regulates. It serves as a binding site for transcription factors and for the protein complexes that initiate gene transcription, and it serves to identify the start site for transcription.

protein-coding gene. A gene whose DNA sequence can lead, via transcription and translation, to production of one or many different proteins. The gene is said to code for the protein that eventuates from it.

protein-coding RNA. RNA capable of producing protein via translation. This RNA, called "messenger RNA", is derived from protein-coding genes and may subsequently be altered by processes such as RNA editing and alternative splicing before being translated into protein.

receptor. A protein (residing in cytoplasm or embedded in a cell membrane) to which a signaling molecule, such as a hormone, can attach. The result is typically a change in conformation of the protein, which in turn may lead to changes, sometimes dramatic, in the protein's surrounding milieu. Compare nuclear receptor.

regulatory factor. See gene regulation.

regulatory region. See gene regulation.

repetitive DNA. DNA sequences, whether long or short, that occur repeatedly in the genome. They may occur in immediate succession, or with other sequences interspersed between them, and may also occur in inverted ("turned around end for end") form. Published figures vary, but repetitive sequences (including transposons) are often said to constitute somewhere between 40% and 50% of the human genome.

repressor. A protein transcription factor with a negative effect on gene expression. (Compare activator.) A repressor may work in conjunction with one or more co-repressors. With or without co-repressors, the repressor commonly blocks access to the gene promoter by RNA polymerase (the transcribing enzyme). The DNA sequence bound by the repressor is called a silencer.

reverse transcription. Instead of DNA being transcribed into RNA, this refers to the opposite process: RNA transcribed into DNA. This new DNA may then be integrated into the genome.

ribonome. The entire collection of RNA molecules in the cell and organism at any one moment, along with the diverse proteins that associate with them.

ribosomal RNA (rRNA). See under RNA.

RISC (RNA-induced silencing complex). A protein complex that plays a central part in RNA interference. The complex consists of several proteins together with a small interfering RNA (siRNA). The complex locates mRNA molecules containing sequences complementary to the siRNA, after which a protein in the complex cleaves the mRNA or otherwise damages it so as to prevent translation.

RNA. Ribonucleic acid. Like DNA, it contains a series of nucleotide bases (often thought of as "letters" of the genetic code). However, in RNA the uracil (U) base, or "letter", occurs instead of the thymine (T) of DNA. RNA is classically thought of as existing in three primary forms, each generated by one of the RNA polymerases: (1) mRNA (messenger RNA), produced from a protein-coding gene-template, preserves the gene's code and is an intermediary between the gene and the protein it specifies. mRNA is normally single-stranded. (2) rRNA (ribosomal RNA), which forms part of the protein-producing ribosomes of the cell, interprets the mRNA sequence as a set of "codes" specifying the series of amino acids from which the protein is to be constellated and engages in the actual production of the protein. (3) tRNA (transfer RNA) then brings the actual amino acids for adding to the growing protein molecule. More recently, a great variety of RNA types, both small and large, both protein-coding and noncoding, have been discovered. They play a major role in many epigenetic processes.

RNA cleavage. The "cutting up" of RNA molecules into smaller pieces. Those pieces may be protein-coding or regulatory, and the latter include both large regulatory RNAs and small regulatory RNAs such as micro-RNAs and small interfering RNAs.

RNA editing. The process by which particular nucleotide bases ("letters") are removed from a precursor RNA and replaced with different bases.

RNA gene. A gene that is transcribed to produce RNA that is not protein-coding. Such RNAs can perform regulatory roles in the cell.

RNA interference (RNAi). Regulation of gene expression -- and especially the silencing of genes -- by processes involving small RNA molecules about 21 - 25 nucleotide bases long. This RNA is known as "small interfering RNA" or siRNA. A protein complex incorporating an siRNA and called a RISC locates mRNA molecules with a sequence complementary to that of the siRNA and proceeds to cleave the mRNA or otherwise prevent it from being translated. This is known as "post-transcriptional silencing" because it effectively silences genes (preventing the production of protein from them) only after the genes have been transcribed. However, more and more other roles are being discovered for siRNAs -- for example, in DNA methylation, chromatin remodeling and small RNA-induced gene activation.

RNAi (RNA interference) code. A vague term meaning little more than "the sum total of what we know about the role of RNA interference" in gene expression.

RNA polymerase. The enzyme (protein) that transcribes DNA (protein-coding genes, but also various noncoding sequences) into RNA. In humans, different RNA polymerases (I, II, and III) transcribe different sorts of DNA sequences.

RNA splicing. The process by which introns are removed from a pre-mRNA transcript and the remaining exons are joined together. Splicing occurs preliminary to translation of the transcript or, in the case of noncoding transcripts, preliminary to the achievement of the functional RNA end-product. The nature of the splicing will determine what sort of protein a protein-coding RNA can produce. The splicing is usually carried out by a large RNA-protein complex known as a "spliceosome". See also alternative splicing and trans-splicing.
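
The removal of introns and joining of exons can be pictured with a small Python sketch. It is a bookkeeping illustration only: the sequence is invented, the intron positions are simply given as index pairs rather than recognized the way a spliceosome would recognize them, and the function name splice is hypothetical.

```python
def splice(pre_mrna, introns):
    """Remove the introns and join the remaining exons.
    `introns` is a list of (start, end) index pairs, end not included,
    assumed here to be non-overlapping."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)

# Invented transcript: exon, intron, exon, intron, exon.
pre_mrna = "AUGGCC" + "GUAAGUCAG" + "UUUGAA" + "GUCCAG" + "UAA"
introns = [(6, 15), (21, 27)]

print(splice(pre_mrna, introns))  # AUGGCCUUUGAAUAA
```

Supplying a different list of introns for the same transcript yields a different spliced product, which is the basic idea behind alternative splicing.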

sequence. A contiguous group of nucleotide bases ("letters") in a DNA or RNA molecule. Particular sequences may be significant in many different respects. For example: (1) they can define specific locations recognizable by transcription factors and other regulatory molecules; (2) they can influence the structure and stability of the double helix and the associated chromatin; (3) they can influence the positioning of nucleosomes and in general the chromatin structure; (4) they can code for proteins; and (5) they can code for regulatory RNAs. "Sequence" can also refer to the linear chain of amino acids constituting the main structure of a protein. Such protein sequences correlate (more or less) with DNA and RNA sequences by means of the genetic code.

sequencing. The process of ascertaining the sequence of a DNA or RNA molecule or of a particular region of the molecule.

signal. A molecule, commonly a protein, that initiates a signaling process and therefore is often spoken of as having a meaning (or carrying a message) related to the outcome of the process. Outcomes include such things as cell migration, cell multiplication, cell death, change in cell shape, and gene expression or repression.

signaling. A broad term referring to various aspects of complex molecular communication within the cell and organism. Signaling pathways are coherent sequences of molecular interactions by which an initial interaction--say, the binding of a cell membrane receptor by a hormone signal--leads to a more or less defined result, or group of results, "downstream". One result, for example, might be the activation of a set of genes. Signaling pathways often branch, leading to an amplification or diversification of consequences in what is known as a signaling cascade.

signaling cascade. See signaling.

signaling pathway. See signaling.

silencer. A DNA sequence that transcription factors known as repressors can recognize and bind to, thereby (more or less) blocking access to a gene and preventing its transcription.

small interfering RNA (siRNA). A small RNA 21-25 nucleotide bases in length that plays a key role in RNA interference. siRNAs are derived from double-stranded RNA molecules, which are often brought into cells by viruses. The double-stranded RNA is cleaved into small lengths, and a product of this cleavage is assimilated to a RISC protein complex, at which time the two strands of the RNA are separated and one is discarded. See further under RNA interference.

somatic cells. All the cells of the body outside the germline. Somatic tissues are tissues made up of somatic cells. Such cells are normally diploid, in contrast to the haploid gametes.

spliceosome. A dynamic molecular complex consisting primarily of a few small RNAs and up to several scores of protein subunits, and performing (in diverse ways) the task of RNA splicing.

stem cell. A more or less undifferentiated (nonspecialized) cell capable of dividing indefinitely as a stem cell, or else of differentiating into more specialized cell types. Embryonic stem cells are primitive stem cells in the embryo capable of differentiating into most of the cell types of the body. Adult stem cells, normally found in adult tissues, can differentiate into at least a few different cell types. And induced pluripotent stem cells result from the reversion of a differentiated cell to a pluripotent form through human engineering.

supercoil. If you twist two strands around a linear wire core, you will have a double helix that coils, or spirals, around an axis represented by the wire. (The wire is invoked here only to identify the axis of the double helix.) If now you twist that whole arrangement so that the axis coils on itself, you have what is called a supercoil. Further, there are two directions in which you can perform this second level of twisting. One is "with" the original twist of the double helix (which yields positive supercoiling), and the other is against this original twist (negative supercoiling). If the ends of the two strands are fastened together so that they are not free to slide around each other, then negative supercoiling will tend to force the strands apart, or "open" them up, while positive supercoiling will have the opposite effect. (It's best to try this with real cords!)

synonymous codon. Two codons are said to be synonymous if they both code for the same amino acid. Because the genetic code is redundant, several codons can map to the same amino acid.

synonymous mutation. The alteration of a codon into a different form that is synonymous with the original.
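
A short Python sketch may make the two preceding entries concrete. It uses only a small excerpt of the standard genetic code (codons written as RNA), and the function names are hypothetical; a fuller treatment would need the complete 64-codon table.

```python
# Excerpt of the standard genetic code, RNA codon -> amino acid.
CODON_TABLE = {
    "UUU": "Phe", "UUC": "Phe",
    "UUA": "Leu", "UUG": "Leu",
    "CUU": "Leu", "CUC": "Leu", "CUA": "Leu", "CUG": "Leu",
    "AUG": "Met",
    "UGG": "Trp",
}

def are_synonymous(codon_a, codon_b):
    """True if both codons code for the same amino acid."""
    return CODON_TABLE[codon_a] == CODON_TABLE[codon_b]

def is_synonymous_mutation(before, after):
    """True if the codon changed but the amino acid it codes for did not."""
    return before != after and are_synonymous(before, after)

print(are_synonymous("CUU", "CUG"))          # True: both code for leucine
print(are_synonymous("UUU", "UUA"))          # False: phenylalanine vs. leucine
print(is_synonymous_mutation("UUU", "UUC"))  # True
print(is_synonymous_mutation("AUG", "UGG"))  # False
```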

TATA box. A DNA sequence having these nucleotide bases ("letters") as its core: TATAAA. The TATA box is one of the several elements contained in gene promoters. Whereas it was once thought to be a more or less canonical element of promoters, it is now believed to be present in less than 25 percent of human promoters. Recognition of the TATA box by the TATA-binding protein is an initial step in the formation of the pre-initiation complex.

TATA-binding protein. A general transcription factor that binds to the TATA box promoter element, typically to begin constellation of the pre-initiation complex on the promoter.

TFIIB. A general transcription factor contributing to the formation of the pre-initiation complex. It helps to stabilize the binding of TBP (the TATA-binding protein) and to prepare the way for the binding of further components of the complex.

TFIID. A general transcription factor contributing to the formation of the pre-initiation complex.

TFIIE. A general transcription factor contributing to the formation of the pre-initiation complex.

TFIIF. A general transcription factor contributing to the formation of the pre-initiation complex.

TFIIH. A general transcription factor contributing to the formation of the pre-initiation complex.

thymine. See nucleotide base.

topoisomerase. Any of a class of enzymes (proteins) that cut the strands of a DNA molecule and then reconnect the strands. The effect may be to release the tension of supercoiling or to untangle knots. Some topoisomerases cut just one of the strands of the double helix, allow it to wind or unwind around the other strand, and then reconnect the severed ends. Other topoisomerases cut both strands, pass a loop of the chromosome through the gap thus created, and then seal the gap again.

totipotent. Capable of developing into every cell of the body. A zygote is totipotent. Compare pluripotent and multipotent.

trait. See character.

transcribing enzyme. See RNA polymerase.

transcript. The RNA molecule that is the product of gene transcription. Transcripts begin as "primary" or "precursor" transcripts, which then can be spliced, edited, or otherwise transformed before (in the case of many RNAs) being translated into a protein.

transcription. The process by which an RNA polymerase (in cooperation with many other cellular elements) uses a DNA gene template to form an RNA molecule such as a messenger RNA (mRNA). The gene is said to have been "transcribed", and the RNA is a "transcript".

transcription complex. The aggregate of numerous proteins (typically scores of them) that must bind to a gene's promoter region before actual transcription of the gene can begin.

transcription factor. A protein that binds directly to a recognized DNA sequence, thereby playing a role in gene regulation. Transcription factors called activators may increase a gene's expression, while repressors may decrease expression.

transcription factory. A locale within the nucleus within which regions of one or more chromosomes and various transcription-associated factors are thought to co-localize, facilitating the transcription of a particular set of genes.

transcription start site. The nucleotide base at the upstream end of a gene where actual transcription of the gene begins.

transduction. In relation to signaling: the transformation of a signal, as when the binding of a hormone to a receptor at the cell membrane results in formation of a protein complex in the cytoplasm, which in turn carries out some function in the cell. In this case, the hormone "signal" is said to be transduced into the protein complex. This might be just one step in a multi-step signaling pathway.

translation. The production of a protein from mRNA. This protein is often said to be "coded for" by the gene from which the mRNA was transcribed, but it is well known that diverse activities of the cell can result in any one of many different proteins (up to thousands) being produced from a particular gene, or DNA sequence.
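
The simplest textbook picture of translation, reading the mRNA three bases at a time, can be sketched in Python. This is a deliberately narrow illustration: it uses only five entries of the standard genetic code, an invented mRNA, and a hypothetical function name, and it leaves out everything that allows one gene to yield many different proteins.

```python
# Five entries from the standard genetic code; UAA is a stop codon.
CODONS = {"AUG": "Met", "GCC": "Ala", "UUU": "Phe", "GAA": "Glu", "UAA": "STOP"}

def translate(mrna):
    """Read the mRNA three bases at a time and return the amino acids,
    stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODONS[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein

print(translate("AUGGCCUUUGAAUAA"))  # ['Met', 'Ala', 'Phe', 'Glu']
```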

trans-splicing. The splicing together of entirely different gene transcripts to form a translation-ready mRNA. The genes may reside on the same or on different chromosomes. See also RNA splicing and alternative splicing.

transposon. A DNA sequence that can move within the genome. It may be cut from one place and moved to another -- a recontextualization that may have profound functional effects -- or it may be copied and inserted elsewhere, in which case it adds to the total content of the genome. By virtue of their copy-and-paste role, transposons can figure greatly in the creation of repetitive DNA.

tRNA (transfer RNA). See under RNA.

tumor suppressor gene. A gene from which a protein is derived that helps to protect a cell against cancer. The protein may do this, for example, by preventing or damping cell division (cells that are becoming cancerous tend to divide without proper restraint) or by promoting cell death in the event of DNA damage.

ubiquitin. A small protein molecule consisting of 76 amino acids. See ubiquitination.

ubiquitination. Attachment of ubiquitin to a molecule, which is then said to be "ubiquitinated". Certain enzymes can do this. See also histone modifications.

upstream/downstream. DNA consists of the two strands of a double helix. The orientation of the chemical constituents of these strands gives a directionality to the strands and enables one to distinguish the two ends, which are referred to as the 5' and the 3' ends. The two strands of a double helix are oriented oppositely, so that the 5' end of one strand is adjacent to the 3' end of the other strand. Gene transcription typically proceeds from the 5' end of the gene (that is, from the end of the gene closer to the 5' end of the chromosome) toward the 3' end. The stretches of DNA lying beyond the gene and toward the 5' end of the chromosome are said to be "upstream" from the gene, while those lying toward the 3' end of the chromosome are "downstream" - in the direction of usual transcription. Promoters lie adjacent to their genes on the upstream side, where transcription begins. "Upstream" and "downstream" are also used with non-technical meanings and without particular reference to DNA, as when one refers to chemical reactions "downstream" from some initiating reaction.

uracil. See nucleotide base.

variant histone. A "non-standard" form of any one of the four different types of histone making up a nucleosome spool. For example, the H2A.Z histone can substitute for the canonical H2A, with the effect of destabilizing the spool and making it more susceptible to sliding. There are also variant linker histones.

zygote. A fertilized, diploid egg cell resulting from the union of two haploid gametes.

