We used a dnmt3 sequence already present in the zebrafish EST database (GenBank number AF135438) to identify and isolate the complete cDNA sequences of four of the dnmt3 genes found in the zebrafish. Three of these are located in the same linkage group (linkage group 23), and two of them are very closely juxtaposed (Figure 1). The close proximity of those two genes has interesting implications for their origin and for the control of their expression, given that the potential promoter region of one is much more limited than that of the other. We therefore undertook a closer examination of the two genes, which we named dnmt3-1 and dnmt3-2.
The interval from the end of the polyA addition site of dnmt3-1 to the beginning of our cloned sequence for dnmt3-2 (which probably does not begin at the actual cellular transcriptional start site) spans only 1428 base pairs. Because only a small amount of 5' sequence is associated with the dnmt3-2 gene, control of its expression is confined to a small and easily manipulated region. Analysis of this region suggests that it is a TATA-less promoter with a number of potential transcription factor binding sites, including AP1 and SP1 binding sites, which have also been reported for mammalian Dnmt3s [13, 14]. The sequences of the cloned genes, dnmt3-1 and dnmt3-2, revealed open reading frames that could encode polypeptides of 1447 and 1297 amino acids, respectively. Comparison of the sequences of these two genes with the zebrafish genomic maps in the GenBank database allows an analysis of their genomic structure. That structure, along with the relative position of the two genes, is shown in Figure 1. The two genes are very similar in sequence: 72% identical at the nucleotide level and 74% identical at the amino acid level, with large regions being more than 80% identical (Figure 2). This contrasts with only 19–28% similarity at the nucleotide level and 36–46% similarity at the amino acid level when they are compared with the other dnmt3 sequences present in the zebrafish genome. The same trend holds for the conserved methyltransferase motifs. For instance, the PWWP motifs of dnmt3-1 and dnmt3-2 are 88% and 84% similar at the amino acid and nucleotide levels, respectively, but are considerably less similar to those of the other dnmt3 sequences (e.g. dnmt3-2 vs gene 4, accession #196918, shows 64% similarity at the amino acid level and no significant similarity at the nucleotide level; BLAST, NCBI).
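As a rough illustration of what a pairwise identity figure such as those above expresses, the following minimal sketch computes percent identity from two sequences that have already been aligned. It is not the BLAST calculation used for the values reported here. The function name, the toy sequences, and the choice to skip gapped columns are all assumptions made for illustration, and alignment tools differ in how they count gap columns.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Hypothetical helper for illustration; both inputs must already be aligned
    (same length, gaps written as `gap`).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # compared columns where the residues are the same
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned sequences (invented, not taken from dnmt3-1 or dnmt3-2).
toy_a = "ATGGCC-TTA"
toy_b = "ATGACCGTTA"

if __name__ == "__main__":
    print(f"{percent_identity(toy_a, toy_b):.1f}% identical")
```

The same function works for amino acid alignments, since it only compares characters column by column.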
Recent additions to the sequence databases included two zebrafish sequences that appear to correspond to these same two genes; they were named dnmt3 and dnmt5, respectively (GenBank numbers AB196914, AB196916). Our sequencing data corroborate the sequences submitted to the databases except for a few minor variations in regions with triplet repeats, which may be an artefact of polymerase slippage during cloning or may represent real triplet repeat differences in the gene.
The high homology between dnmt3-1 and dnmt3-2 relative to the other zebrafish dnmt3 genes, together with their close proximity, suggests that these genes arose through a duplication event. Postlethwait et al. [15] provide support for a model in which two polyploidization events occurred in a common ancestor of zebrafish and mammals. However, zebrafish multigene families often contain additional members. Postlethwait et al. [15] argue that either chromosome duplication or another tetraploidization event in the zebrafish lineage is the most likely mechanism by which these additional members arose. The tight clustering seen here, however, suggests that, at least in this instance, tandem gene duplication has occurred.
The most interesting aspect of our analyses is that at least one of the genes, dnmt3-2, has at least two start sites and a number of splice variants. These were initially identified in cDNA libraries generated from 1–2 cell embryos and by RACE-PCR, and were later confirmed by RT-PCR in a number of early embryonic zebrafish stages as well as in somatic tissues (Figure 3). This demonstrates that they are all expressed, at least at the RNA level. Densitometric analysis revealed that the transcript levels are not equivalent and that the relative levels of the different genes and isoforms fluctuate independently between the stages examined (Figure 4). All genes and variants examined are expressed in early embryonic stages, though dnmt3-2-1 appears to predominate prior to zygotic gene activation (which occurs at ~3 hours). All transcripts showed declining levels leading up to this event, suggesting turnover of the maternal supply. Following zygotic gene activation, however, there appears to be a marked shift towards dnmt3-1 being the most highly expressed. Additionally, there appear to be tissue-dependent differences in expression levels (Figure 5). These differences in expression profiles among the gene products and isoforms suggest that they are regulated independently and that each may play a distinct role during the development of the zebrafish.
The shortest of these variants, dnmt3-2-1, corresponds to the dnmt5 sequence in the database. The two novel variants reported here differ in size from that sequence by 187 (dnmt3-2-2) and 265 (dnmt3-2-2b) base pairs. Notably, these variants are associated with the gene that has the more restricted promoter region. A schematic of the three products is shown in Figure 5.
There are several interesting aspects of these dnmt3-2 variants. First, variants dnmt3-2-2 and dnmt3-2-2b appear to use the same 3' splice junction but different 5' splice junctions, meaning that one of the 5' splice sites is located within an exon of the other variant. Nevertheless, both junctions still obey the GT/AG rule for splice junctions.
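The GT/AG rule mentioned above can be checked mechanically: an intron that obeys it begins with GT and ends with AG on the genomic sense strand. The sketch below shows such a check on an invented toy sequence in which two alternative introns share one 3' junction but start at different 5' junctions. The function name, the coordinates, and the sequence are hypothetical and are not the dnmt3-2 genomic sequence.

```python
def intron_follows_gt_ag(genomic: str, intron_start: int, intron_end: int) -> bool:
    """Return True if genomic[intron_start:intron_end] starts with GT and ends with AG.

    Coordinates are 0-based and end-exclusive (an assumption of this sketch),
    and the sequence is read on the sense strand.
    """
    intron = genomic[intron_start:intron_end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


# Invented toy locus: upstream exon, alternatively retained segment,
# shared intron core, downstream exon.
toy_genomic = "ACCTGA" + "GTACCA" + "GTTTCAG" + "GGATCC"

# Both alternative introns end at the same 3' junction (index 19).
long_intron = (6, 19)    # uses the upstream 5' splice site
short_intron = (12, 19)  # uses the downstream 5' splice site; bases 6..11 stay exonic

if __name__ == "__main__":
    for name, (start, end) in {"long": long_intron, "short": short_intron}.items():
        ok = intron_follows_gt_ag(toy_genomic, start, end)
        print(name, toy_genomic[start:end], "GT/AG" if ok else "non-canonical")
```

In this toy case the upstream 5' splice site falls inside the segment that the other variant retains as exon, which mirrors the arrangement described for the two variants.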
The second interesting aspect of these splice variants is that all of the differences lie 5' to the initiator AUG. Therefore, none of them affects the amino acid sequence. This suggests that either the splicing differences are trivial or they play a regulatory role in translation, localization, or some other aspect of the transcripts. The latter possibility seems more reasonable since, on grounds of parsimony, it is unlikely that this RNA would be alternatively spliced in a variety of ways for no biologically relevant reason. This situation is not unique to zebrafish dnmt3 genes: similar splice variants in the 5' untranslated region have also been reported for human DNMT3s [13].