Conservation analysis of sequences flanking the testis-determining gene Sry in 17 mammalian species
© Larney et al. 2015
Received: 8 April 2015
Accepted: 25 September 2015
Published: 6 October 2015
Sex determination in mammals requires expression of the Y-linked gene Sry in the bipotential genital ridges of the XY embryo. Even minor delay of the onset of Sry expression can result in XY sex reversal, highlighting the need for accurate gene regulation during sex determination. However, the location of critical regulatory elements remains unknown. Here, we analysed Sry flanking sequences across many species, using newly available genome sequences and computational tools, to better understand Sry’s genomic context and to identify conserved regions predictive of functional roles.
Flanking sequences from 17 species were analysed using both global and local sequence alignment methods. Multiple motif searches were employed to characterise common motifs in otherwise unconserved sequence.
We identified position-specific conservation of binding motifs for multiple transcription factor families, including GATA binding factors and Oct/Sox dimers. In contrast with the landscape of extremely low sequence conservation around the Sry coding region, our analysis highlighted a strongly conserved interval of ~106 bp within the Sry promoter (which we term the Sry Proximal Conserved Interval, SPCI). We further report that inverted repeats flanking murine Sry are much larger than previously recognised.
The unusually fast pace of sequence drift on the Y chromosome sharpens the likely functional significance of both the SPCI and the identified binding motifs, providing a basis for future studies of the role(s) of these elements in Sry regulation.
KeywordsSRY Sex determination Y chromosome Gene regulation Testis Gonad
Expression of Sry, a gene located on the Y chromosome, is required for differentiation of mammalian bipotential genital ridges into testes, a role evinced by the development of testes in XX mice with a 14.6 kb transgenic construct containing Sry and no other genes . SRY initiates testis development by binding to a testis-specific enhancer of Sox9 , a gene with a highly conserved role at the centre of the testis development program.
Perhaps surprisingly for a gene with such profound developmental consequences, Sry expression is required only in a small population of cells of the developing genital ridges to initiate male development . In mice, initial expression of Sry at 10.5 days post coitum (dpc) is restricted to the central region of the genital ridge, but expands to fill the entire gonad by 11.5 dpc, before being extinguished to undetectable levels by 12.5 dpc [4, 5]. This short window of expression is so barely adequate for the task that delays of just a few hours lead to either ovarian or ovotestis development . If SRY regulation is similarly critical in humans, it seems likely that improved understanding of factors and pathways regulating Sry will explain some undiagnosed XY disorders of sex development.
Previous studies have implicated a variety of factors in regulating Sry (reviewed in ) but specific cis-regulatory sites for these factors remain obscure. Assays such as ChIP-seq, typically used to identify regulatory elements, founder on a paucity of suitable tissue, as the small number of cells in which Sry is expressed render in vivo tissue collection difficult, and known Sry-expressing cell lines such as NT2-D1 and HepG2 do not recapitulate the expression profile of gonadal cells where Sry is expressed. These difficulties have led to a number of attempts to identify cis-regulatory regions in silico, by locating conserved regions in aligned Sry 5’ flanking sequences [8–12]. While contemporary tools make this an easy task for most genes, a lack of informative Y chromosome sequence has continued to hamper similar studies for Y-linked genes such as Sry. The lack of Y chromosome sequence can be traced to a strong bias towards using female samples in sequencing projects , a situation being addressed by the Y Chromosome Genome Project (https://www.hgsc.bcm.edu/y-chromosome-genome-project). Meanwhile, reports of in silico analysis of Sry flanking regions have been restricted to sequences from relatively few species, and at most a few kilobases in length.
Here, we present an analysis of sequences an order of magnitude longer than has previously been possible, and from a broader range of species. In addition to both global and local sequence alignment methods, we use several different sequence motif analyses and demonstrate that flanking regions of Sry vary considerably, even between closely related species, boosting the likelihood that conserved regions and motifs in the proximal promoter are indeed functional. Building on this result, we also hypothesise that proximal elements are the only regulatory sites required for Sry’s conserved testis-determining role.
Locating Sry flanking sequences
Sequences containing Sry and its flanking regions
23 812 571
23 813 185
42 225 210
42 225 899
2 655 030
2 655 644
Supplementary Data 1
1 918 381
1 919 568
33 265, 122 106
33 975, 122 816
33 kb, 20 kb
70 kb, 122 kb
90 kb, 60 kb
3 kb, 30 kb
Despite locating flanking sequences from 18 species, only 17 were used in our analysis. Sequences from the rat were excluded on the basis of high copy number , and inability to determine which copy(/ies) may be testis-determining. Though the rat has the highest Sry copy number (11) among the species we examined, duplicate copies of Sry are also known to be present in pig and rabbit. In those species, comparison of the copies revealed coding and flanking sequences to be virtually indistinguishable. For this reason, we arbitrarily chose a single copy from each species to include in the analysis.
Global alignment of open reading frames
Global alignment of flanking sequences
Motif scanning and enrichment
Having identified the SPCI as the most conserved part of the Sry flanking sequence, we subsequently wanted to locate potential transcription factor (TF) binding sites. To this end we first scanned the SPCI sequence of human and mouse (Additional file 4) using FIMO  and 1270 motifs from three major databases (see Methods). The strongest result, at the 5’ limit of the sequence, was to a previously unreported instance of an Oct/Sox motif (MA0142.1 from the JASPAR database; p-value = 3x10−5; Fig. 5c). Other previously unreported potential binding sites, including those for Meis and Forkhead factors, were also present (Fig. 5c), along with a reported WT1 motif  (EGR1_DBD from the Jolma database; p-value = 3x10−5), coincident with the human TSS.
We next simultaneously scanned multiple sequences for motifs with CentriMo . CentriMo searches for motif occurrences at similar positions in multiple sequences, so it was first necessary to re-align the ungapped sequences using a position other than the XSS. We chose the base that RPhast had identified as the most conserved (Fig. 5a), and determined the position of this base in each sequence (Additional file 5: Table S2). Using these positions as references, we then took the sequences extending for 100 bases in the 5’ direction and 500 bases in the 3’ direction (for a total of 600 bp; Additional file 6), and scanned them for 1270 motifs, from the same databases mentioned previously, using CentriMo (Additional file 7).
Enriched motifs identified by CentriMo and FIMO
Figure 6 Label
Position relative to anchor SP motif
Pitx, Dmbx, Gsc
Gcm, Sox, Rar
SP, Egr, Smad, Klf, Zfp, Gli, Gabpa, Insm, Mzf, Zic
Extended analysis of locally enriched motifs
CentriMo’s simultaneous consideration of multiple sequences allows it to incorporate information about conservation that is unavailable to a single-sequence scanning tool like FIMO. CentriMo, however, considers only the highest-ranked match for a motif in each sequence. This constraint is unrestrictive with the large numbers of short sequences on which CentriMo is typically used. On the small number of long sequences in our analysis, however, it is far more likely to overlook conserved motifs. To counter this possibility, we next developed a method to combine FIMO’s consideration of all matches in a sequence with CentriMo’s simultaneity.
Using the reference nucleotide from the global multiple alignment, as described previously, we generated two additional sets of unmasked flanking sequences. The first set contained 1 kb sequences (100 bp 5’ of the reference base, and 900 bp 3’) from 17 species, while the second set consisted of 10 kb sequences (100 bp 5’ of the reference base, and 9.9 kb 3’) from 13 species (insufficient sequence was available for the ferret, goat, sheep, and tiger).
We independently analysed each sequence with FIMO, and processed the results to determine positions where a statistically unlikely number of sequences (see Methods) contained a match for the same motif within either disjoint or 50 % overlapping windows of 100 bp or 250 bp. This approach greatly lowered the burden of multiple tests compared to CentriMo, which considers all possible window sizes at all possible positions within tendered sequences. This approach to locating motifs differs from typical alignment-based approaches in two important ways. Firstly, it is targeted directly at conservation of motifs, not necessarily of the underlying sequence. Divergent sequences that retain the ability to bind a particular transcription factor will be captured by this method where they might be overlooked by sequence alignment. Secondly, the use of windows allows us to capture binding motif occurrences that have drifted as species diverged. Whereas sequence alignment approaches require a motif to be at the same position in multiple sequences, our approach requires only that a motif be within the same 100/250 bp window between different species.
Within the region previously analysed with CentriMo, this method found largely the same likely motifs at the same positions (Additional file 9). One result, however, both unreported and undetected by our earlier analyses, was a GATA-like motif 260 bp 3’ of the reference nucleotide that was present in human, mouse, and six additional species (Fig. 6b, Additional file 10). GATA4 is an essential factor in testis determination [22, 24, 25], but precise in vivo binding sites remain uncharacterised. This result provides a putative site suitable for further functional analysis. Several additional motif occurrences were also found at more distant positions (Fig. 6b, Additional file 10), but, unlike the more proximal motifs, were present only in 4–6 species (or fewer than half of the 13 available). None of these distal motifs was present in the mouse.
De novo motif discovery
De novo motifs in Sry flanking sequences have potential to bind known transcription factors
Database of best match
Top-ranked motif ID (based on q-value)
Factor binding motif
q-value (motif scan)
To establish the novelty of motifs reported by WeederH, we extended the WeederH-predicted sites with the five base pairs adjoining them at each end in the genomic sequence of the reference species, and scanned these extended motifs with FIMO (Table 3). FIMO found significant matches for all four motifs identified using the human reference. The first two motifs (both located in the SPCI) exhibited greater similarity to Sp1 and Oct/Sox binding sites, respectively, than to any other known transcription factor binding motifs, recapitulating earlier results. The remaining two motifs were found to best match the motif for NR2F1 (also known as COUP-TF1; MA0017.1 from the JASPAR database; q-value = 10−3) and an estrogen-related receptor motif (ESRRG_full_3 from the Jolma database, q-value = 10−3). The estrogen-related receptor motif provides a possible binding site for the known Sry regulator NR5A1, while the putative match for NR2F1 is interesting in light of the possible role of the related NR2F2 (also known as COUP-TFII) in gonad development .
The two motifs found using the mouse as a reference also had significant similarity with known motifs. The first was found to best match a motif for the onecut family of transcription factors (ONECUT2_DBD from the Jolma database; q-value = 6 × 10−3) (Table 3). Onecut factors play roles in C. elegans sex determination , but have not previously been implicated in the corresponding mammalian process. The second showed best agreement with a heat-shock motif (HSF1_full from the Jolma database; q-value = 3 × 10−2). Heat-shock proteins are known to play roles in spermatogenesis  in mice, and have also been found to be enriched in the testis of the swamp eel , but have no known role in sex determination.
Pairwise local alignment
Given the limited degree of conservation observed in the 10 kb adjacent to the XSS, we reasoned that conserved regulatory elements might instead lie in more distal positions. We first attempted to globally align longer flanking sequences, but observed that the majority of alignment tools were confounded by them, resulting in a variety of error conditions. With this obstacle in mind, and also taking into consideration the potential for rearrangement of the Y chromosome, we decided to instead search more distal positions for occurrences of local similarity. Using repeat-masked 100 kb intervals from the 5’ and 3’ flanks of Sry, we generated a series of pairwise local alignments between the human sequence and the corresponding sequence from each of five other species where extended flanking sequence was available.
We have applied a range of contemporary computational tools to the task of identifying conserved elements in the genomic sequence flanking the mammalian testis-determining gene Sry. In doing so, we have analysed flanking sequences an order of magnitude longer, and from a more diverse range of mammals, than has previously been possible for this gene. We took advantage of newly available Y chromosome sequences, which have historically been difficult to obtain, restricting earlier studies to smaller datasets, both in terms of species and sequence length.
Our initial global alignment of multiple repeat-masked sequences predicted, in a 10 kb region flanking the 5’ end of the gene, just one broadly conserved region, which we have termed the SPCI. A similar global alignment of the region flanking the 3’ end of the gene predicted no conserved regions at all. A variety of motif scanning techniques predicted conserved transcription factor binding sites as far as 5 kb upstream, but subsequent efforts to identify conserved sequences using local alignment tools returned only the SPCI, and made no further predictions of broad conservation, even when we considered sequence as distant as 100 kb upstream and downstream from Sry.
The Sry proximal conserved interval
Throughout our analyses, the SPCI has consistently been identified as the most conserved interval in either flank of Sry. The SPCI is approximately 106 bp long, overlaps the TSS in humans, and is adjacent to it in mice. Conservation of this region has been identified in multiple previous studies, including between humans and mice , and between multiple primates . Our analysis was unable to locate additional conserved regions in Sry’s flanking regions, but has demonstrated that the SPCI is more widely conserved than previously understood, and is found in the flanking region of Sry in diverse mammalian species (Fig. 5b). In fact, of the 17 species included in our multiple sequence alignment, the only two species to lack the SPCI were the sheep, where it has previously been reported as absent , and the ferret. Thus, our data highlight the broad extent of conservation of the SPCI in a region of the genome otherwise devoid of significant conservation.
Conservation of sequence suggests a conservation of function. In this regard it may be significant that we identified conserved motifs for transcription factors known to be important in gonad development within the SPCI, lending support to the notion that it is required for transcriptional regulation of Sry. The strongest conservation peak corresponds to an Sp1 motif (Fig. 5a, positions −330 to −320). Sp1 has been shown capable of regulating Sry in vitro , and previous studies have considered conservation of this region in a narrower range of species . Immediately 5’ to this is a somewhat conserved sequence that has previously been investigated in vitro in the context of WT1 binding, where it has been found essential for WT1-dependent activation of the mouse Sry promoter .
A second, highly conserved element in the SPCI (Fig. 5a, positions −403 to −390) has been noted previously as conserved between smaller groups of species [8, 12], but our results indicate it to be far more widely conserved than previously reported. We found this element to closely resemble the binding motif of an Oct/Sox dimer.
Curiously, though our interest in this element of the SPCI stems from its conservation, the only functional evidence associated with it is attributable to a unique pentanucleotide, GATAC, a consensus GATA binding site, which is present on the reverse strand in the mouse sequence but in no other species. In in vitro experiments, GATA4 is able to bind to this site and activate mouse Sry promoter constructs transfected into HeLa cells . In vivo experiments also implicate these bases, as they represent the only likely GATA4 binding site between the primer pairs of a ChIP-PCR assay that showed strong GATA4 binding at 11.5 dpc .
Locally enriched motifs
While the SPCI represented the only well conserved stretch of sequence in our results, we also found a number of transcription factor binding motifs locally enriched within narrow intervals of 100 bp or less. In the absence of broader conservation at the sequence level, this localised conservation of motifs may indicate functional elements common to multiple species. Perhaps the most interesting result in this regard is the presence of four separate positions within the 450 bp 5’ of the conserved Sp1 site where the DNA sequence is permissive in the majority of species for binding by Oct and/or Sox factors. Also of interest are two further sites, one for an estrogen-related receptor motif, and another, some 160 bp upstream of the human TSS, which our analysis identified as a motif for NR4A2. Both these motifs are similar to that of NR5A1 (also known as both SF-1 and Ad4BP), a known regulator of Sry [10, 32], suggesting possible binding sites for this factor (a specific NR5A1 motif was not identified by our analysis as it was not present in any of the available databases). As with the conserved sequences of the SPCI, it remains unclear what, if any, functional role these elements might play in Sry regulation.
Sry may not require distal regulatory elements
Our results indicate that, even between closely related species, conservation in the flanks of Sry is restricted to just a few kilobases 5’ of the TSS, with no discernible, widespread conservation in the 3’ flank, or in more distant parts of the 5’ flank. Our analysis, which included all available data for this genomic locus, used a number of analyses in an attempt to discern even low levels of conservation that might be present across the various available species. In addition to global alignment, we constructed a series of local alignments in the hope that these might reveal short, highly similar sequences, even if they were not at a consistent distance from the ORF across species. Contrary to our expectations, we instead conclusively demonstrated an absence of broadly conserved sequences. The only portions of sequence we did not analyse were the repeat regions masked by RepeatMasker . While there is increasing evidence at the genomic scale that repeat regions can harbour regulatory function [33, 34], it remains unclear how putative regulatory sites in repeat regions might be predicted in the context of a single gene such as Sry.
The position of murine Sry between inverted repeats  suggests that, in the mouse, the gene has been transposed to its current location at some point in the past from an entirely different position on the Y chromosome. Given this observation, we might speculate that, in the mouse at least, all sequence specific regulatory elements necessary for male sex determination lie not only within the 8 kb of L741, the construct first used to generate transgenic XX male mice , but also within the few hundred base pairs of unique non-coding sequence that lie between the arms of the repeat. Available evidence accords with this view, with all suspected binding sites for transcription factors regulating mouse Sry lying within this region .
Finally, it is worth noting the possibility that sequence-specific transcription factors are not the primary drivers of Sry regulation, and that other factors, such as DNA methylation [35, 36] and epigenetic modification [37, 38] play critical roles. Conservation-based approaches would provide little insight in clarifying how Sry is regulated in this case.
Mouse Sry as a model of human SRY regulation
While we compared 17 mammalian Sry sequences in this study, our analysis was anchored in mouse as the species in which sex-determination is currently best understood. It is pertinent to ask to what extent the study of mouse Sry regulation is likely to improve our understanding of the corresponding process in other mammals, and especially humans. Sry is strongly upregulated in XY gonads of both species during early development of the testes, and a variety of experiments have shown the potential for human SRY regulatory and coding sequences of Sry to function in mice (reviewed in ), implicating common factors in its regulation and downstream effects (eg. [22, 39]). Functional dissection of mouse putative regulatory elements followed by mutation analysis in undiagnosed cases of human XY gonadal dysgenesis is required to experimentally validate the utility of mouse as an experimental model for studying Sry regulation.
Several transcription factors are known to regulate Sry during sex determination, but specific binding sites for these (or any other) factors remain uncharacterised. Using sequences an order of magnitude longer than previously available, we have applied a range of computational analyses to the task of identifying conserved regulatory elements of Sry. These analyses highlight a short, well-conserved sequence, which we have dubbed the SPCI; and reveal the large differences that otherwise exist between the flanking regions of Sry in different species.
Our results suggest a new model whereby the testis-determining role of Sry depends solely on a combination of transcription factor binding to the SPCI and epigenetic regulatory mechanisms. Testing this model will require the targeting of specific transcription factor binding sites within the SPCI with a genome editing system such as CRISPR/Cas .
DNA sequences (Table 1) were obtained from a variety of sources. Bioconductor BSgenome packages were used to obtain flanking sequences for human, chimpanzee, mouse, and cow, as these species all have whole genome builds with significant Y chromosome sequences, and annotations for positions of Sry. We obtained flanking sequences from a supplementary FASTA file of a complete published Y chromosome sequence in macaque . The position of Sry within the chromosome was established by downloading the macaque Sry mRNA sequence from Genbank [Genbank:NM_001032836], and searching the chromosomal sequence for an exact match.
For other species, we first searched the Genbank CoreNucleotide database with the query string “sry”. This resulted in a selection of mRNA and RefSeq sequences for Sry coding and promoter regions. The coding regions from this initial set of sequences were then used as query sequences in BLASTN searches against available genomic databases (NCBI Genomes, High throughout genomic sequences). Where these searches located sequences with embedded coding regions, they were accepted as bona fide Sry sequences, downloaded, and manipulated with the Biostrings package to obtain flanking sequences.
DNA sequences were manipulated in R , using a variety of packages from Bioconductor . Where relevant sequences for a species were embedded in a whole genome build, flanking sequences were obtained using the relevant BSgenome package (Table 1). For all other species, FASTA files were downloaded from Genbank, and subsequently manipulated using the Biostrings package. The translation start site (XSS), rather than the TSS, was defined as position +1 in the coordinates for all sequences because the TSS is uncharacterised in some species.
Sequences were masked for repetitive elements using RepeatMasker . Default parameters were used except for the DNA source. A value for this parameter must be selected from a range of pre-defined species/orders. For each sequence, we selected the option most closely related to the species where the sequence originated (Additional file 5: Table S1).
Global sequence alignment and conservation
Sets of repeat-masked sequences were globally aligned by MUSCLE  using default parameters. Alignments were viewed, and minor adjustments made manually, using MEGA6 . We then used RPhast , an R package, to estimate conservation. The phyloFit function, with default parameters, was first used to estimate a neutral model from the alignment of Sry coding regions and an associated guide tree. This model was then used in conjunction with RPhast’s phastCons function to estimate conservation in the previously aligned flanking sequences.
Two separate programs from the MEME suite  were used to scan unmasked sequences for motif occurrences. Individual sequences were scanned using FIMO version 4.9.0  (with default parameters unless otherwise specified), while simultaneous scanning of multiple sequences used CentriMo 4.10.0  with the optional –local flag. This flag allows identification of enriched motifs at any position in the sequences. In all motif scanning experiments, sequences were scanned with 1270 motifs from the JASPAR , Jolma , and UniPROBE  databases.
The analysis of motif occurrences within positional windows used, for each set of input sequences, two separate invocations of FIMO. The first used default parameters, while the second used the parameter --thresh 5e-4, a tolerance slightly less stringent than the default value of 1 × 10−4, in order to make the search more comprehensive, at the risk of more false positives. Output from FIMO was subsequently processed in R. Results were divided, in two separate analyses, into 100 bp and 250 bp windows. For each of these window sizes, alternative analyses considered either disjoint windows or windows with a 50 % overlap (ie. 100 bp windows were overlapped by 50 bp, 250 bp windows were overlapped by 125 bp). In all cases, we counted the number of species in which each motif appeared in each window. We derived a p-value for each event by first defining the probability of a motif occurring at any single base pair in a single sequence as the maximum of the FIMO threshold parameter (either 1 × 10−4 or 5 × 10−4) and each of the empirically observed probabilities (observed occurrences divided by total available positions), and then modeling the probability of multiple matches in a window as a binomial distribution. False discovery rate was calculated using Bioconductor’s qvalue package. In summary, the windowing analysis determined the probability of a motif occurrence at an individual base pair in any single sequence, and then extrapolated this, by way of a binomial distribution, to the probability of a motif occurring in the same window in multiple species. Parameters varied in this analysis were the sequence lengths (1 kb or 10 kb), the FIMO tolerance (1 × 10−4 or 5 × 10−4), the window size (100 bp or 250 bp), and whether or not the windows overlapped (true or false). Results are provided for each of the sixteen possible permutations of these parameters (Additional files 9 and 10).
De novo motif discovery used WeederH  on unmasked sequences, as the program accepts only A, C, G, and T as input. Different species, mouse and human, were used as reference species in two separate experiments. A negative control was established for each experiment by running WeederH with the reference sequence and 100 random shufflings of the nucleotides within each of the non-reference sequences. False discovery rate was established by comparing scores from actual observations to scores from the 100 trials with random shuffling. The sequence motifs reported by WeederH were compared to known motifs by first extending the reported motifs (either eight or twelve bp in length) by five base pairs on both sides using the endogenous context of the motif in the genome of the reference species — either human (hg19) or mouse (mm9). These extended strings were then analysed with FIMO using the parameter --thresh 5e-4.
Local sequence alignment
Pairs of repeat-masked sequences were locally aligned with LALIGN  using default parameters. Output files from LALIGN were parsed and compared for overlapping local alignments using bespoke programs implemented in Racket  and R . From each set of overlapping regions (which could potentially involve multiple disparate sequences from a single species), a single sequence was chosen from each species so as to maximize the overall length of the alignment. These sequences were then globally aligned with MUSCLE as described previously.
Dot plots were generated by GEPARD , with command line options of –maxwidth 300 –maxheight 300 –matrix matrices\edna.mat –lower 33 –upper 67. Comparisons between species used masked sequences. Unmasked sequences were used in the comparison of mouse 5’ and 3’ flanks.
Figures of the probability of conservation were generated using ggplot2 .
This work was supported by a research grant from the National Institutes of Health [RO-1 RR021692-01]. CL is the recipient of an Australian Postgraduate Award and a University of Queensland Scholarship. PK is a Senior Principal Research Fellow of the NHMRC.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Koopman P, Gubbay J, Vivian N, Goodfellow P, Lovell-Badge R. Male development of chromosomally female mice transgenic for Sry. Nature. 1991;351:117–21.View ArticlePubMedGoogle Scholar
- Sekido R, Lovell-Badge R. Sex determination involves synergistic action of SRY and SF1 on a specific Sox9 enhancer. Nature. 2008;453:930–4.View ArticlePubMedGoogle Scholar
- Albrecht KH, Eicher EM. Evidence That Sry Is Expressed in Pre-Sertoli Cells and Sertoli and Granulosa Cells Have a Common Precursor. Dev Biol. 2001;240:92–107.View ArticlePubMedGoogle Scholar
- Bullejos M, Koopman P. Spatially dynamic expression of Sry in mouse genital ridges. Dev Dyn. 2001;221:201–5.View ArticlePubMedGoogle Scholar
- Jeske YWA, Mishina Y, Cohen DR, Behringer RR, Koopman P. Analysis of the role of Amh and Fra1 in the Sry regulatory pathway. Mol Reprod Dev. 1996;44:153–8.View ArticlePubMedGoogle Scholar
- Hiramatsu R, Matoba S, Kanai-Azuma M, Tsunekawa N, Katoh-Fukui Y, Kurohmaru M, et al. A Critical Time Window of Sry Action in Gonadal Sex Determination in Mice. Development. 2009;136:129–38.View ArticlePubMedGoogle Scholar
- Larney C, Bailey TL, Koopman P. Switching on sex: transcriptional regulation of the testis-determining gene Sry. Development. 2014;141:2195–205.PubMed CentralView ArticlePubMedGoogle Scholar
- Hacker A, Capel B, Goodfellow P, Lovell-Badge R. Expression of Sry, the Mouse Sex Determining Gene. Development. 1995;121:1603–14.PubMedGoogle Scholar
- Margarit E, Guillén A, Rebordosa C, Vidal-Taboada J, Sánchez M, Ballesta F, et al. Identification of Conserved Potentially Regulatory Sequences of the SRY Gene from 10 Different Species of Mammals. Biochem Biophys Res Commun. 1998;245:370–7.View ArticlePubMedGoogle Scholar
- Pilon N, Daneau I, Paradis V, Hamel F, Lussier JG, Viger RS, et al. Porcine SRY Promoter Is a Target for Steroidogenic Factor 1. Biol Reprod. 2003;68:1098–106.View ArticlePubMedGoogle Scholar
- Ross DG, Bowles J, Koopman P, Lehnert S. New insights into SRY regulation through identification of 5’ conserved sequences. BMC Mol Biol. 2008;9:85.PubMed CentralView ArticlePubMedGoogle Scholar
- Veitia RA, Fellous M, McElreavey K. Conservation of Y chromosome-specific sequences immediately 5′ to the testis determining gene in primates. Gene. 1997;199:63–70.View ArticlePubMedGoogle Scholar
- Hughes JF, Rozen S. Genomics and Genetics of Human and Primate Y Chromosomes. Annu Rev Genomics Hum Genet. 2012;13:83–108.View ArticlePubMedGoogle Scholar
- Hughes JF, Skaletsky H, Brown LG, Pyntikova T, Graves T, Fulton RS, et al. Strict evolutionary conservation followed rapid gene loss on human and rhesus Y chromosomes. Nature. 2012;483:82–6.PubMed CentralView ArticlePubMedGoogle Scholar
- Prokop JW, Underwood AC, Turner ME, Miller N, Pietrzak D, Scott S, et al. Analysis of Sry duplications on the Rattus norvegicus Y-chromosome. BMC Genomics. 2013;14:792.PubMed CentralView ArticlePubMedGoogle Scholar
- Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–7.PubMed CentralView ArticlePubMedGoogle Scholar
- Hubisz MJ, Pollard KS, Siepel A. PHAST and RPHAST: phylogenetic analysis with space/time models. Brief Bioinform. 2011;12:41–51.PubMed CentralView ArticlePubMedGoogle Scholar
- Foster JW, Graves JA. An SRY-Related Sequence on the Marsupial X Chromosome: Implications for the Evolution of the Mammalian Testis-Determining Gene. Proc Natl Acad Sci. 1994;91:1927–31.PubMed CentralView ArticlePubMedGoogle Scholar
- Gubbay J, Vivian N, Economou A, Jackson D, Goodfellow P, Lovell-Badge R. Inverted Repeat Structure of the Sry Locus in Mice. Proc Natl Acad Sci. 1992;89:7953–7.PubMed CentralView ArticlePubMedGoogle Scholar
- Smit A, Hubley R, Green P. RepeatMasker Open-3.0. 1996.Google Scholar
- Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011;27:1017–8.PubMed CentralView ArticlePubMedGoogle Scholar
- Miyamoto Y, Taniguchi H, Hamel F, Silversides DW, Viger RS. A GATA4/WT1 cooperation regulates transcription of genes required for mammalian sex determination and differentiation. BMC Mol Biol. 2008;9:44.PubMed CentralView ArticlePubMedGoogle Scholar
- Bailey TL, Machanick P. Inferring direct DNA binding from ChIP-seq. Nucleic Acids Res. 2012;40:e128.PubMed CentralView ArticlePubMedGoogle Scholar
- Tevosian SG, Albrecht KH, Crispino JD, Fujiwara Y, Eicher EM, Orkin SH. Gonadal Differentiation, Sex Determination and Normal Sry Expression in Mice Require Direct Interaction Between Transcription Partners GATA4 and FOG2. Development. 2002;129:4627–34.PubMedGoogle Scholar
- Gierl MS, Gruhn WH, von Seggern A, Maltry N, Niehrs C. GADD45G Functions in Male Sex Determination by Promoting p38 Signaling and Sry Expression. Dev Cell. 2012;23:1032–42.View ArticlePubMedGoogle Scholar
- Pavesi G, Zambelli F, Pesole G. WeederH: an algorithm for finding conserved regulatory motifs and regions in homologous sequences. BMC Bioinf. 2007;8:46.View ArticleGoogle Scholar
- Van den Driesche S, Walker M, McKinnell C, Scott HM, Eddie SL, Mitchell RT, et al. Proposed Role for COUP-TFII in Regulating Fetal Leydig Cell Steroidogenesis Perturbation of Which Leads to Masculinization Disorders in Rodents. PLoS ONE. 2012;7:e37064.PubMed CentralView ArticlePubMedGoogle Scholar
- Gladden JM, Meyer BJ. A ONECUT Homeodomain Protein Communicates X Chromosome Dose to Specify Caenorhabditis elegans Sexual Fate by Repressing a Sex Switch Gene. Genetics. 2007;177:1621–37.PubMed CentralView ArticlePubMedGoogle Scholar
- Åkerfelt M, Vihervaara A, Laiho A, Conter A, Christians ES, Sistonen L, et al. Heat Shock Transcription Factor 1 Localizes to Sex Chromatin during Meiotic Repression. J Biol Chem. 2010;285:34469–76.PubMed CentralView ArticlePubMedGoogle Scholar
- He Y, Luo M, Yi M, Sheng Y, Cheng Y, Zhou R, et al. Identification of a Testis-Enriched Heat Shock Protein and Fourteen Members of Hsp70 Family in the Swamp Eel. PLoS One. 2013;8:e65269.PubMed CentralView ArticlePubMedGoogle Scholar
- Desclozeaux M, Poulat F, de Santa BP, Soullier S, Jay P, Berta P, et al. Characterization of two Sp1 binding sites of the human sex determining SRY promoter. Biochim Biophys Acta BBA - Gene Struct Expr. 1998;1397:247–52.View ArticleGoogle Scholar
- De Santa BP, Méjean C, Moniot B, Malclès M-H, Berta P, Boizet-Bonhoure B. Steroidogenic Factor-1 Contributes to the Cyclic-Adenosine Monophosphate Down-Regulation of Human SRY Gene Expression. Biol Reprod. 2001;64:775–83.View ArticleGoogle Scholar
- Sawaya S, Bagshaw A, Buschiazzo E, Kumar P, Chowdhury S, Black MA, et al. Microsatellite Tandem Repeats Are Abundant in Human Promoters and Are Associated with Regulatory Elements. PLoS One. 2013;8:e54710.PubMed CentralView ArticlePubMedGoogle Scholar
- Yanez-Cuna JO, Arnold CD, Stampfel G, Boryn LM, Gerlach D, Rath M, et al. Dissection of thousands of cell type-specific enhancers identifies dinucleotide repeat motifs as general enhancer features. Genome Res. 2014;24(7):1147–56.PubMed CentralView ArticlePubMedGoogle Scholar
- Nishino K, Hattori N, Tanaka S, Shiota K. DNA Methylation-Mediated Control of Sry Gene Expression in Mouse Gonadal Development. J Biol Chem. 2004;279:22306–13.View ArticlePubMedGoogle Scholar
- Nishino K, Hattori N, Sato S, Arai Y, Tanaka S, Nagy A, et al. Non-CpG Methylation Occurs in the Regulatory Region of the Sry Gene. J Reprod Dev. 2011;57:586–93.View ArticlePubMedGoogle Scholar
- Katoh-Fukui Y, Miyabayashi K, Komatsu T, Owaki A, Baba T, Shima Y, et al. Cbx2, a Polycomb Group Gene, Is Required for Sry Gene Expression in Mice. Endocrinology. 2012;153:913–24.View ArticlePubMedGoogle Scholar
- Kuroki S, Matoba S, Akiyoshi M, Matsumura Y, Miyachi H, Mise N, et al. Epigenetic Regulation of Mouse Sex Determination by the Histone Demethylase Jmjd1a. Science. 2013;341:1106–9.View ArticlePubMedGoogle Scholar
- Hossain A, Saunders GF. The Human Sex-Determining Gene SRY Is a Direct Target of WT1. J Biol Chem. 2001;276:16817–23.View ArticlePubMedGoogle Scholar
- Wang H, Yang H, Shivalila CS, Dawlaty MM, Cheng AW, Zhang F, et al. One-Step Generation of Mice Carrying Mutations in Multiple Genes by CRISPR/Cas-Mediated Genome Engineering. Cell. 2013;153:910–8.PubMed CentralView ArticlePubMedGoogle Scholar
- R Core Team. R: A Language and Environment for Statistical Computing. 2014.Google Scholar
- Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004;5:R80.PubMed CentralView ArticlePubMedGoogle Scholar
- Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: Molecular Evolutionary Genetics Analysis Version 6.0. Mol Biol Evol. 2013;30:2725–9.PubMed CentralView ArticlePubMedGoogle Scholar
- Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, et al. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 2009;37(Web Server):W202–8.PubMed CentralView ArticlePubMedGoogle Scholar
- Mathelier A, Zhao X, Zhang AW, Parcy F, Worsley-Hunt R, Arenillas DJ, et al. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res. 2014;42:D142–7.PubMed CentralView ArticlePubMedGoogle Scholar
- Jolma A, Yan J, Whitington T, Toivonen J, Nitta KR, Rastas P, et al. DNA-Binding Specificities of Human Transcription Factors. Cell. 2013;152:327–39.View ArticlePubMedGoogle Scholar
- Newburger DE, Bulyk ML. UniPROBE: an online database of protein binding microarray data on protein-DNA interactions. Nucleic Acids Res. 2009;37(Database issue):D77–82.PubMed CentralView ArticlePubMedGoogle Scholar
- McWilliam H, Li W, Uludag M, Squizzato S, Park YM, Buso N, et al. Analysis Tool Web Services from the EMBL-EBI. Nucleic Acids Res. 2013;41:W597–600.PubMed CentralView ArticlePubMedGoogle Scholar
- Flatt M, PLT: Reference: Racket. PLT Design Inc.; 2010. http://racket-lang.org/tr1/ Accessed 5 Oct 2015.
- Krumsiek J, Arnold R, Rattei T. Gepard: a rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics. 2007;23:1026–8.View ArticlePubMedGoogle Scholar
- Wickham H: ggplot2: Elegant Graphics for Data Analysis. Springer New York; 2009. https://cran.r-project.org/web/packages/ggplot2/citation.html.
- Ge R-L, Cai Q, Shen Y-Y, San A, Ma L, Zhang Y, et al. Draft genome sequence of the Tibetan antelope. Nat Commun. 2013;4:1858.PubMed CentralView ArticlePubMedGoogle Scholar
- Seim I, Fang X, Xiong Z, Lobanov AV, Huang Z, Ma S, et al. Genome analysis reveals insights into physiology and longevity of the Brandt’s bat Myotis brandtii. Nat Commun. 2013;4:2212.PubMed CentralView ArticlePubMedGoogle Scholar
- Hughes JF, Skaletsky H, Pyntikova T, Minx PJ, Graves T, Rozen S, et al. Conservation of Y-linked genes during human evolution revealed by comparative sequencing in chimpanzee. Nature. 2005;437:100–3.View ArticlePubMedGoogle Scholar
- Elsik CG, Tellam RL, Worley KC. The Genome Sequence of Taurine Cattle: A Window to Ruminant Biology and Evolution. Science. 2009;324:522–8.PubMed CentralView ArticlePubMedGoogle Scholar
- Skaletsky H, Kuroda-Kawaguchi T, Minx PJ, Cordum HS, Hillier L, Brown LG, et al. The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature. 2003;423:825–37.View ArticlePubMedGoogle Scholar
- Chinwalla AT, Cook LL, Delehaunty KD, Fewell GA, Fulton LA, Fulton RS, et al. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002;420:520–62.View ArticlePubMedGoogle Scholar
- Groenen MAM, Archibald AL, Uenishi H, Tuggle CK, Takeuchi Y, Rothschild MF, et al. Analyses of pig genomes provide insight into porcine demography and evolution. Nature. 2012;491:393–8.PubMed CentralView ArticlePubMedGoogle Scholar
- Geraldes A, Rambo T, Wing RA, Ferrand N, Nachman MW. Extensive Gene Conversion Drives the Concerted Evolution of Paralogous Copies of the SRY Gene in European Rabbits. Mol Biol Evol. 2010;27:2437–40.PubMed CentralView ArticlePubMedGoogle Scholar
- Payen EJ, Cotinot CY. Sequence evolution of SRY gene within Bovidae family. Mamm Genome Off J Int Mamm Genome Soc. 1994;5:723–5.View ArticleGoogle Scholar
- Cho YS, Hu L, Hou H, Lee H, Xu J, Kwon S, et al. The tiger genome and comparative analysis with lion and snow leopard genomes. Nat Commun. 2013;4:2433.PubMed CentralPubMedGoogle Scholar
- Yim H-S, Cho YS, Guang X, Kang SG, Jeong J-Y, Cha S-S, et al. Minke whale genome and aquatic adaptation in cetaceans. Nat Genet. 2014;46:88–92.PubMed CentralView ArticlePubMedGoogle Scholar