The purpose of the Uprobe project is to provide the community with an efficient means of experimental access to large-insert cloned DNA (i.e., BAC clones) from the full spectrum of vertebrate genomes for which genomic BAC libraries are planned or are currently available. This is being accomplished by technology development aimed at improving the ability of 'universal' hybridization-based probes to identify genes/regions of interest from multiple species, and then the dissemination of these improved technologies through the creation of:
A more general introduction to the Uprobe project and details related to the above-mentioned resources are provided below.
Note that you can also download whole-genome probe sets, computer programs, and experimental protocols from this website.
Pre-computed whole-genome probe sets are available for:
All Mammals.
AUG_2003_b1 is based on human-mouse alignments and has been experimentally validated.
OCT_2003_b2 is based on human-mouse-rat alignments and has been experimentally validated.
FEB_2004_mammals_1 was created by combining the approaches used to build the first two probe sets and has been experimentally validated. It was the recommended default probe set for screening mammalian genomic libraries before the release of JUN_2005_mammals_2.
JUN_2005_mammals_2 was created by enhancing FEB_2004_mammals_1 with new probes based on human-mouse-rat-dog-chicken alignments and is the current recommended and default probe set for screening mammalian genomic libraries.
Rodents.
APR_2005_rodents_1.1 was designed with a new algorithm, nsoop_v2, using mouse-rat-human-dog alignments specifically for screening rodent libraries and is currently in the process of being experimentally validated. This set replaces OCT_2004_rodents_1.
Carnivores.
APR_2005_carnivores_1.1 was designed with a new algorithm, nsoop_v2, using dog-human-mouse-rat alignments specifically for screening carnivore libraries and has been experimentally validated. This set replaces JAN_2005_carnivores_1.
Marsupials.
JUN_2005_marsupials_1 was designed from human-opossum alignments and is the recommended probe set for screening marsupial libraries.
All Birds and Reptiles.
MAR_2004_birds/reptiles_1 is based on chicken-human alignments and has been experimentally validated.
On demand universal probe design for nonhuman primates:
On demand universal probe design is now available for apes and Old World monkeys, New World monkeys, simians, and all primates.
Custom universal probe design:
Custom universal probe design can be performed on user-provided DNA sequence alignments of two or more species. A step-by-step tutorial for the custom probe design process is provided here.
The Uprobe project has been supported by past funding from the NIH (R24RR022239 and U01MH068185). Comments are welcome and can be directed to James Thomas (jthomas@genetics.emory.edu).

Below is an overview of the goals and rationale behind the Uprobe project, a description of what universal probes are and how they are designed, details on specific whole-genome probe sets, and updates and changes to the Uprobe site.

Comparative
sequencing is a particularly powerful tool for inferring function from genomic
sequence. Thus, sequencing the same region in multiple species simultaneously
would provide a valuable method for interpreting all genomic sequence as it is
generated. Just as important is access to cloned DNA from multiple species containing the orthologous genes and regions catalogued in the sequence. Such access would establish a new resource that
could be used for comprehensive functional analysis of both coding and
non-coding sequence. This new resource would consist of a series of gene
alleles generated not by mutagenesis within a species, but by the divergence of
sequence over evolutionary time (i.e., 'evolutionary
alleles'). These 'evolutionary alleles' could then be used to experimentally
dissect the function of genomic sequence in cell culture or a transgenic model.
At this time, there is no published method for supplying the templates
necessary for this type of sequencing or functional studies. Since multiple
species sequence comparisons and physical mapping among vertebrates will be a
tremendous resource for the functional annotation of the human genome and a
starting point for experimental analysis, robust and effective means of
generating such clone resources in a targeted manner would be of immeasurable
value to the research community.

BAC libraries offer the means to selectively isolate a specific region of a genome
or to support whole-genome mapping and sequencing efforts. As such, both the
NHGRI (RFA-HG-01-002) and NSF (NSF 01-145) have made funding commitments to
establish a large resource of BAC libraries from a diverse set of species. In
fact, nearly 100 vertebrate BAC libraries are currently available to the public
(Fig. 1). This new BAC library resource will provide
a source for comparative sequencing and functional studies across a wide range
of vertebrates. Critical to the future utility of these BAC libraries will be
the availability of efficient and reliable methods for screening these
libraries and assembling high-quality BAC maps that can be used by the entire
biomedical research community. This is especially important for individual
researchers who do not necessarily have the experience, technology or
production needs of a genome center.

Species-specific sequence resources traditionally used for isolating and constructing physical maps of regions of interest (i.e., ESTs or other random genomic sequences) are not available for most vertebrates. Thus, the traditional route of building physical maps for a region of interest using species-specific markers would not be possible for many of the species shown in Fig. 1.

The goal of the first phase of the Uprobe
project is to combine the principles of traditional comparative mapping with
existing methods for screening BAC libraries to develop an efficient, practical
and reliable experimental strategy for assembling BAC maps from a diverse set
of vertebrates. Specifically, this proposal aims to remove the need for species-specific resources in BAC library screening through the design and testing of universal overgo probes that can be used on single or multiple BAC libraries. This will be accomplished by identifying evolutionarily conserved sequences between species, such as human and mouse, for which genomic sequence is available. The result would be an experimental and computational infrastructure for the one essential part of BAC library screening not yet standardized: probe design. The
strategy and methodologies proposed here would also dramatically reduce the
cost of building physical maps by increasing the potential mapping throughput
and decreasing the cost of marker reagents while maximizing the ability to
compare genomic maps of divergent species. Through the establishment of a
public database of universal probes, individual researchers would have a key
resource necessary to construct physical maps (independent of whole-genome
efforts) in their region and species of interest that would otherwise be
difficult to build. Small clone-based physical maps assembled by individual
researchers would also complement whole-genome efforts through the direct
integration of both types of maps via the specific clones of interest.
Therefore, it is anticipated that this scalable methodology would greatly
facilitate the use of future BAC libraries by individual laboratories and
genome centers alike, and thus be a widespread means of applying the power of comparative genomics to the specific research goals of both communities.

The ongoing goals of the Uprobe project are as follows:

Modern genomic tools and resources will be critical for ongoing and future nonhuman primate research, and whole-genome sequencing
efforts are underway for a limited number of nonhuman primates. Unfortunately,
because of the sequencing strategies employed and associated costs, nonhuman
primate genome assemblies will have hundreds of thousands of gaps and will not
provide a definitive reference sequence like the finished human genome. Moreover,
whole-genome sequences will not yield direct access to the experimental tools
necessary to exploit this extensive genetic information. Bacterial artificial
chromosome (BAC) libraries and clones are a proven and valuable genomic
resource for the experimental utilization and functional characterization of
genomic sequence and are currently available for eighteen species of nonhuman
primates. Input from representatives of the nonhuman primate and biomedical
research communities revealed strong support for a resource to facilitate
access to these nonhuman primate genomic libraries (see appendix). The goal of
this proposal is to develop a resource that will provide an effective and
reliable means for the primate and biomedical research communities to isolate any
specific gene or region of interest from one or all nonhuman primate BAC
libraries. To do so, a web-based tool will be developed for the custom design
of universal hybridization probes that can be used for the isolation of nonhuman
primate BAC clones. Universal hybridization probes are a proven technology for
the efficient targeted isolation of BAC clones from multiple species in
parallel. However, this methodology has not been optimized for use in nonhuman
primates. This proposal will establish those optimal parameters and provide
them as preset values for the custom design of nonhuman primate universal
probes by the public. This custom universal hybridization probe design website
will therefore facilitate access to the full spectrum of genetic diversity
captured within all current and future nonhuman primate genomic libraries
independent of, and as a complement to, whole-genome sequencing efforts. As a
result, this proposal will yield an important avenue by which individual researchers
can readily import nonhuman primate genomic clones into their own laboratories
and experimental paradigms. The aims of this resource proposal are:

Aim 1. Develop a public resource
for the custom design of universal hybridization probes for isolating nonhuman
primate genomic clones. A robust universal probe design pipeline used
previously in the creation of whole-genome probe sets will be adapted for the
custom and on demand design of universal probes for isolating nonhuman primate
genomic clones from specific genes and regions of interest. A web-based
interface to this custom probe design pipeline will allow the public to design
probes from a series of default settings tailored to the efficient isolation of
genomic clones from four specific clades of primates
(1. all primates, 2. simians, 3. new world monkeys and 4. old world monkeys and
apes), or to design probes from their own sequence data.

Aim 2. Experimentally validate
the nonhuman primate custom probe design resource. Small sets of universal
probes designed from each of the four ‘default’ nonhuman primate universal
probe design options will be selected for use in a small-scale targeted
comparative mapping and sequencing project for experimental validation of the
resource, and to provide a real-world example of how to use the resource and the data it can produce.

What are universal probes?

The
concept behind the use of universal probes is very simple. If a sequence is
conserved between two divergent species, then it is likely to be conserved in
other species as well. For example, if a sequence is conserved between human
and chicken, then it is likely to be conserved among all mammals and all
birds. Thus, a single sequence can act as an effective probe for screening
genomic BAC libraries from many birds and mammals and alleviate the need for
generating species-specific probes for every genomic library to be screened. In
addition, since a single probe is used to screen multiple species, screening of
BAC libraries can be done in parallel with identical hybridization and washing
conditions. In preliminary data for this project, universal probes for screening placental BAC libraries were designed based on sequence similarity between human and mouse. These probes were tested and found to be effective at isolating BAC clones from a set of placental mammals (cat, dog,
cow, pig, rat, baboon and chimpanzee) (Thomas et al. Parallel Construction of Orthologous Sequence-Ready Clone Contig Maps in Multiple Species. Genome Res 2002, 12:1277-1285).

The specifics of the process are illustrated in the figure below. The probes themselves are called 'overgo' probes. Each consists of two 22-mers that overlap by 8 bp at their complementary 3' ends, and the pair is radioactively labeled by a Klenow fill-in reaction with dATP and dCTP. Overgo probes were developed by John McPherson and have
been used extensively by large and small labs alike to screen genomic
libraries. Their specificity and uniform design parameters allow groups of probes to be hybridized together. Because primers are inexpensive and the radioactivity used to label a given probe is minimal, we strongly recommend (when feasible) screening a library with multiple probes from a given region rather than a single probe. We aim to design probes that have at least a 50% chance of success in a given species; using multiple probes therefore maximizes the likelihood of identifying clones of interest in a single hybridization. When large regions are targeted for isolation, spacing probes every ~30 kb has proven very successful.
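The overgo layout described above can be sketched in a few lines. This is a minimal illustration, not the Uprobe or OM40 code: it assumes the standard overgo arrangement in which one 22-mer is bases 1-22 of the 36-bp probe and the other is the reverse complement of bases 15-36, so the two oligos pair through an 8-bp overlap at their 3' ends before the Klenow fill-in. The function names are hypothetical.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
OLIGO_LEN = 22
OVERLAP = 8
PROBE_LEN = 2 * OLIGO_LEN - OVERLAP  # 22 + 22 - 8 = 36 bp


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def overgo_pair(probe: str) -> Tuple[str, str]:
    """Split a 36-bp probe sequence into its two overgo 22-mers (hypothetical helper)."""
    if len(probe) != PROBE_LEN:
        raise ValueError(f"expected {PROBE_LEN} bp, got {len(probe)}")
    probe = probe.upper()
    forward = probe[:OLIGO_LEN]                        # bases 1-22, sense strand
    reverse = reverse_complement(probe[-OLIGO_LEN:])   # bases 15-36, antisense strand
    return forward, reverse


if __name__ == "__main__":
    fwd, rev = overgo_pair("ACGTTGCAAGCTTGACCTGAGGATCCAGTCGACTGA")  # arbitrary 36-mer
    # The last 8 bases of each oligo are reverse complements of each other.
    print(fwd, rev, reverse_complement(fwd[-OVERLAP:]) == rev[-OVERLAP:])
```

Because both oligos are fixed at 22 bp and every probe is 36 bp, probes designed this way share uniform hybridization properties, which is what makes pooling them practical.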
This basic process is being used, with whole-genome alignments, to design whole-genome probe sets for clusters of species such as placental mammals, birds, reptiles and subgroups of fish.

Figure 2. Strategy for designing
universal overgo hybridization probes based on human-mouse
sequence alignments. Orthologous human and mouse genomic sequences are masked
for repetitive elements (indicated by X's) and then aligned. Regions with high
sequence conservation (indicated by vertical lines) are identified and used for
designing probes. When possible, a single 36-bp human sequence from each
alignment is chosen based on GC content and percent human-mouse sequence
identity. A subset of these sequences is then chosen to optimize for
inter-probe spacing (~30-40 kb). Three such conserved sequences are depicted in
the figure, with greater details provided (in the box) for the middle one. At
this stage, each selected 36-bp sequence is compared to all available human
genomic sequence to confirm that it is single copy. Overlapping pairs of oligonucleotide primers are then synthesized for each
sequence and used to generate double-stranded, radiolabeled
(indicated by *'s) probes. The probes across a target region(s) are then pooled and used to screen arrayed BAC libraries, allowing the isolation of individual positive BACs.

How the AUG_2003_b1 whole-genome probe set for screening mammalian libraries was generated.

Whole-genome human-mouse alignments (axtTight)
(Schwartz et al. Human-mouse alignments with BLASTZ. Genome Res 2003, 13:103-107; Waterston et al. Initial sequencing and comparative analysis of the mouse genome. Nature 2002, 420:520-562) between the April 2003 assembly (UCSC version hg15) of the human genome and the Feb. 2003 build (UCSC version mm3) of the mouse genome were downloaded from http://genome.ucsc.edu. This alignment file was then modified for use with a modified version of the probe design algorithm soop. Common repetitive sequences were
masked in the file and then, when possible, 1 candidate probe with >88%
human-mouse sequence identity from each ungapped
alignment was designed based on the human sequence. These candidate probes were
then compared to the April 2003 human genome build with megablast (megablast -t 16 -N 2 -W 11 -e 0.6 -F F -D 3). The megablast output was used to confirm the location of each probe in the human genome and to tag each probe as 'unique' or 'non-unique'. Unique probes had a single identical hit to the human genome assembly, no other hits with a bit score above 40, and fewer than 5 hits with a bit score above 36. These are very
stringent criteria for calling a probe unique, and we feel that unique
probes, to the best of our knowledge, represent single-copy sequences in the
human genome and should be well suited for screening BAC libraries from other
mammals. Non-unique probes also had one identical hit to the human
assembly at the expected location, but had at least one other hit above 40 bits
or 5 or more hits above 36 bits. While the non-unique probes are not single-copy
in the human genome based on our criteria, we have kept them in our database
for potential use in regions of the human genome that are duplicated or for
isolating genes within a gene family. We do not recommend the use of non-unique
probes unless there is no alternative unique probe available.

To increase the number of unique probes, only the non-unique probe sequences were masked in the human-mouse alignment file, and new candidate probes were then designed from just those alignments that had yielded a non-unique probe. These candidates were again compared to the human genome by megablast and designated unique or non-unique. This recursive process was repeated twice, yielding 139,272 unique probes and 97,721 non-unique probes.

Figure 3.
Results of the experimental validation of a sample set of AUG_2003_b1 universal
probes.

To test the efficiency of the mammalian whole-genome probe set AUG_2003_b1, 48 probes were selected from 7 regions of the human genome for screening the marmoset (CHORI-259), galago (CHORI-256), rabbit (LBNL-1), bat (VMRC-7), shrew (SA_Ba), armadillo (VMRC-5), wallaby (ME_KBa), and platypus (OA_Bb) BAC libraries. After primary and secondary screens, probe-content information was merged with restriction-enzyme fingerprint content maps. Based on this information, the success rate (the fraction of probes tested that were positive for at least one BAC clone) in each species was calculated, and is shown above. Compared with the whole-genome set of unique probes, the selected probes were slightly enriched for higher human-mouse percent identity (21% versus 16% at 100% identity, 19% versus 18% at 97%, 19% versus 21% at 94%, 23% versus 22% at 91%, and 19% versus 23% at 88%). However, because selecting probes for optimal physical spacing strongly biases the selection toward higher percent identity, we believe this sample set gives an accurate measure of the effectiveness of the unique whole-genome probe set as it is used in practice. Representative clones have been sent to the NIH Intramural Sequencing Center for sequencing to confirm probe specificity.
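Per-species success rates like these, or the 50% per-probe design target mentioned earlier, support a quick estimate of why several probes per region are recommended. The sketch below computes the chance that at least one of k probes succeeds; it assumes each probe succeeds independently with the same probability p, which is a simplification (probes from the same region and species may succeed or fail together). The helper names are hypothetical.

```python
def p_any_success(p: float, k: int) -> float:
    """Chance that at least one of k probes succeeds, assuming each probe
    succeeds independently with probability p."""
    if not 0.0 <= p <= 1.0 or k < 0:
        raise ValueError("need 0 <= p <= 1 and k >= 0")
    return 1.0 - (1.0 - p) ** k


def probes_needed(p: float, target: float) -> int:
    """Smallest k for which p_any_success(p, k) >= target."""
    if not 0.0 < p <= 1.0 or not 0.0 <= target < 1.0:
        raise ValueError("need 0 < p <= 1 and 0 <= target < 1")
    k = 0
    while p_any_success(p, k) < target:
        k += 1
    return k


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, p_any_success(0.5, k))
```

With p = 0.5, three probes give 1 - 0.5^3 = 0.875, and five probes are needed to exceed 95% (0.96875).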


Table 2. Summary of Universal Probes in AUG_2003_b1 (Mammalian)

| Human Chromosome | Length without gaps (bp) | Unique Probes | Non-Unique Probes |
|---|---|---|---|
| Chr1 | 218712898 | 12,156 | 9,401 |
| Chr2 | 237043677 | 13,752 | 8,883 |
| Chr3 | 193607233 | 9,624 | 6,298 |
| Chr4 | 186580523 | 6,748 | 4,411 |
| Chr5 | 177524972 | 9,125 | 5,989 |
| Chr6 | 166880541 | 6,465 | 5,533 |
| Chr7 | 154546299 | 6,675 | 4,770 |
| Chr8 | 141694337 | 5,573 | 3,724 |
| Chr9 | 115187719 | 6,151 | 4,501 |
| Chr10 | 130710874 | 6,501 | 4,143 |
| Chr11 | 130709420 | 7,735 | 5,451 |
| Chr12 | 129328334 | 5,491 | 3,994 |
| Chr13 | 95511656 | 3,512 | 2,462 |
| Chr14 | 87191216 | 5,017 | 3,338 |
| Chr15 | 81117055 | 4,943 | 3,929 |
| Chr16 | 79890795 | 4,568 | 3,013 |
| Chr17 | 77480855 | 5,655 | 4,433 |
| Chr18 | 74534531 | 3,474 | 1,967 |
| Chr19 | 55780860 | 1,970 | 1,596 |
| Chr20 | 59424990 | 3,150 | 2,008 |
| Chr21 | 33924747 | 813 | 713 |
| Chr22 | 34352072 | 1,100 | 1,071 |
| ChrX | 147686666 | 8,148 | 6,529 |
| ChrY | 22761097 | 10 | 480 |
| Total | 2832199938 | 138,356 | 98,637 |
OCT_2003_b2 Mammalian Whole-Genome Probe Set.
A sample set of orthologous genomic sequences from human, mouse, rat, dog, cat, cow and pig was used to empirically optimize the universal probe design process using human-mouse-rat alignments. A total of 2,863 36-bp probe sequences with 7 or fewer mismatches between human and mouse, and for which rat, dog, cat, cow and pig sequences were also available, were used as the basis of this process. Each aligned human-mouse-rat position was classified into one of five substitution patterns, for example:

human    AAAAA
mouse    AATTT
rat      ATATC
pattern  12345

(1, all three identical; 2, only rat differs; 3, only mouse differs; 4, human differs from identical mouse and rat; 5, all three differ). For each pattern, a 'weight' was assigned based on the calculated percent identity between the human nucleotide and the corresponding dog, cat, cow and pig nucleotides. The calculated values were:
weight(pattern X) = (sum of identical bases in dog, cat, cow and pig) / (total number of bases with pattern X × 4)
Pattern 1 = (67135 + 67461 + 66253 + 66623) / (72680 × 4) = 0.9200
Pattern 2 = (2472 + 2498 + 2399 + 2441) / (3003 × 4) = 0.8167
Pattern 3 = (1258 + 1263 + 1225 + 1215) / (1526 × 4) = 0.8127
Pattern 4 = (3476 + 3521 + 3399 + 3467) / (5885 × 4) = 0.5889
Pattern 5 = (318 + 330 + 304 + 310) / (501 × 4) = 0.6297
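The weight calculation above is easy to check directly. The sketch below recomputes each pattern weight from the counts given in the text and averages patterns 2 and 3 into the single shared value used for scoring; the dictionary and function names are illustrative only.

```python
# Counts from the text: for each human-mouse-rat pattern, the number of human
# bases identical in dog, cat, cow and pig, and the number of positions with that pattern.
IDENTICAL = {
    1: (67135, 67461, 66253, 66623),
    2: (2472, 2498, 2399, 2441),
    3: (1258, 1263, 1225, 1215),
    4: (3476, 3521, 3399, 3467),
    5: (318, 330, 304, 310),
}
POSITIONS = {1: 72680, 2: 3003, 3: 1526, 4: 5885, 5: 501}
SPECIES = 4  # dog, cat, cow, pig


def pattern_weight(pattern: int) -> float:
    """Fraction of dog/cat/cow/pig bases identical to human for one pattern."""
    return sum(IDENTICAL[pattern]) / (POSITIONS[pattern] * SPECIES)


weights = {p: round(pattern_weight(p), 4) for p in IDENTICAL}
# Patterns 2 and 3 differ by only 0.004, so a single averaged value is used for both.
shared_2_3 = round((weights[2] + weights[3]) / 2, 4)

if __name__ == "__main__":
    print(weights, shared_2_3)
```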
A score was calculated for each probe by counting the patterns at each of its positions and summing the corresponding weights. Because rat and mouse are approximately equidistant from human, and only 0.004 separated the values for patterns 2 and 3, a single value, 0.8147, was used for both of those patterns. The correlation between these probe scores and the number of mismatches per probe in dog, cat, cow and pig was then calculated and compared to the correlation obtained with a probe score based solely on the number of mismatches between the human (probe) sequence and the mouse sequence. The correlation coefficient was r = 0.5425073 for the mouse mismatch score alone and r = 0.564635 for the new matrix, indicating that adding the rat sequence and using this matrix provides a better basis for designing universal mammalian probes. While the increase in correlation is not large, this basic scoring-matrix strategy can be used with larger numbers and/or more informative combinations of species (such as human-mouse-dog).
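A minimal sketch of this scoring scheme, assuming gap-free, equal-length human/mouse/rat strings and patterns numbered as in the AAAAA/AATTT/ATATC example above. The weights and the 31.83 cutoff come from the text; the function names are hypothetical, and this is not the soop/sooper code. With these weights a fully conserved 36-mer scores 36 × 0.92 = 33.12, the highest value in Table 4.

```python
from collections import Counter

# Pattern weights from the text; patterns 2 and 3 share the averaged value.
WEIGHTS = {1: 0.9200, 2: 0.8147, 3: 0.8147, 4: 0.5889, 5: 0.6297}
CUTOFF = 31.83  # minimum OCT_2003_b2 probe score


def pattern(h: str, m: str, r: str) -> int:
    """Classify one aligned human/mouse/rat column into patterns 1-5."""
    if h == m == r:
        return 1  # all three identical
    if h == m:
        return 2  # only rat differs
    if h == r:
        return 3  # only mouse differs
    if m == r:
        return 4  # human differs, mouse and rat agree
    return 5      # all three differ


def probe_score(human: str, mouse: str, rat: str) -> float:
    """Sum of pattern weights over every column of a gap-free probe alignment."""
    if not len(human) == len(mouse) == len(rat):
        raise ValueError("sequences must be gap-free and of equal length")
    columns = zip(human.upper(), mouse.upper(), rat.upper())
    counts = Counter(pattern(h, m, r) for h, m, r in columns)
    return sum(WEIGHTS[p] * n for p, n in counts.items())


if __name__ == "__main__":
    human = "ACGT" * 9                  # arbitrary 36-mer
    mouse_rat = "T" + human[1:]         # one human-specific change (pattern 4)
    score = probe_score(human, mouse_rat, mouse_rat)
    print(round(score, 2), score >= CUTOFF)
```

For example, a single human-specific change gives 35 × 0.92 + 0.5889 ≈ 32.79, while five such changes lower the score to about 31.46, below the 31.83 cutoff.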
The second major change to the probe design process was the selection of all probes that fell within the 44-56% GC range and met the minimum scoring requirement. In the previous build, only the 'best' probe was selected for each gap-free alignment between human and mouse. To provide the maximum number of probe options, we eliminated the 'best' criterion and now include all sequences that meet the probe criteria.
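The change from keeping only the 'best' window per gap-free alignment to keeping every qualifying window can be sketched as follows. For brevity this assumes a two-species (human-mouse) block and filters on human-mouse identity (at least 88%, the AUG_2003_b1 criterion) rather than on the three-species score; the 44-56% GC range is taken from the text. Function names are hypothetical.

```python
from typing import List, Optional, Tuple

PROBE_LEN = 36
GC_MIN, GC_MAX = 0.44, 0.56  # 44-56% GC, from the text


def gc_fraction(seq: str) -> float:
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def identity(a: str, b: str) -> float:
    return sum(x == y for x, y in zip(a.upper(), b.upper())) / len(a)


def qualifying_windows(human: str, mouse: str,
                       min_identity: float = 0.88) -> List[Tuple[int, float]]:
    """Every 36-bp window of a gap-free human-mouse block that passes the GC
    and identity filters, returned as (offset, identity) pairs."""
    if len(human) != len(mouse):
        raise ValueError("a gap-free alignment block must have equal-length rows")
    windows = []
    for start in range(len(human) - PROBE_LEN + 1):
        h = human[start:start + PROBE_LEN]
        m = mouse[start:start + PROBE_LEN]
        ident = identity(h, m)
        if GC_MIN <= gc_fraction(h) <= GC_MAX and ident >= min_identity:
            windows.append((start, ident))
    return windows


def best_window(windows: List[Tuple[int, float]]) -> Optional[Tuple[int, float]]:
    """Old behaviour: keep only the highest-identity window (first one on ties)."""
    return max(windows, key=lambda w: w[1], default=None)


if __name__ == "__main__":
    human = "ACGT" * 12                      # 48-bp toy block, 50% GC everywhere
    mouse = human[:40] + "T" + human[41:]    # one mismatch at offset 40
    hits = qualifying_windows(human, mouse)
    print(len(hits), best_window(hits))
```

In the toy block, all 13 windows qualify, whereas the old rule would have kept only the single window at offset 0.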
This new algorithm was applied to the Multiz human-mouse-rat whole-genome alignment (generated by W. Miller and J. Kent; Blanchette et al. 2004. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res 14:708-715; RGSPC. 2004. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428:493-521) of human (UCSC hg15), mouse (UCSC mm3) and rat (UCSC rn2), downloaded from http://www.genome.ucsc.edu. Based on analysis of our test data set, a probe score cutoff of 31.83 was determined to be more stringent than the criterion of 4 or fewer mismatches between human and mouse used in AUG_2003_b1 (i.e., it would include more than 93% of all probes with 0, 1, 2 or 3 mismatches between human and mouse and exclude 60% of all probes with 4 mismatches). In addition, we edited the probe set to remove a small fraction of candidate sequences (<5%) with properties that might compromise their general utility (e.g., a sequence like GGCCGGGGGCCGCCCGGATATTATTTATAATATAT).
Specifically, probes with a gc_score (see the soop.pl algorithm and OM40 (McPherson)) above 55.36 were removed.
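The gc_score used by soop.pl/OM40 is not defined on this page, so the sketch below does not reproduce it. Instead it uses a hypothetical measure (the spread of GC fraction across 12-bp sub-windows) only to show why overall GC content alone does not catch the example sequence above: its overall GC is about 49%, inside the 44-56% range, yet all of its G and C bases sit in the first half.

```python
WINDOW = 12  # hypothetical sub-window size


def gc_fraction(seq: str) -> float:
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def gc_spread(seq: str, window: int = WINDOW) -> float:
    """Hypothetical skew measure (NOT the soop.pl/OM40 gc_score): difference
    between the most and least GC-rich sub-windows of the sequence."""
    if len(seq) < window:
        raise ValueError("sequence shorter than the sub-window")
    fractions = [gc_fraction(seq[i:i + window]) for i in range(len(seq) - window + 1)]
    return max(fractions) - min(fractions)


flagged_example = "GGCCGGGGGCCGCCCGGATATTATTTATAATATAT"  # example from the text
balanced = "ACGT" * 9                                  # arbitrary well-mixed 36-mer

if __name__ == "__main__":
    for name, seq in [("example", flagged_example), ("balanced", balanced)]:
        print(name, round(gc_fraction(seq), 3), gc_spread(seq))
```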

Figure 4. Results of the experimental validation of a sample set of OCT_2003_b2 universal probes.
To test the efficiency of the mammalian whole-genome probe set OCT_2003_b2, 48 probes were selected from 11 regions of the human genome for screening the marmoset (CHORI-259), galago (CHORI-256), rabbit (LBNL-1), bat (VMRC-7), shrew (SA_Ba), armadillo (VMRC-5), elephant (VMRC-15), wallaby (ME_KBa), and platypus (OA_Bb) BAC libraries. After primary and secondary screens, probe-content information was merged with restriction-enzyme fingerprint content maps. Based on this information, the success rate (the fraction of probes tested that were positive for at least one BAC clone) in each species was calculated, and is shown above. The test set of probes was selected to be an accurate sampling of the entire probe set (in terms of probe score). Representative clones have been sent to the NIH Intramural Sequencing Center for sequencing to confirm probe specificity.
A numerical summary of this probe build is listed below.
Table 3. Summary of Universal Probes in OCT_2003_b2 (Mammalian)

| Human Chromosome | Length without gaps (bp) | Unique Probes | Non-Unique Probes |
|---|---|---|---|
| Chr1 | 218712898 | 457,441 | 268,561 |
| Chr2 | 237043677 | 420,924 | 222,490 |
| Chr3 | 193607233 | 318,413 | 162,270 |
| Chr4 | 186580523 | 198,668 | 103,338 |
| Chr5 | 177524972 | 270,796 | 149,901 |
| Chr6 | 166880541 | 213,399 | 139,525 |
| Chr7 | 154546299 | 212,745 | 122,716 |
| Chr8 | 141694337 | 179,965 | 91,364 |
| Chr9 | 115187719 | 221,804 | 124,513 |
| Chr10 | 130710874 | 208,993 | 105,613 |
| Chr11 | 130709420 | 286,737 | 159,738 |
| Chr12 | 129328334 | 211,643 | 124,001 |
| Chr13 | 95511656 | 103,488 | 58,366 |
| Chr14 | 87191216 | 178,789 | 100,094 |
| Chr15 | 81117055 | 189,412 | 108,324 |
| Chr16 | 79890795 | 159,073 | 91,888 |
| Chr17 | 77480855 | 237,188 | 144,407 |
| Chr18 | 74534531 | 99,938 | 48,999 |
| Chr19 | 55780860 | 84,060 | 53,058 |
| Chr20 | 59424990 | 112,647 | 58,712 |
| Chr21 | 33924747 | 34,475 | 20,460 |
| Chr22 | 34352072 | 61,558 | 39,617 |
| ChrX | 147686666 | 224,952 | 149,438 |
| ChrY | 22761097 | 112 | 8,855 |
| Total | 2832199938 | 4,687,220 | 2,656,248 |
Table 4. Summary of probe scores for OCT_2003_b2.

| Score | Unique Probes | Non-Unique Probes |
|---|---|---|
| 31.83 | 18498 | 12059 |
| 31.84 | 55823 | 31366 |
| 31.85 | 47247 | 27331 |
| 31.86 | 243 | 144 |
| 31.87 | 3523 | 2334 |
| 31.88 | 4897 | 2868 |
| 31.89 | 4049 | 2336 |
| 31.91 | 211 | 159 |
| 31.92 | 344500 | 196685 |
| 31.93 | 41675 | 25794 |
| 31.95 | 1548 | 1296 |
| 31.96 | 68556 | 39478 |
| 31.97 | 7857 | 4694 |
| 31.99 | 227 | 214 |
| 32.00 | 5452 | 3309 |
| 32.01 | 330 | 223 |
| 32.02 | 436743 | 245136 |
| 32.04 | 88869 | 53423 |
| 32.05 | 4183 | 3044 |
| 32.06 | 73638 | 42661 |
| 32.07 | 301 | 165 |
| 32.08 | 15043 | 8745 |
| 32.09 | 423 | 326 |
| 32.10 | 4859 | 2865 |
| 32.12 | 621 | 383 |
| 32.13 | 351825 | 194616 |
| 32.14 | 177216 | 104856 |
| 32.16 | 10215 | 6712 |
| 32.17 | 45930 | 26272 |
| 32.18 | 25930 | 15562 |
| 32.20 | 1009 | 622 |
| 32.21 | 2586 | 1616 |
| 32.22 | 1071 | 670 |
| 32.25 | 300994 | 174414 |
| 32.26 | 23845 | 15111 |
| 32.28 | 761 | 490 |
| 32.29 | 38227 | 22927 |
| 32.30 | 2206 | 1559 |
| 32.33 | 1384 | 943 |
| 32.35 | 402291 | 230841 |
| 32.37 | 54886 | 32709 |
| 32.38 | 1628 | 1082 |
| 32.39 | 42745 | 26130 |
| 32.41 | 4615 | 2955 |
| 32.43 | 1412 | 909 |
| 32.46 | 365954 | 200052 |
| 32.47 | 111929 | 67463 |
| 32.49 | 3715 | 2472 |
| 32.50 | 28042 | 16659 |
| 32.51 | 7772 | 5055 |
| 32.54 | 754 | 538 |
| 32.58 | 198871 | 119827 |
| 32.59 | 8568 | 5685 |
| 32.62 | 12207 | 7936 |
| 32.68 | 295510 | 169997 |
| 32.70 | 19308 | 12452 |
| 32.72 | 14440 | 8404 |
| 32.79 | 338178 | 176621 |
| 32.80 | 41974 | 25355 |
| 32.83 | 10312 | 5912 |
| 32.91 | 82401 | 48641 |
| 33.01 | 150224 | 81423 |
| 33.12 | 276969 | 133722 |
| Total | 4687220 | 2656248 |
A general translation of the scores of this build versus build 1 (AUG_2003_b1), as determined by our test data set, is:

| Build 1 human-mouse identity | Build 2 probe score |
|---|---|
| 100% | 33.05 ± 0.14 |
| 97% | 32.71 ± 0.24 |
| 94% | 32.42 ± 0.21 |
| 91% | 32.12 ± 0.22 |
| 88% | 31.82 ± 0.23 |
Updates and Changes: 12/2003-03/2004.
At the end of 2003 and the beginning of 2004 a substantial number of changes were made to uprobe.
1. Modification of the algorithm used for optimal spacing of the probes.
In the first edition of the probe spacing algorithm used on uprobe (based on the original soop program), if no probe could be found within 0.5-1.5 times the optimal spacing distance (osd), the first probe beyond this interval was selected as the next probe. In that case, the selected probe was not necessarily the best available one. We have therefore modified the spacing algorithm on uprobe to select the best probe within a range of 0.15 × osd beyond 1.5 × osd. This change improves the selection of the best probes in all cases without significantly sacrificing probe spacing.
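The revised spacing rule can be read as a greedy walk along the chromosome. This sketch is one interpretation, not the uprobe code: it assumes 'best' means highest probe score, that the walk starts at the leftmost probe, and that the fallback range begins at the first probe lying beyond 1.5 × osd and extends 0.15 × osd from there. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Probe:
    name: str
    position: int  # probe start on the chromosome (bp)
    score: float   # higher is better


def best_in_window(probes: Sequence[Probe], lo: float, hi: float) -> Optional[Probe]:
    """Highest-scoring probe with lo <= position <= hi, or None if there is none."""
    in_window = [p for p in probes if lo <= p.position <= hi]
    return max(in_window, key=lambda p: p.score, default=None)


def select_spaced(probes: Sequence[Probe], osd: float) -> List[Probe]:
    """Greedy spacing walk (an interpretation of the revised rule)."""
    if osd <= 0:
        raise ValueError("osd must be positive")
    ordered = sorted(probes, key=lambda p: p.position)
    if not ordered:
        return []
    selected = [ordered[0]]  # assumption: start from the leftmost probe
    while True:
        here = selected[-1].position
        # Normal case: best probe within 0.5-1.5 x osd of the last selected probe.
        nxt = best_in_window(ordered, here + 0.5 * osd, here + 1.5 * osd)
        if nxt is None:
            # Revised fallback: rather than the first probe beyond 1.5 x osd,
            # take the best probe within 0.15 x osd of that first probe.
            beyond = [p for p in ordered if p.position > here + 1.5 * osd]
            if not beyond:
                break
            start = beyond[0].position
            nxt = best_in_window(beyond, start, start + 0.15 * osd)
        selected.append(nxt)
    return selected


if __name__ == "__main__":
    demo = [Probe("a", 0, 33.00), Probe("b", 20_000, 32.00), Probe("c", 30_000, 33.12),
            Probe("d", 40_000, 31.90), Probe("e", 100_000, 32.50),
            Probe("f", 103_000, 33.00), Probe("g", 110_000, 33.12)]
    print([p.name for p in select_spaced(demo, 30_000)])  # ['a', 'c', 'f']
```

In the demo, the earlier rule would have bridged the gap after probe c with probe e (the first probe past 1.5 × osd); the revised rule prefers the higher-scoring probe f, 3 kb further along.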
2. Modification of original soop algorithm and sooper.xml.
In the original soop algorithm, a bug prevented probes from being designed from the last ungapped alignment within a gapped alignment. As a result, fewer probes were designed than was possible, which was particularly problematic when a gapped alignment contained only one ungapped alignment (i.e., the first ungapped alignment was also the last, so no probes were designed). This bug was corrected in both the original soop.pl and sooper.xml. Corrected versions are available for download on uprobe.
3. AUG_2003_b1 and OCT_2003_b2 probe corrections.
A bug was detected in the algorithm used to determine whether a probe is unique. Specifically, when megablast identified 2 or more identical matches on the same chromosome from which the probe was derived, the extra matches were not included in the determination of uniqueness. All probes in both builds were re-evaluated to account for this discrepancy and have been corrected in both the query database and the bulk download files. Approximately 0.5% of probes were affected. In addition, a few probes (n=124) had incorrect locations in OCT_2003_b2; these have been corrected. The whole-database download files for AUG_2003_b1 and OCT_2003_b2 also had incorrect distance-to-next-probe values; this has been corrected.
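The uniqueness criteria described earlier, together with this correction, can be sketched as a classifier over megablast hits. Several details are assumptions not stated in the text: each hit is taken to carry a chromosome, start, bit score and an 'identical' flag; the bit-score counts exclude the expected self-hit; and a probe without an identical hit at its expected location gets a hypothetical 'unplaced' label. Parsing megablast output is left out.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Hit:
    chrom: str
    start: int
    bit_score: float
    identical: bool  # full-length, 100%-identity match to the probe


def classify_probe(hits: Iterable[Hit], chrom: str, start: int) -> str:
    """Return 'unique', 'non-unique' or the hypothetical label 'unplaced'."""
    hits = list(hits)

    def is_expected(h: Hit) -> bool:
        return h.identical and h.chrom == chrom and h.start == start

    if not any(is_expected(h) for h in hits):
        return "unplaced"
    others = [h for h in hits if not is_expected(h)]
    # Corrected rule: any extra identical match disqualifies the probe,
    # including matches on the probe's own chromosome.
    if any(h.identical for h in others):
        return "non-unique"
    over_40 = sum(h.bit_score > 40 for h in others)
    over_36 = sum(h.bit_score > 36 for h in others)
    return "unique" if over_40 == 0 and over_36 < 5 else "non-unique"
```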
4. Modification of query interface.
Changes to search interface include:
Default is now FEB_2004_mammals_1 with spacing of 30 kb.
Exact parameters from last query are displayed after a search.
A reset button has been added that returns the query options to the default.
"Search for" on the query page no longer recognizes comma-delimited queries.
Sets of more than 20,000 probes cannot be viewed efficiently on the UCSC browser at this time.
Sets of more than 65,534 probes cannot be downloaded at this time. For very large downloads, we suggest using the files on the download page, which include all overgos for a given probe set.
When a query returns >1000 probes, only the first 1000 are displayed on the page. All probes are included in the download files (except if the download limit is exceeded).
New probe sets:
FEB_2004_mammals_1 is a merger of probes based on the selection criteria used in AUG_2003_b1 and OCT_2003_b2. The best OCT_2003_b2 probes in every 1 kb interval (220,283 unique and 172,946 non-unique) were merged with probes designed from human-mouse alignments (as in AUG_2003_b1: 88% identity cutoff, debugged version of sooper.xml; 258,392 unique and 171,411 non-unique) and filtered to produce a single non-redundant probe set. Merging probes designed by the two approaches reduces the problems inherent to each. For example, while the OCT_2003_b2 approach uses human-mouse-rat alignments to select probes, the current algorithm ignores alignments that do not include all three species; thus, OCT_2003_b2 contains no probes from regions with sequencing gaps in either the mouse or the rat assembly.
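The 'best probe in every 1 kb interval' step and the merge can be sketched as below. Two assumptions are made: intervals are fixed 1 kb bins along each chromosome (start // 1000), and 'non-redundant' means one entry per identical chromosome, start and sequence. The tuple layout and function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical probe record: (chromosome, start, sequence, score)
Probe = Tuple[str, int, str, float]


def best_per_interval(probes: List[Probe], interval: int = 1000) -> List[Probe]:
    """Keep the highest-scoring probe in each fixed-size bin of each chromosome."""
    best: Dict[Tuple[str, int], Probe] = {}
    for probe in probes:
        key = (probe[0], probe[1] // interval)
        if key not in best or probe[3] > best[key][3]:
            best[key] = probe
    return sorted(best.values(), key=lambda p: (p[0], p[1]))


def merge_nonredundant(first: List[Probe], second: List[Probe]) -> List[Probe]:
    """Union of two probe lists with one entry per (chromosome, start, sequence);
    when both lists contain the same probe, the entry from `first` is kept."""
    seen = set()
    merged: List[Probe] = []
    for probe in first + second:
        key = probe[:3]
        if key not in seen:
            seen.add(key)
            merged.append(probe)
    return sorted(merged, key=lambda p: (p[0], p[1]))
```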
FEB_2004_mammals_1 probes were assigned two scores, one from each approach. Where only a human-mouse alignment (and not a human-mouse-rat alignment) was available, we estimated the human-mouse-rat score as the average score of all other probes with the same percent identity between human and mouse. OCT_2003_b2 scores are used for probe selection when the spacing option is invoked.
Since the criteria used to design the FEB_2004_mammals_1 probes are no different from those of the previous builds, probe success rates can be estimated from the prior experimental validation of AUG_2003_b1 and OCT_2003_b2.
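The score estimate for probes lacking a rat alignment amounts to a group mean keyed on human-mouse percent identity. In this sketch, identity is stored as an integer percentage and a missing three-species score is represented by None; both choices are assumptions, and a probe whose identity value has no scored peers is left as None. The function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional


def impute_scores(identities: List[int],
                  scores: List[Optional[float]]) -> List[Optional[float]]:
    """Fill missing human-mouse-rat scores with the mean score of probes that
    share the same human-mouse percent identity."""
    if len(identities) != len(scores):
        raise ValueError("identities and scores must be parallel lists")
    by_identity: Dict[int, List[float]] = defaultdict(list)
    for ident, score in zip(identities, scores):
        if score is not None:
            by_identity[ident].append(score)
    averages = {ident: mean(vals) for ident, vals in by_identity.items()}
    return [s if s is not None else averages.get(ident)
            for ident, s in zip(identities, scores)]
```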
Table 5. Probe Summary by Chromosome for FEB_2004_mammals_1.
| Human Chromosome | Length w/o gaps (bp) | Unique Probes | Non-Unique Probes |
| Chr1 | 218712898 | 33459 | 31280 |
| Chr2 | 237043677 | 32217 | 27228 |
| Chr3 | 193607233 | 25103 | 20638 |
| Chr4 | 186580523 | 17177 | 14189 |
| Chr5 | 177524972 | 21974 | 18616 |
| Chr6 | 166880541 | 18090 | 16487 |
| Chr7 | 154546299 | 17193 | 16174 |
| Chr8 | 141694337 | 14406 | 12071 |
| Chr9 | 115187719 | 15744 | 14049 |
| Chr10 | 130710874 | 16897 | 14114 |
| Chr11 | 130709420 | 20653 | 17942 |
| Chr12 | 129328334 | 16723 | 14721 |
| Chr13 | 95511656 | 8714 | 7403 |
| Chr14 | 87191216 | 12958 | 11013 |
| Chr15 | 81117055 | 13027 | 12683 |
| Chr16 | 79890795 | 12633 | 11354 |
| Chr17 | 77480855 | 16470 | 15428 |
| Chr18 | 74534531 | 8377 | 6443 |
| Chr19 | 55780860 | 7222 | 6741 |
| Chr20 | 59424990 | 8738 | 7240 |
| Chr21 | 33924747 | 2833 | 2496 |
| Chr22 | 34352072 | 4802 | 4361 |
| ChrX | 147686666 | 16545 | 16021 |
| ChrY | 22761097 | 31 | 1106 |
| Total | 2832199938 | 361986 | 319798 |
MAR_2004_birds/reptiles_1 is based on chicken-human alignments of UCSC chicken genome assembly galGal2 and UCSC human genome assembly hg16, downloaded from http://genome.ucsc.edu. Preliminary data suggested that probes based on chicken sequence with >88% chicken-human identity will be useful for screening bird and reptile libraries. We used the latest version of sooper.xml to generate the probes with a cutoff of 88% identity. Probes were designed from the chicken sequence and then classified as unique or non-unique by megablast comparison to galGal2 using the criteria described above. Statistics from the build are below, as is a summary of our experimental validation.
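An identity cutoff is easiest to interpret per probe window. Assuming gap-free 36-bp probes (the probe length stated for the marsupial set elsewhere on this page), 88% identity means at least 32 matching bases (32/36 ≈ 88.9%, while 31/36 ≈ 86.1%), i.e. four or fewer mismatches. The sketch below scans a pairwise alignment for such windows. It is a hypothetical illustration, not sooper.xml; the function name, the window length and the gap handling are our assumptions.

```python
def candidate_windows(ref, other, k=36, min_identity=0.88):
    """Yield (start, identity) for each gap-free k-bp window of a pairwise
    alignment in which `ref` and `other` are at least `min_identity` identical.

    ref, other: aligned sequences of equal length, with '-' marking gaps.
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    for start in range(len(ref) - k + 1):
        a = ref[start:start + k].upper()
        b = other[start:start + k].upper()
        if "-" in a or "-" in b:
            continue  # skip windows that span an alignment gap
        matches = sum(x == y for x, y in zip(a, b))
        identity = matches / k
        if identity >= min_identity:
            yield start, identity


# Five leading mismatches: only windows starting at 1..4 contain <= 4
# mismatches and therefore pass the 88% cutoff.
chicken = "A" * 40
human = "C" * 5 + "A" * 35
print([start for start, _ in candidate_windows(chicken, human)])  # [1, 2, 3, 4]
```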
Table 6. Probe Summary by Chromosome for MAR_2004_birds/reptiles_1
| Chicken Chromosome | Length w/o gaps (bp) | Unique Probes | Non-Unique Probes |
| Chr1 | 183744490 | 9874 | 3416 |
| Chr1_random | 1261352 | 91 | 52 |
| Chr2 | 143798269 | 7739 | 2600 |
| Chr2_random | 53846 | 8 | 42 |
| Chr3 | 105892232 | 7085 | 1748 |
| Chr3_random | 1565637 | 298 | 125 |
| Chr4 | 87964617 | 5311 | 1714 |
| Chr4_random | 1053861 | 60 | 10 |
| Chr5 | 54038396 | 4382 | 1509 |
| Chr5_random | 34329 | 1 | 2 |
| Chr6 | 33398103 | 2834 | 885 |
| Chr6_random | 3628 | 0 | 0 |
| Chr7 | 35405183 | 3479 | 1375 |
| Chr7_random | 2021 | 0 | 3 |
| Chr8 | 28179243 | 2884 | 803 |
| Chr8_random | 5240 | 0 | 0 |
| Chr9 | 23054450 | 2135 | 697 |
| Chr10 | 18954178 | 1989 | 1076 |
| Chr10_random | 3515812 | 177 | 73 |
| Chr11 | 17999990 | 1544 | 640 |
| Chr11_random | 1104880 | 360 | 86 |
| Chr12 | 19041590 | 1541 | 508 |
| Chr13 | 16797080 | 1447 | 672 |
| Chr13_random | 1106257 | 150 | 63 |
| Chr14 | 20157496 | 1741 | 644 |
| Chr15 | 12220990 | 1371 | 545 |
| Chr16 | 190259 | 10 | 12 |
| Chr16_random | 244471 | 8 | 22 |
| Chr17 | 9893572 | 1192 | 506 |
| Chr18 | 8797585 | 882 | 1300 |
| Chr19 | 9317615 | 1620 | 451 |
| Chr20 | 13295085 | 1302 | 515 |
| Chr21 | 6044995 | 957 | 366 |
| Chr22 | 2187313 | 197 | 110 |
| Chr23 | 5032209 | 824 | 286 |
| Chr24 | 5780194 | 914 | 245 |
| Chr24_random | 95706 | 8 | 22 |
| Chr26 | 3666719 | 650 | 283 |
| Chr27 | 2501764 | 373 | 244 |
| Chr27_random | 696249 | 226 | 132 |
| Chr28 | 4040991 | 570 | 189 |
| Chr28_random | 5820 | 0 | 12 |
| Chr32 | 990310 | 223 | 115 |
| Chr32_random | 28993 | 0 | 0 |
| ChrW | 4135691 | 458 | 124 |
| ChrW_random | 229903 | 1 | 29 |
| ChrZ | 30832492 | 1840 | 497 |
| ChrZ_random | 14348615 | 850 | 326 |
| ChrE22C19W28 | 47202 | 14 | 8 |
| ChrE26C13 | 213526 | 67 | 18 |
| ChrE50C23 | 10171 | 1 | 4 |
| ChrE64 | 1525 | 0 | 0 |
| Chr_Un | 121198700 | 4032 | 11093 |
| Total | 1054180845 | 73720 | 36197 |

Figure 5. Results of the experimental validation of a sample set of MAR_2004_birds/reptiles_1 universal probes.
To test the efficiency of the bird/reptile whole-genome probe set, MAR_2004_birds/reptiles_1, n=68 probes were selected from n=8 regions of the chicken genome for screening the turkey (CHORI-260), zebra finch (TG_Ba), emu (VMRC-16), alligator (VMRC-8), and tuatara (VMRC-12) BAC libraries. After primary and secondary screens, probe-content information was merged with restriction-enzyme fingerprint content maps. Based on this information, the success rate (the fraction of probes tested that were positive for at least one BAC clone) in each species was calculated and is shown above. Representative clones have been sent to the NIH Intramural Sequencing Center for sequencing to confirm probe specificity.
To fully utilize the increasing number of multiple-species whole-genome alignments, a new probe design algorithm, nsoop, was developed. Briefly, nsoop takes as input multiple-species alignments from N species, in which all species, or just a subset of them, can be considered in the probe design process. A user-defined phylogeny (Newick format with branch lengths) is then used with maximum parsimony or maximum likelihood methods to reconstruct the ancestral sequence at each node, and subsequent scoring is based on the sum of the branch lengths connecting nodes/tips with matching nucleotides. Positions where maximum parsimony does not resolve the ancestral sequence to a single nucleotide are scored by averaging the scores calculated using each possible nucleotide. The maximum parsimony and maximum likelihood methods result in very similar sets of selected probes. For example, in a sample of 4,458,109 probes designed from human-chimp-mouse-rat-chicken alignments, 97.41% of the probes were selected by both the maximum parsimony and maximum likelihood scoring routines, and the correlation between the maximum parsimony and maximum likelihood scores exceeded 0.998.
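Maximum-parsimony ancestral reconstruction for one alignment column can be sketched with Fitch's algorithm. The code below is our own simplified illustration, not nsoop's implementation: the tree is binary and written as nested tuples, and the top-down pass uses a common simplification (keep the parent's nucleotides where they are compatible with the node's bottom-up set), which does not enumerate every equally parsimonious reconstruction. An internal node left with more than one candidate nucleotide corresponds to the unresolved case described above, which nsoop scores by averaging over each possible nucleotide.

```python
def fitch(tree, column):
    """Simplified two-pass Fitch parsimony for one alignment column.

    tree:   binary tree as nested 2-tuples of leaf names,
            e.g. ((("mouse", "rat"), "human"), "dog")
    column: leaf name -> observed nucleotide
    Returns (candidates, changes): candidates maps each internal node (the
    tuple itself) to its set of possible ancestral nucleotides; changes is the
    minimum number of substitutions needed to explain the column.
    """
    prelim = {}

    def up(node):
        # Bottom-up pass: intersect the children's sets if possible; otherwise
        # take their union and count one substitution.
        if isinstance(node, str):
            return {column[node]}, 0
        (left, lc), (right, rc) = up(node[0]), up(node[1])
        common = left & right
        state = common if common else left | right
        changes = lc + rc + (0 if common else 1)
        prelim[node] = state
        return state, changes

    _, changes = up(tree)
    candidates = {tree: prelim[tree]}

    def down(node):
        # Top-down pass (simplified): keep the parent's nucleotides where compatible.
        for child in node:
            if isinstance(child, str):
                continue
            shared = prelim[child] & candidates[node]
            candidates[child] = shared if shared else prelim[child]
            down(child)

    down(tree)
    return candidates, changes


tree = ((("mouse", "rat"), "human"), "dog")
resolved, n = fitch(tree, {"mouse": "A", "rat": "G", "human": "A", "dog": "A"})
print(n, resolved[("mouse", "rat")])            # 1 {'A'}
ambiguous, n = fitch(tree, {"mouse": "A", "rat": "C", "human": "G", "dog": "T"})
print(n, sorted(ambiguous[("mouse", "rat")]))   # 3 ['A', 'C']
```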
During the process of creating a second all-mammals probe set, an error in the intended scoring logic was detected in the original version of nsoop. Specifically, scoring of the cumulative branch lengths began at the root of the tree and proceeded down the tree, scoring matches between nodes (inferred ancestral sequences) and the nodes and leaves/tips of the tree (observed sequences). Our intent was to start from the sequence from which the probe is designed (a leaf), progress up the tree as far as possible (i.e., as long as the ancestral node sequences matched the probe sequence), and then score from that point down the tree. For example, if a nucleotide position was identical among all observed species, and thus likely among all ancestral sequences, the root would be the starting point for scoring, resulting in the highest score possible for a single position. However, if the inferred ancestral sequence at the first node up from the observed sequence differed (representing, say, the most recent common ancestor of mouse and rat when making a probe from mouse sequence), then the lowest possible score would be assigned, regardless of whether the position was conserved among any or all of the other species. Depending on the ancestral sequence reconstructions and the pattern of mismatches among the species, this could have a significant impact on probe scores; in particular, some probes received higher or lower scores than our intended scoring logic would assign. To correct this error in these and future probe sets, we implemented the intended scoring logic in nsoop_v2. We apologize for any inconvenience this might have caused and encourage users to contact us if they have further questions. The consequences of the scoring changes for the probe sets are outlined under APR_2005_rodents_1.1 and APR_2005_carnivores_1.1.
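One reading of the corrected "up, then down" logic is sketched below for a single alignment position: climb from the probe's leaf while each ancestor carries the probe's nucleotide, then add the length of every branch below that point whose two ends both match. The tree encoding (a child-to-parent map with branch lengths), the ancestral node names and the example states are illustrative assumptions; averaging over unresolved ancestral states is omitted. The branch lengths are those of the rodent phylogeny given for APR_2005_rodents_1.1.

```python
def position_score(parent, states, probe_leaf):
    """Score one alignment position for a probe designed from `probe_leaf`.

    parent: node -> (parent node or None for the root, branch length to parent)
    states: node -> nucleotide (observed at tips, inferred at ancestral nodes)
    """
    target = states[probe_leaf]
    children = {}
    for node, (par, _) in parent.items():
        if par is not None:
            children.setdefault(par, []).append(node)

    # Step 1: move up from the probe's leaf while the ancestor matches.
    top = probe_leaf
    while parent[top][0] is not None and states[parent[top][0]] == target:
        top = parent[top][0]

    # Step 2: from that node, walk down and add every branch whose two ends match.
    score, stack = 0.0, [top]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if states[child] == target:
                score += parent[child][1]
                stack.append(child)
    return score


# Rodent phylogeny from this page; ancestral node names are hypothetical.
RODENT_TREE = {
    "mouse": ("mouse_rat", 0.02870), "rat": ("mouse_rat", 0.04165),
    "mouse_rat": ("mrh", 0.09006), "human": ("mrh", 0.04529),
    "mrh": ("root", 0.01591), "dog": ("root", 0.08444),
    "root": (None, 0.0),
}

conserved = {node: "A" for node in RODENT_TREE}           # identical in all species
mouse_only = {node: "G" for node in RODENT_TREE}
mouse_only["mouse"] = "A"                                # change on the mouse branch
rodent_only = dict(mouse_only, rat="A", mouse_rat="A")   # shared by mouse and rat only

print(round(position_score(RODENT_TREE, conserved, "mouse"), 5))    # 0.30605
print(round(position_score(RODENT_TREE, mouse_only, "mouse"), 5))   # 0.0
print(round(position_score(RODENT_TREE, rodent_only, "mouse"), 5))  # 0.07035
```

The second case is the situation described in the text: the mouse-rat ancestor differs from mouse, so the position gets the lowest possible score. Summing such per-position values over a probe's bases would give a probe score under this reading.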
You can download and read more about nsoop_v2 here.
APR_2005_rodents_1.1 is a whole-genome probe set of universal probes specifically designed for screening rodent libraries. Mouse-rat-human-dog whole-genome alignments from http://genome.ucsc.edu were used to identify mouse sequences likely to be highly conserved with other rodents. A summary of the probe set is listed in Table 7.
An error in our probe scoring logic was discovered in the original version of nsoop that required us to rebuild the previously released OCT_2004_rodents_1 probe set and replace it with APR_2005_rodents_1.1. Overall, the two probe sets are very similar: just over 17% (67,663) of the probes in OCT_2004_rodents_1 did not meet the score cutoff under the corrected scoring logic, and 258,429 (65%) of the OCT_2004_rodents_1 probes are also present in the APR_2005_rodents_1.1 build. The corrected scoring scheme resulted in the retention of more probes in the APR_2005_rodents_1.1 set and improved genome coverage. Please contact us if you have specific questions about this correction; we apologize for any problems this error might have caused.
A new algorithm, nsoop_v2, was used to select this probe set and assign probe scores. Briefly, the phylogeny (((mouse: 0.02870, rat: 0.04165): 0.09006, human: 0.04529): 0.01591, dog: 0.08444); was used to generate probe scores. The branch lengths were first calculated with baseml in PAML using an unrooted tree of just these four species and a 29,431-bp alignment from the CFTR region on human chromosome 7. The lengths of the two branches leading to dog in a rooted tree were then estimated using smaller alignments that also included either wallaby or opossum. Minimum probe scores for inclusion in the final probe set were determined using both the maximum likelihood scores and the number of mismatches. The vast majority of the probes conform to the following criteria:
| Aligned species | Mismatch criteria |
| Mouse-Rat | 0 mismatches |
| Mouse-Dog | < 3 mismatches |
| Mouse-Human | < 3 mismatches |
| Mouse-Human-Dog | < 3 mismatches for both mouse-human and mouse-dog |
| Mouse-Rat-Human | < 2 mismatches mouse-rat, < 3 mismatches mouse-human |
| Mouse-Rat-Dog | < 2 mismatches mouse-rat, < 3 mismatches mouse-dog |
| Mouse-Rat-Human-Dog | < 2 mismatches mouse-rat, < 5 mismatches for both mouse-human and mouse-dog |
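The mismatch criteria in the table above can be expressed as a lookup keyed by the set of species aligned to the mouse probe sequence. This is a hypothetical helper for illustration only: as the text notes, the vast majority of probes (not all) conform to these criteria, because inclusion was decided using scores as well as mismatches.

```python
# Exclusive upper bounds on mismatches to the mouse probe sequence, keyed by the
# set of other species present in the alignment.
RODENT_CRITERIA = {
    frozenset({"rat"}): {"rat": 1},  # i.e. 0 mismatches
    frozenset({"dog"}): {"dog": 3},
    frozenset({"human"}): {"human": 3},
    frozenset({"human", "dog"}): {"human": 3, "dog": 3},
    frozenset({"rat", "human"}): {"rat": 2, "human": 3},
    frozenset({"rat", "dog"}): {"rat": 2, "dog": 3},
    frozenset({"rat", "human", "dog"}): {"rat": 2, "human": 5, "dog": 5},
}


def meets_rodent_criteria(mismatches):
    """mismatches: species -> mismatch count versus mouse, for every aligned species."""
    limits = RODENT_CRITERIA.get(frozenset(mismatches))
    if limits is None:
        return False  # alignment combination not covered by the table
    return all(mismatches[sp] < limit for sp, limit in limits.items())


print(meets_rodent_criteria({"rat": 1, "human": 4, "dog": 4}))  # True
print(meets_rodent_criteria({"rat": 1, "human": 2}))             # True
print(meets_rodent_criteria({"rat": 1}))                         # False: mouse-rat alone needs 0
```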
The rodent mismatch criteria listed above are more conservative than those used for the whole-genome mammalian probes. To test the probe success rate of the rodent whole-genome universal probe set, n = 48 probes were used to screen the deer mouse (CHORI-233) and 13-lined ground squirrel (VMRC-20) BAC libraries. After the primary and secondary screens, probe-content information was merged with restriction-enzyme fingerprint content maps. Based on this information, the success rate (the fraction of probes tested that were positive for at least one BAC clone) was calculated as 46% for squirrel and 90% for deer mouse. Using a more stringent criterion of probe success, defined as probes that identified at least two and fewer than 20 clones, the success rates for squirrel and deer mouse were 31% and 83%, respectively. A subset of representative clones identified with these probes has been sequenced by NISC to evaluate the combined specificity of the probes. For squirrel, 9/9 sequenced clones mapped to the orthologous target regions, and for deer mouse, 21/21 clones mapped back to the targeted orthologous regions. A summary file of the test set of probes can be downloaded here, and a summary file of the mapped clones can be downloaded here.
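Both success-rate definitions used in these validation summaries reduce to simple counts over the number of BAC clones each test probe identified. A minimal sketch; the clone counts in the example are made up for illustration and are not data from these screens.

```python
def success_rates(clone_counts):
    """Return (lenient, stringent) success rates for a set of test probes.

    clone_counts: number of BAC clones identified by each tested probe.
    lenient   -- fraction of probes positive for at least one clone
    stringent -- fraction of probes identifying at least 2 and fewer than 20 clones
    """
    n = len(clone_counts)
    if n == 0:
        raise ValueError("no probes tested")
    lenient = sum(1 for c in clone_counts if c >= 1) / n
    stringent = sum(1 for c in clone_counts if 2 <= c < 20) / n
    return lenient, stringent


# Hypothetical clone counts for eight probes (not data from the screens above).
print(success_rates([0, 1, 2, 5, 19, 20, 40, 3]))  # (0.875, 0.5)
```

The stringent definition penalizes probes that hit a single clone (weak evidence) and probes that hit 20 or more clones (likely repetitive or paralogous targets).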
Table 7. Probe Summary by Chromosome for APR_2005_rodents_1.1
| Mouse Chromosome | Length w/o gaps (bp) | Unique Probes | Non-Unique Probes |
| Chr1 | 185739816 | 20142 | 13092 |
| Chr2 | 178128968 | 24567 | 16073 |
| Chr3 | 151641779 | 15205 | 10255 |
| Chr4 | 150169032 | 18988 | 12806 |
| Chr5 | 140185730 | 16161 | 9916 |
| Chr6 | 140598523 | 15580 | 10211 |
| Chr7 | 123686188 | 16363 | 11613 |
| Chr8 | 120458717 | 14542 | 8786 |
| Chr9 | 117228887 | 16732 | 9994 |
| Chr10 | 122927168 | 12282 | 7852 |
| Chr11 | 118398857 | 21156 | 12611 |
| Chr12 | 108019676 | 13045 | 8244 |
| Chr13 | 109349262 | 11520 | 7732 |
| Chr14 | 110323967 | 13069 | 7919 |
| Chr15 | 98419177 | 10824 | 7058 |
| Chr16 | 92679592 | 9949 | 6294 |
| Chr17 | 86658738 | 9736 | 6819 |
| Chr18 | 86685738 | 10174 | 6332 |
| Chr19 | 56490660 | 8749 | 5601 |
| ChrX | 155777425 | 12501 | 10669 |
| ChrY | 37314788 | 19 | 728 |
| Chr1_random | 1774182 | 0 | 170 |
| Chr2_random | 7873155 | 60 | 1226 |
| Chr3_random | 2438032 | 82 | 397 |
| Chr4_random | 8311544 | 29 | 1323 |
| Chr5_random | 2808176 | 19 | 199 |
| Chr6_random | 1859321 | 5 | 142 |
| Chr7_random | 6635828 | 496 | 758 |
| Chr8_random | 1478105 | 5 | 157 |
| Chr9_random | 837411 | 0 | 90 |
| Chr10_random | 722152 | 0 | 54 |
| Chr12_random | 1861124 | 35 | 244 |
| Chr13_random | 1725890 | 0 | 185 |
| Chr14_random | 2063169 | 14 | 191 |
| Chr15_random | 765702 | 2 | 82 |
| Chr16_random | 1152388 | 5 | 186 |
| Chr17_random | 1130230 | 4 | 210 |
| Chr18_random | 1555732 | 1 | 96 |
| Chr19_random | 597925 | 0 | 138 |
| ChrX_random | 9460804 | 11 | 906 |
| ChrY_random | 503236 | 0 | 32 |
| ChrUn_random | 69030694 | 1492 | 5040 |
| Total | 2615467488 | 293564 | 202431 |
APR_2005_carnivores_1.1 is a whole-genome probe set of universal probes specifically designed for screening carnivore libraries. Dog-human-mouse-rat whole-genome alignments from http://genome.ucsc.edu were used with nsoop_v2 to identify dog sequences likely to be highly conserved with other carnivores. A combination of mismatches and nsoop probe scores was used to determine inclusion of probes in this universal probe set. The starting mismatch criteria were < 3 dog-human mismatches and < 10 dog-rodent mismatches. A summary of the resulting probes is listed below, and a chromosome summary of the probe set is listed in Table 8.
This probe set replaces JAN_2005_carnivores_1, which was created using nsoop and was subsequently found to have an error in its probe scoring logic. Comparison of the two probe sets indicates that 4% (15,705) of the probes in JAN_2005_carnivores_1 did not meet the criteria for inclusion in the APR_2005_carnivores_1.1 build, and that 357,860 probes (81% of JAN_2005_carnivores_1 probes) are common to both probe sets. The corrected scoring resulted in higher genome coverage for the new carnivore set. We apologize for any problems this error may have caused and would be happy to answer further questions about this error and its correction.
Dog-Human Alignments (probe scores 4.86-5.24)
100% (50,360/50,360) of the probes in this probe score range met the mismatch criteria.
Dog-Human-Mouse Alignments (probe scores 8.83-9.52)
90.7% (9058/9992) of probes in this probe score range met the mismatch criteria.
50% (9058/18142) of all probes that met the mismatch criteria were included.
Dog-Human-Rat Alignments (probe scores 9.29-9.98)
90.7% (1354/1493) of the probes in this probe score range met the mismatch criteria.
29.6% (1354/4582) of all probes that met the mismatch criteria were included.
Dog-Human-Mouse-Rat Alignments (probe scores 9.66-11.02)
75.1% (404306/538489) of the probes in this probe score range met the mismatch criteria.
91.8% (404306/440330) of all probes that met the mismatch criteria were included.
This is by far the largest group of probes.
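The two percentages reported for each alignment group appear to be ratios with a shared numerator (probes in the score range that met the mismatch criteria): the first divides by all probes in the score range, the second by all probes that met the mismatch criteria. That reading is our inference from the counts given; the snippet below simply recomputes the reported figures from them (9058/18142 comes out at 49.9%, reported above as 50%).

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Counts copied from the list above:
# (in score range and met criteria, in score range, met criteria overall)
GROUPS = {
    "Dog-Human-Mouse": (9058, 9992, 18142),
    "Dog-Human-Rat": (1354, 1493, 4582),
    "Dog-Human-Mouse-Rat": (404306, 538489, 440330),
}

for name, (kept, in_range, met_criteria) in GROUPS.items():
    print(f"{name}: {pct(kept, in_range)}% met criteria, "
          f"{pct(kept, met_criteria)}% of criteria-passing probes included")
# Dog-Human-Mouse: 90.7% met criteria, 49.9% of criteria-passing probes included
# Dog-Human-Rat: 90.7% met criteria, 29.6% of criteria-passing probes included
# Dog-Human-Mouse-Rat: 75.1% met criteria, 91.8% of criteria-passing probes included
```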
Table 8. Probe Summary by Chromosome for APR_2005_carnivores_1.1
| Dog Chromosome | Length w/o gaps (bp) | Unique Probes | Non-Unique Probes |
| Chr1 | 120715446 | 17978 | 8238 |
| Chr2 | 83844569 | 14889 | 6844 |
| Chr3 | 91113804 | 13826 | 5611 |
| Chr4 | 87862066 | 14640 | 6325 |
| Chr5 | 88298129 | 20060 | 8536 |
| Chr6 | 75429024 | 14434 | 6073 |
| Chr7 | 79535956 | 14488 | 6165 |
| Chr8 | 73732664 | 13538 | 6074 |
| Chr9 | 50162806 | 14464 | 6553 |
| Chr10 | 69071640 | 12602 | 5323 |
| Chr11 | 72007474 | 13180 | 5693 |
| Chr12 | 72134750 | 11727 | 5154 |
| Chr13 | 62439878 | 8506 | 3563 |
| Chr14 | 60250532 | 10308 | 4233 |
| Chr15 | 63623497 | 10550 | 4718 |
| Chr16 | 56674272 | 7122 | 3269 |
| Chr17 | 63455904 | 11315 | 5180 |
| Chr18 | 62421077 | 11536 | 5336 |
| Chr19 | 53494395 | 7157 | 2880 |
| Chr20 | 57438949 | 12594 | 5955 |
| Chr21 | 49606594 | 7523 | 3512 |
| Chr22 | 60967026 | 7711 | 3064 |
| Chr23 | 52323076 | 7710 | 3356 |
| Chr24 | 47284650 | 9029 | 3947 |
| Chr25 | 51065444 | 6916 | 3179 |
| Chr26 | 37554750 | 6132 | 2881 |
| Chr27 | 45599871 | 7758 | 4304 |
| Chr28 | 39232621 | 8776 | 3509 |
| Chr29 | 41639609 | 5615 | 2184 |
| Chr30 | 39917767 | 8993 | 3802 |
| Chr31 | 37867089 | 4535 | 1758 |
| Chr32 | 38726899 | 4894 | 2408 |
| Chr33 | 31261292 | 4866 | 2054 |
| Chr34 | 41783138 | 5951 | 2379 |
| Chr35 | 26292642 | 3724 | 1859 |
| Chr36 | 30762478 | 6096 | 2311 |
| Chr37 | 30686873 | 6292 | 2666 |
| Chr38 | 23298972 | 4133 | 1720 |
| ChrX | 121210679 | 18723 | 9031 |
| ChrUn_random | 69040064 | 1382 | 1158 |
| Total | 2359828366 | 391673 | 172805 |
Experimental validation of this probe set was performed by screening a clouded leopard BAC library (CHORI-87). The success rate of the carnivore universal probes for the clouded leopard BAC library was 81% (i.e., 39/48 probes tested identified at least one leopard BAC clone). Representative BAC clones are currently being selected for sequencing to evaluate the specificity of this probe set. Using a more stringent criterion of probe success, defined as probes that identified at least two and fewer than 20 clones, the success rate for clouded leopard was 73%. A subset of representative clones identified with these probes has been sequenced by NISC to evaluate the combined specificity of the probes: 17/18 sequenced clouded leopard clones mapped to the orthologous target regions. A summary file of the test set of probes can be downloaded here, and a summary file of the mapped clones can be downloaded here.
We have enhanced the search capabilities on the birds/reptiles, rodents and carnivores probe sets to better capture all of the information about gene names, products, etc., that is included in the xenomRNA and xenoRefSeq tracks.
The tree depicting available BAC libraries has also been updated to include new libraries and correct topology mistakes included in the earlier version of the tree.
April 2005 Updates
The OCT_2004_rodents_1 and JAN_2005_carnivores_1 probe sets were replaced with APR_2005_rodents_1.1 and APR_2005_carnivores_1.1. A corrected version of nsoop, nsoop_v2, was used to make these probe sets. Users are welcome to contact us with questions about the details of this change.
An updated whole-genome universal probe set for screening mammalian libraries, JUN_2005_mammals_2, was created by merging the FEB_2004_mammals_1 probe set with new probes designed from human-mouse-rat-dog-chicken whole-genome alignments to hg17 (NCBI build 35) using nsoop_v2 and the input phylogeny ((((mouse: 0.02589, rat: 0.02999): 0.07201, human: 0.03563): 0.01316, dog: 0.05622): 0.10466, chicken: 0.14417);. The score cutoffs for inclusion of newly designed probes in this mammalian probe release were set such that at least 75% of the new probes would have met the previous criteria set for mammals_1 and the mismatch criteria outlined below. Because sets of probes designed at different times with different criteria were merged, a substantial fraction of the probes that met the score and mismatch criteria overlapped. To eliminate unnecessary redundancy in the final probe set, wherever probes overlapped by more than 18 bp, the best single probe was retained and the other probe(s) discarded. Unique probes that overlapped non-unique probes by 30 or more bases were also discarded. In total, this probe set increased genome coverage by ~6% compared with the previous mammalian whole-genome probe set, and it is expected to have equivalent or enhanced success rates.
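The redundancy filter described above can be sketched as a greedy pass. This is an illustration under stated assumptions, not the pipeline that built JUN_2005_mammals_2: "best" is taken to mean highest score, overlaps are computed on half-open genomic intervals, and the unique-versus-non-unique rule is applied before the 18-bp rule. The text does not specify the order; applied afterwards, the 30-base rule could never fire, because no two retained probes overlap by more than 18 bp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    chrom: str
    start: int    # 0-based, inclusive
    end: int      # exclusive
    score: float
    unique: bool


def overlap(a, b):
    """Number of bases shared by two probes (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def remove_redundant(probes, max_overlap=18, unique_vs_nonunique=30):
    nonunique = [p for p in probes if not p.unique]
    # Rule 1 (assumed to run first): drop unique probes that overlap a
    # non-unique probe by `unique_vs_nonunique` or more bases.
    candidates = [
        p for p in probes
        if not (p.unique and any(overlap(p, n) >= unique_vs_nonunique for n in nonunique))
    ]
    # Rule 2: greedily keep the highest-scoring probe and reject any later probe
    # that overlaps an already kept probe by more than `max_overlap` bases.
    kept = []
    for p in sorted(candidates, key=lambda p: p.score, reverse=True):
        if all(overlap(p, k) <= max_overlap for k in kept):
            kept.append(p)
    return kept


probes = [
    Probe("chr1", 0, 36, 5.0, True),
    Probe("chr1", 10, 46, 6.0, True),     # overlaps the first probe by 26 bp
    Probe("chr1", 40, 76, 4.0, True),     # overlaps the second probe by 6 bp
    Probe("chr1", 100, 136, 3.0, True),   # overlaps a non-unique probe by 34 bp
    Probe("chr1", 102, 138, 1.0, False),
]
print([p.start for p in remove_redundant(probes)])  # [10, 40, 102]
```

The pairwise comparisons are quadratic, which is fine for a sketch; a genome-scale version would sort probes by position and compare only neighbours.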
Mismatch criteria for JUN_2005_mammals_2:
Human-Dog: 2 or fewer mismatches
(Probe scores 3.61-3.78. 100% (47697/47697) of the probes met the mismatch criteria)
Human-dog-chicken: 3 or fewer human-dog mismatches
(Probe scores 12.33-12.74. 100% (56/56) of the probes met the mismatch criteria)
Human-mouse: 0 mismatches
(Probe score 4.81. 100% (153/153) of the probes met the mismatch criteria)
Human-mouse-chicken: 2 or fewer human-mouse mismatches
(Probe scores 13.56-14.24. 100% (8/8) of the probes met the mismatch criteria)
Human-mouse-dog: 4 or fewer human-mouse and 3 or fewer human-dog mismatches
(Probe scores 6.68-7.31. 75% (5220/6905) of the probes met the mismatch criteria)
Human-mouse-dog-chicken: 4 or fewer human-mouse and 3 or fewer human-dog mismatches
(Probe scores 14.25-16.26. 75% (3074/4096) of the probes met the mismatch criteria)
Human-mouse-rat: 2 or fewer mismatches in human-mouse and human-rat
(Probe scores 5.83-5.89. 100% (529/529) of the probes met the mismatch criteria)
Human-mouse-rat-chicken: 4 or fewer human-mouse and human-rat mismatches
(Probe scores 13.16-15.32. 75% (769/1015) of the probes met the mismatch criteria)
Human-mouse-rat-dog: 3 or fewer human-dog mismatches and either 4 or fewer mismatches in human-mouse or human-rat
(Probe scores 7.69-8.38. 95% (148,825/156,150) of the probes met the mismatch criteria)
Human-mouse-rat-dog-chicken: fewer than 3 human-dog mismatches and either 4 or fewer mismatches in human-mouse or human-rat
(Probe scores 15.32-17.34. 84% (139,490/165,822) of the probes met the mismatch criteria)
Human-rat: 0 mismatches
(Probe score 4.96. 100% (19/19) of the probes met the mismatch criteria)
Human-rat-chicken: 2 or fewer human-rat mismatches
(Probe scores 13.01-14.39. 100% (10/10) of the probes met the mismatch criteria)
Human-rat-dog: 4 or fewer human-rat and 3 or fewer human-dog mismatches
(Probe scores 7.24-7.45. 100% (228/228) of the probes met the mismatch criteria)
Human-rat-dog-chicken: 4 or fewer human-rat and 3 or fewer human-dog mismatches
(Probe scores 14.69-16.41. 75% (363/484) of the probes met the mismatch criteria)
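The upper ends of several score ranges above can be reproduced from the input phylogeny under one reading of the scoring: every branch connecting matching nucleotides contributes its length at every position, so a probe that matches perfectly across the aligned species scores its length times the total length of the branches spanning those species. Assuming 36-bp probes, human-dog gives 36 × (0.03563 + 0.01316 + 0.05622) ≈ 3.78, human-mouse ≈ 4.81 and human-mouse-rat ≈ 5.89. The sketch below is our own illustration; its Newick parser handles only leaf names and branch lengths (no internal labels, comments or quoted names).

```python
import re


def parse_newick(text):
    """Parse a simple Newick tree into {node: (parent or None, branch_length)}.

    Supports leaf names and ':length' only; internal nodes get generated names.
    """
    tokens = re.findall(r"[(),:;]|[^(),:;\s]+", text)
    tree, pos, counter = {}, 0, 0

    def node(parent):
        nonlocal pos, counter
        if tokens[pos] == "(":
            counter += 1
            name = f"node{counter}"
            pos += 1  # consume '('
            while True:
                node(name)
                sep = tokens[pos]
                pos += 1  # consume ',' or ')'
                if sep == ")":
                    break
        else:
            name = tokens[pos]
            pos += 1
        length = 0.0
        if pos < len(tokens) and tokens[pos] == ":":
            length = float(tokens[pos + 1])
            pos += 2
        tree[name] = (parent, length)
        return name

    node(None)
    return tree


def max_probe_score(tree, species, probe_length=36):
    """probe_length x total length of the branches spanning `species`."""
    below = {}  # node -> number of selected species in the subtree under it
    for leaf in species:
        current = leaf
        while current is not None:
            below[current] = below.get(current, 0) + 1
            current = tree[current][0]
    k = len(species)
    # A branch lies on the spanning subtree iff some, but not all, species are below it.
    spanning = sum(tree[n][1] for n, c in below.items() if 0 < c < k)
    return probe_length * spanning


mammals = parse_newick(
    "((((mouse: 0.02589, rat: 0.02999): 0.07201, human: 0.03563): 0.01316,"
    " dog: 0.05622): 0.10466, chicken: 0.14417);"
)
for species in ({"human", "dog"}, {"human", "mouse"}, {"human", "mouse", "rat"}):
    print(sorted(species), round(max_probe_score(mammals, species), 2))
# ['dog', 'human'] 3.78
# ['human', 'mouse'] 4.81
# ['human', 'mouse', 'rat'] 5.89
```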
Table 9. Probe Summary by Chromosome for JUN_2005_mammals_2
| Human Chromosome | Length w/o gaps (bp) | Unique Probes | Non-Unique Probes |
| Chr1 | 222827847 | 35717 | 29423 |
| Chr2 | 237506229 | 34251 | 26304 |
| Chr3 | 194635740 | 27264 | 20352 |
| Chr4 | 187161218 | 18759 | 13847 |
| Chr5 | 177702766 | 23171 | 17784 |
| Chr6 | 167317699 | 20553 | 15559 |
| Chr7 | 154759139 | 18133 | 15626 |
| Chr8 | 142612826 | 15364 | 11751 |
| Chr9 | 117781268 | 16995 | 13381 |
| Chr10 | 131613628 | 17947 | 13579 |
| Chr11 | 131130853 | 21053 | 16826 |
| Chr12 | 130259811 | 18143 | 14427 |
| Chr13 | 95559980 | 9565 | 7358 |
| Chr14 | 88290585 | 13671 | 10419 |
| Chr15 | 81341915 | 13829 | 12171 |
| Chr16 | 78884754 | 13030 | 10579 |
| Chr17 | 77800220 | 16486 | 14376 |
| Chr18 | 74656155 | 9272 | 6327 |
| Chr19 | 55785651 | 7365 | 6337 |
| Chr20 | 59505253 | 9287 | 6811 |
| Chr21 | 34171998 | 2956 | 2455 |
| Chr22 | 34764810 | 4848 | 4495 |
| ChrX | 150394264 | 18107 | 16158 |
| ChrY | 24871691 | 29 | 1195 |
| Total | 2851336300 | 385795 | 307540 |
A whole-genome probe set was designed based on pairwise whole-genome alignments between hg17 (NCBI build 35) and opossum (monDom1) using soop_v2. Preliminary tests of probes designed from these comparisons indicated that 36-mers with four or fewer mismatches between human and opossum had a greater than 50% success rate when screening a wallaby library. Thus, this probe set consists of 36-mers with four or fewer mismatches between human and opossum. This criterion yielded 121,772 unique and 83,546 non-unique probes for screening marsupial libraries. Estimated genome coverage for this probe set is 3-fold lower than for JUN_2005_mammals_2, but the probe success rate is expected to be greater than 50% in marsupials.
Experimental validation of this probe set was performed by screening a wallaby BAC library. The success rate of the marsupial universal probes for the wallaby BAC library (ME_KBa) was 81% (i.e., 39/48 probes tested identified at least one wallaby BAC clone). Using a more stringent criterion of probe success, defined as probes that identified at least two and fewer than 20 clones, the success rate for wallaby was 75%. A subset of representative clones identified with these probes has been sequenced by NISC to evaluate the combined specificity of the probes: 8/9 clones mapped back to the targeted orthologous regions. A summary file of the test set of probes can be downloaded here, and a summary file of the mapped clones can be downloaded here.
Development of a new cross-species query option
Because the opossum assembly is highly fragmented and not anchored to chromosomes, we have developed a new option to facilitate the identification of probes of interest using cross-species queries. Specifically, users can now designate a 'query' genome, distinct from the genome from which the probes were designed (the 'reference' genome), in which to search for probes. For example, it is now possible to search by human chromosome location, gene name, accession number, etc., to retrieve probes designed from the inferred syntenic/orthologous locations in the opossum assembly. This cross-species query option is available for every species that contributed to the design of a given probe set. This option can be accessed here.
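Conceptually, a cross-species query translates a location in the query genome into reference-genome segments through the alignment and then returns probes designed in those segments. The sketch below shows that idea with hypothetical alignment blocks and probe records; it is not the server's implementation. For simplicity it returns every probe in an aligned block that touches the query interval, without clipping coordinates, and real alignments with gaps and rearrangements would need more careful coordinate mapping.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """An aligned segment linking a query-genome interval to a reference-genome interval."""
    q_chrom: str
    q_start: int
    q_end: int
    r_chrom: str
    r_start: int
    r_end: int


def probes_for_query(blocks, probes, q_chrom, q_start, q_end):
    """Return IDs of probes lying in reference segments aligned to the query interval.

    probes: list of (probe_id, reference_chrom, reference_position) tuples.
    Half-open intervals are assumed throughout.
    """
    hits = set()
    for b in blocks:
        if b.q_chrom != q_chrom or b.q_end <= q_start or b.q_start >= q_end:
            continue  # block does not overlap the query interval
        for probe_id, chrom, pos in probes:
            if chrom == b.r_chrom and b.r_start <= pos < b.r_end:
                hits.add(probe_id)
    return sorted(hits)


# Hypothetical human (query) -> opossum (reference) blocks and opossum-designed probes.
blocks = [
    Block("chr7", 100, 200, "scaffold_1", 5000, 5100),
    Block("chr7", 300, 400, "scaffold_9", 0, 100),
]
probes = [("p1", "scaffold_1", 5050), ("p2", "scaffold_9", 50), ("p3", "scaffold_1", 9000)]
print(probes_for_query(blocks, probes, "chr7", 150, 250))  # ['p1']
print(probes_for_query(blocks, probes, "chr7", 0, 1000))   # ['p1', 'p2']
```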
Development of a batch-query option
To facilitate the retrieval of probes from multiple locations in the genome, we have developed a batch-query interface. Users can now enter multiple search strings (either directly or by uploading a file) into the newly developed interface and receive the results in a text file via email. This option can be accessed here.
Development of an on demand universal probe design for Apes and Old world monkeys.
A whole-genome human-chimpanzee-rhesus monkey alignment of hg18-panTro1-rheMac2 was downloaded from the UCSC Browser and serves as the basis for the on demand design of probes for screening Ape and Old world monkey BAC libraries. Optimal universal probe design parameters were determined in a stepwise process:
i. design all possible probes from the human chromosome 7 hg18-panTro1-rheMac2 alignment file,
ii. evaluate the probe score distributions correlated with number of human-chimpanzee and human-rhesus monkey mismatches to set the minimum probe score cut-off values,
iii. determine the frequency of universal probes with scores greater than the above cut-off scores in 51 intervals of 250 kb that accurately represent the genome-wide distribution of divergence between human and chimpanzee.
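Steps ii and iii can be illustrated with two small helpers. Both are hypothetical sketches: turning step ii into a rule ("the lowest cutoff at which a target fraction of the probes scoring at or above it meet the mismatch criteria") is one plausible formalization, not necessarily the procedure that was used, and the target of 0.75 simply echoes the 75% rule stated for JUN_2005_mammals_2.

```python
from collections import Counter


def choose_cutoff(scored, target=0.75):
    """Lowest score cutoff at which at least `target` of the probes scoring at or
    above it meet the mismatch criteria.

    scored: list of (probe_score, meets_mismatch_criteria) pairs.
    Returns None if no cutoff reaches the target.
    """
    ordered = sorted(scored, key=lambda item: item[0], reverse=True)
    best, passing = None, 0
    for i, (score, meets) in enumerate(ordered):
        passing += bool(meets)
        # Evaluate only after the last probe of a group of tied scores.
        last_of_tie = i + 1 == len(ordered) or ordered[i + 1][0] != score
        if last_of_tie and passing / (i + 1) >= target:
            best = score
    return best


def probes_per_interval(positions, interval=250_000):
    """Count probes (by genomic start position) in consecutive fixed-size intervals."""
    return Counter(pos // interval for pos in positions)


scored = [(10, True), (9, True), (8, False), (7, True), (6, False), (5, False)]
print(choose_cutoff(scored))                                # 7
print(choose_cutoff(scored, target=0.9))                    # 9
print(dict(probes_per_interval([0, 249_999, 250_000, 600_000])))  # {0: 2, 1: 1, 2: 1}
```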
The resulting probes are expected to have >90% probe success rate for all species within this clade and to be present at high densities across the entire genome. Specifically, with one exception on the Y chromosome, at least 84 ‘unique’ universal probes were identified in each of the 250 kb intervals sampled above.
To make universal probes on demand, users first identify and verify the region(s) of the genome of interest via a simple database query. Once a region is identified, the server initiates our established probe design process targeted to the requested segment(s) of the genome, and the probes are returned to the user in a variety of formats via email. The on demand universal probe design for Apes and Old World Monkeys can be accessed here.
The predetermined fixed default values for the on demand universal probe design for Apes and Old World monkeys are as follows:
Human-chimpanzee: None
Human-chimpanzee-rhesus monkey: 91% of the probes have 0 human-chimpanzee mismatches and 1 or fewer human-rhesus monkey mismatches (probe score > 2.030)
Human-rhesus monkey: 100% of the probes have 1 or fewer human-rhesus monkey mismatches (probe score > 1.856)
Experimental validation of the default parameters.

Figure 6. Results of experimental validation for the Apes & OWM on demand universal probes. To test the probe success rate of the universal probe design process for Apes & OWM, n = 48 probes were used to screen the Japanese macaque (CHORI-270), baboon (RPCI-41), colobus monkey (CHORI-272) and gibbon (CHORI-271) BAC libraries. After the primary and secondary screens, probe-content information was merged with restriction-enzyme fingerprint content maps. Based on this information, the success rate (the fraction of probes tested that were positive for at least one BAC clone) in each species was calculated. Using a more stringent criterion of probe success, defined as probes that identified at least two and fewer than 20 clones, the success rates for Japanese macaque, baboon, colobus monkey and gibbon were 94%, 83%, 83% and 94%, respectively. A subset of representative clones identified with these probes has been sequenced by NISC to evaluate the combined specificity of the probes: 7/7 sequenced Japanese macaque clones, 8/8 baboon clones, 8/9 colobus monkey clones and 10/10 gibbon clones mapped to the targeted orthologous regions. A summary file of the test set of probes can be downloaded here, and a summary file of the mapped clones can be downloaded here.
Development of an on demand universal probe design for New world monkeys (October 2008).
A whole-genome human-chimpanzee-orangutan-rhesus-marmoset alignment of hg18-panTro2-panAbe2-rheMac2-calJac1 was downloaded from the UCSC Browser and serves as the basis for the on demand design of probes for screening New world monkey BAC libraries. Optimal universal probe design parameters were determined in a stepwise process:
i. design all possible probes from the regions of the marmoset genome assembly orthologous to human chromosome 7 using the hg18-panTro2-panAbe2-rheMac2-calJac1 alignment file,
ii. evaluate the probe score distributions correlated with number of mismatches to set the minimum probe score cut-off values,
iii. determine the frequency of universal probes with scores greater than the above cut-off scores in 51 intervals of 250 kb that accurately represent the genome-wide distribution of sequence divergence.
The resulting probes are expected to have a >90% probe success rate for all species within this clade and to be present at high densities across the entire genome. Specifically, excluding an interval on the Y chromosome and an interval orthologous to Chr15 that consists entirely of segmental duplications, an average of 51 universal probes were identified in each of the 250-kb intervals sampled above (range 7-144 probes/250 kb).
To make universal probes on demand, users first identify and verify the region(s) of the genome of interest via a simple database query. Once a region is identified, the server initiates our established probe design process targeted to the requested segment(s) of the genome, and the probes are returned to the user in a variety of formats via email. The on demand universal probe design for New World Monkeys can be accessed here.
The predetermined fixed default values for the on demand universal probe design for New world monkeys were set such that nearly all probes will have 2 or fewer mismatches between marmoset (reference New world monkey sequence for the designed probes) and each of the other primates used in the comparison (human, chimp, orangutan and rhesus).
A test set of n = 48 probes was hybridized to a pair of BAC libraries from representative New World monkeys: Dusky Titi (LBNL-5) and Owl monkey (CHORI-258). The probe success rates were 87% for the Dusky Titi and 89% for the Owl monkey.
A whole-genome human-chimpanzee-orangutan-rhesus-marmoset-tarsier-mouse lemur-galago alignment of hg18-panTro2-panAbe2-rheMac2-calJac1-tarSyr1-micMur1-otoGar1 was downloaded from the UCSC Browser and serves as the basis for the on demand design of probes for screening Simian and All Primate BAC libraries. Optimal universal probe design parameters were determined using the stepwise process described above for the Apes & OWM and NWM probe sets.
The resulting probes for screening Simian libraries are expected to have a >85% probe success rate for all species within this clade and to be present across the entire genome at high densities, comparable to those described for the NWM probe design criteria. Probes designed with the All Primates criteria are expected to have a >80% probe success rate for all primate genomic libraries.
As with the previous primate probe design tools, to make universal probes on demand, users first identify and verify the region(s) of the genome of interest via a simple database query. Once a region is identified, the server initiates our established probe design process targeted to the requested segment(s) of the genome, and the probes are returned to the user in a variety of formats via email. The on demand universal probe design for Simians can be accessed here and the All Primates design here.
Simian Probe Validation. A test set of n = 48 probes were hybridized to a set of three BAC libraries from representative Simians: gibbon (CHORI-271), Colobus monkey (CHORI-272) and Dusky titi (LBNL-5). The probe success rates were 92% for the gibbon, 88% for the Colobus monkey, and 85% for the Dusky titi.
All Primates Validation. A test set of n = 32 probes were hybridized to a set of three BAC libraries from representative nonhuman primates: Japanese macaque (CHORI-270), Owl monkey (CHORI-258), and Black lemur (CHORI-273). The probe success rates were 100% for the Japanese macaque, 100% for the Owl monkey, and 91% for the Black lemur.