gEVAL Browser Frequently Asked Questions
There are many different colours on my end/read track. Can you explain them?
You can find the legend for the colours here.
How often are gEVAL/PGP builds made?
The build frequency varies by organism, but the GRC organisms are on a more frequent refresh cycle. For example, zebrafish builds are on a bi-monthly schedule.
What is the difference between the Human/Mouse Monthly gEVAL/PGP Builds and the Human/Mouse AGP?
- gEVAL/PGP builds include unfinished sequence (phase 1 and phase 2 sequence) in the assembly, whereas the AGP viewers substitute a gap placeholder for it (certain assemblies include phase 2 but not phase 1).
- Because they include unfinished sequence, gEVAL/PGP builds are more up to date than the AGP viewers.
What are WGS assemblies?
WGS assemblies are full genome assemblies produced using the whole genome shotgun approach. There are viewers for these assemblies for mouse and zebrafish, and the data have also been aligned to their respective gEVAL/PGP builds. Currently, for GRC organisms, the WGS assemblies are primarily used to improve sequence content in regions not covered by clones, but they are also useful for investigating potential sequence rearrangement and variation.
Who provides the assemblies?
- Human, Mouse, Zebrafish - Genome Reference Consortium
- Pig - Sanger/Pig Sequencing Consortium
- Helminth - Sanger/Helminth Sequencing Group
- Rat - Rat Genome Sequencing Consortium
What does the gEVAL browser help me with, and what does it NOT do?
The gEVAL browser is a frequently updated browser aimed primarily at those who work on creating and improving assemblies. Common uses include seeing which areas are, or should be, targeted for sequencing, identifying possible haplotypic variation, and selecting new clones to sequence for a specific region. Although the gEVAL viewer usually has a more up-to-date path than Ensembl, it is not a gene-centric browser. The cDNA/transcript mappings are there mainly so that a gene model can be used to compare the neighboring clones it is mapped onto.
How can I find out whether a certain gap can be closed?
Here is a checklist to follow to see whether the gap between two clones can be closed:
- Turn on the selfcomp track and see whether there are reciprocal alignments between the two clones. If there are, and no overlap was generated, possible reasons are very high variation in the overlap, or one clone completely overlapping the other, which would create a false 5'/5' or 3'/3' join between that clone and the clone on the other end. If you align the two clones and the overlap is acceptable, please let us know, and we will either fix the issue or pass it on to the appropriate people.
- Turn on the end mapping track(s). The first thing to look for is spanning ends: the two sequenced paired ends of a single clone, one mapping on each side of the gap. For example, if 123F01.T7 maps to clone A and 123F01.SP6 maps to clone B, both with the correct orientation pointing towards the expected insert, 123F01 can be chosen as a spanning clone. This clone can then be sequenced to fill the gap.
- If there are no spanning ends, the gap may be larger than a single clone, and some form of walking out from the edge may be required. One option is to look for end hangers: ends that lie on the edge of a clone next to the gap, with an orientation pointing towards the gap. As in a primer walk, the clone such an end comes from can be sequenced and added to the path, shortening the gap.
- Another option, if available, is to use the WGS assemblies and see whether any large WGS contigs cover the gap. If so, not only may you get an estimate of the gap size, but you can also use the contig in combination with the clones on the edge to choose hangers and spanners with confidence.
- Note: regions of interest may be quite repetitive, so a chosen hanger or spanner may not be unique enough to be used. It is always useful to check the region with selfcomp and, if the end is not coloured green, to see how repetitive the region hit by the end is.
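The spanner/hanger logic in the checklist above can be sketched in a few lines of Python. This is only an illustration, not gEVAL code: the `EndHit` record, the `classify_clones` function and the strand convention (+1 points right along the path, -1 points left) are all assumptions made for the example, and the input is assumed to contain only end hits that lie close to the gap edges.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EndHit:
    """One mapped clone end (hypothetical record, not a gEVAL data structure)."""
    clone: str      # clone the end was sequenced from, e.g. "123F01"
    end: str        # which end of the insert: "T7" or "SP6"
    component: str  # path component the end maps to, e.g. "A" or "B"
    strand: int     # +1 points right along the path, -1 points left

def points_into_gap(hit, left, right):
    """True if the end sits on a flanking component and points at the gap."""
    return (hit.component == left and hit.strand == 1) or \
           (hit.component == right and hit.strand == -1)

def classify_clones(hits, left, right):
    """Return (spanners, hangers) for the gap between `left` and `right`."""
    by_clone = {}
    for hit in hits:
        by_clone.setdefault(hit.clone, []).append(hit)

    spanners, hangers = [], []
    for clone in sorted(by_clone):
        inward = [h for h in by_clone[clone] if points_into_gap(h, left, right)]
        sides = {h.component for h in inward}
        ends = {h.end for h in inward}
        if sides == {left, right} and len(ends) == 2:
            # Both ends of one clone flank the gap and point into it.
            spanners.append(clone)
        elif len(by_clone[clone]) == 1 and len(inward) == 1:
            # Only one end is mapped, and it points into the gap.
            hangers.append(clone)
    return spanners, hangers

hits = [
    EndHit("123F01", "T7", "A", 1), EndHit("123F01", "SP6", "B", -1),
    EndHit("45G02", "T7", "A", 1),                                  # mate unmapped
    EndHit("77H03", "T7", "A", 1), EndHit("77H03", "SP6", "A", -1),  # lies within A
    EndHit("99K10", "SP6", "B", 1),                                 # points away
]
spanners, hangers = classify_clones(hits, left="A", right="B")
print(spanners, hangers)  # ['123F01'] ['45G02']
```

The sketch does not check repetitiveness, so any candidate it reports would still need the selfcomp check described in the note above.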
How can I find out whether an assembly component is placed correctly?
In other words, how can I be sure that the clone or region I am interested in is where it is supposed to be? Here are a couple of strategies to check this.
- Markers: genetic and physical maps are a major resource for mapping work. As projects progress, more maps are generated and added, increasing the resolution of the marker coverage across the genome, so that more contigs and clones can be pinpointed. In the browser, the marker track can be turned on, and the markers are colour coded. Magenta markers are where they should be on the chromosome; blue markers should not be on the current chromosome/location. See for more info.
- Blue-coloured end mappings: like markers, some ends have a location associated with them, based on FPC data. These can be used in the same way as the markers to indicate whether or not the location is correct. See for more here.
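As a rough illustration of the colour coding described above, the following Python sketch assigns magenta or blue to each marker by comparing the chromosome the marker is expected on with the chromosome it is currently displayed on. The marker names, the tuple layout and the function names are hypothetical; the browser's own rules may take more than the chromosome into account.

```python
from collections import Counter

def marker_colour(expected_chr, observed_chr):
    """Magenta if the marker is on its expected chromosome, blue otherwise."""
    return "magenta" if expected_chr == observed_chr else "blue"

def placement_summary(markers, observed_chr):
    """Count marker colours for a component placed on `observed_chr`.

    `markers` is a list of (marker_name, expected_chromosome) pairs.
    """
    return Counter(marker_colour(expected, observed_chr) for _, expected in markers)

# Hypothetical markers hitting one clone that is placed on chromosome 7.
markers = [("m1", "7"), ("m2", "7"), ("m3", "12")]
summary = placement_summary(markers, observed_chr="7")
print(summary["magenta"], summary["blue"])  # 2 1
```

A component where most markers come out magenta is probably placed correctly; a component dominated by blue markers deserves a closer look.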
How does the gEVAL browser compare to Ensembl?
The gEVAL browser builds the tile path more frequently than Ensembl does. Ensembl takes a freeze of the path on a given date and then spends some time preparing and completing the numerous analyses that make up its build. As a result, Ensembl has more analyses, specifically gene-centric ones. The gEVAL browser is not intended to provide analysis of genes or related data; it is focused on sequence improvement of the available genomes. That said, if a gene of interest lies in an area that is poorly represented in Ensembl, the gEVAL browser can be used to re-analyze the area, because a) the path will be more up to date, containing finished and newly sequenced unfinished clone sequence, and b) the data and analyses focus on giving an impression of the relative health of the region of interest. Navigation is very similar to Ensembl, as gEVAL uses much of its web framework. One addition in the gEVAL browser is Punchlists: specifically tailored lists that can be used to resolve issues methodically.
What is the GRC?
The GRC, or Genome Reference Consortium, is a group of educational institutes formed to improve the representation of reference genomes. At the time the human reference was initially described, it was clear that some regions were recalcitrant to closure with existing technology. The main reason for improving the reference assemblies is that they are the cornerstones upon which all whole genome studies are based (e.g. the 1000 Genomes Project).
The GRC currently maintains the reference genomes of human, mouse and zebrafish.
The members of the GRC are:
- The Wellcome Trust Sanger Institute
- The Genome Center at Washington University
- The European Bioinformatics Institute
- The National Center for Biotechnology Information
What are the end libraries used?
Human
| library | external prefix | internal prefix | library size distribution | range used in gEVAL |
| --- | --- | --- | --- | --- |
| Chori-17 | CH17 | bMO | link | 146689-269814 |
| Whitehead fosmid | WI2 | fW | nolink | 31067-48600 |
| RPCI-11 | RP11 | bA | link | 113124-228383 |
| Chori-507 | CH507 | bPA | nolink | default |
| ABC7 | ABC7 | f7A | nolink | 25492-49289 |
| ABC8 | ABC8 | f8A | nolink | 26065-46749 |
| ABC9 | ABC9 | f9A | nolink | 31855-47051 |
| ABC10 | ABC10 | f10A | nolink | 34984-46978 |
| ABC11 | ABC11 | f11A | nolink | 34406-45615 |
| ABC12 | ABC12 | f12A | nolink | 34762-44676 |
| ABC13 | ABC13 | f13A | nolink | 33534-44996 |
| ABC14 | ABC14 | f14A | nolink | 33137-46638 |
| ABC16 | ABC16 | f16A | nolink | 32107-43947 |
| ABC24 | ABC24 | f24A | nolink | 33193-44338 |
| ABC27 | ABC27 | f27A | nolink | 33265-44107 |
Mouse
| library | external prefix | internal prefix | library size distribution | range used in gEVAL |
| --- | --- | --- | --- | --- |
| RPCI-24 | RP24 | bN | L1L2 | 91404-242955 |
| RPCI-23 | RP23 | bM | link | 122253-270705 |
| Whitehead mouse fosmid | WI1 | fM | nolink | 28450-50967 |
| Chori-25 | CH25 | bMS | link | 43595-162830 |
| Chori-36 | CH36 | bML | link | 113292-247285 |
Zebrafish
| library | external prefix | internal prefix | library size distribution | range used in gEVAL |
| --- | --- | --- | --- | --- |
| CHORI-211 | CH211 | zC | link | 97784-232694 |
| DanioKey | DKEY | zK | nolink | 87399-268251 |
| DanioKey Pilot | DKEYP | zKp | nolink | 121938-239280 |
| CHORI-73 | CH73 | zH | link | 67324-152296 |
| RPCI-71 | RP71 | bZ | link | 90017-205409 |
| ZFISHFOS | ZFISHFOS | zF | nolink | 32722-49432 |
| CHORI-1073 (FOS) | CH1073 | zFD | nolink | 31911-45771 |
Pig
| library | external prefix | internal prefix |
| --- | --- | --- |
| PigE | PigE | bT |
| CHORI-242 | CH242 | bE |
| SBAB bI Clones | PigI | bI |
| WTSI_1005 | WTSI_1005 | fSS |
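The "range used in gEVAL" column can be applied as in the following Python sketch, which checks whether the distance between two mapped paired ends is plausible for the clone's library. It rests on several assumptions: that the ranges are insert sizes in base pairs, that external clone names start with the library prefix followed by a hyphen (for example RP11-123F01), and that `span_in_range` is a helper invented for this example. Only a few rows of the tables are copied in.

```python
from typing import Optional

# A few rows copied from the tables above: external prefix -> (min, max).
# Reading the ranges as insert sizes in base pairs is an assumption.
RANGES = {
    "RP11": (113124, 228383),   # human
    "WI2": (31067, 48600),      # human fosmid
    "CH507": None,              # table says "default"; the value is not given
    "RP24": (91404, 242955),    # mouse
    "CH211": (97784, 232694),   # zebrafish
}

def span_in_range(clone_name: str, span: int) -> Optional[bool]:
    """Check an end-pair span against the range for the clone's library.

    Returns True or False when a range is known, and None when the library
    is not listed here or has no explicit range.
    """
    prefix = clone_name.split("-", 1)[0]
    limits = RANGES.get(prefix)
    if limits is None:
        return None
    low, high = limits
    return low <= span <= high

print(span_in_range("RP11-123F01", 150000))  # True
print(span_in_range("WI2-1A1", 60000))       # False
print(span_in_range("CH507-1A1", 100000))    # None
```

Returning None rather than False for libraries without an explicit range keeps "no information" separate from "span is implausible".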
What is the current status of the tiling path?
Depending on the organism, gEVAL takes a freeze of the tiling path for each build roughly every 2-4 months. Clones shown as being sequenced, or as unfinished, may therefore have progressed since the freeze. To see the current status of all the clones and of the path itself, Chromoview, a tool made at the Sanger Institute, can be used for human, mouse, zebrafish and pig. It is updated twice weekly and displays the clone paths per chromosome for each organism, the overlaps, the AGPs, and links to EMBL records and to gEVAL.