QuEST

QuEST is statistical software for the analysis of ChIP-Seq data, with visualization of the data and analysis results through the UCSC Genome Browser.

QuESTions:

Post all your questions to the QuEST ChIP-seq group. I do my best to respond to them.

http://groups.google.com/group/chipseq

QuEST HOWTOs:

1. How to run QuEST on some test data (scroll down this page)

2. How to perform de novo motif search on the ChIP-seq experiment analyzed by QuEST

3. How to perform Gene Ontologies (GO) analysis on the ChIP-Seq data obtained from the NCBI Short Read Archive (SRA)

Check back later for more...

Contact:

Anton Valouev: valouev@usc.edu

Distributions:

QuEST is distributed as open source code that needs to be compiled before running.

- current stable release QuEST 2.4 [ download source ]

- early release QuEST 1.0 [ download source ]

QuEST has been extensively tested and runs on current distributions of Red Hat Linux and Mac OS.

Test data:

- GABP human ChIP-Seq data, (GA binding protein, Jurkat cells, hg18) [ download ]

- REST/NRSF human polyclonal ChIP-Seq data (Neuron-restrictive silencer factor/RE1-silencing transcription factor, Jurkat cells, hg18) [ download ]

- REST/NRSF human monoclonal ChIP-Seq data (Neuron-restrictive silencer factor/RE1-silencing transcription factor, Jurkat cells, hg18) [ download ]

- SRF human ChIP-Seq data (Serum response factor, Jurkat cells, hg18) [ download ]

- RX_noIP data (aka control, input, total chromatin, sheared chromatin, Jurkat cells, hg18) [ download ]

- Human hg18 genome table [ download ]

Installation:

QuEST can be installed on any Unix-based system that has make, perl and g++. On Mac OS, install the Developer Tools package to be able to compile QuEST.

1. Download the current release here.

2. Unpack QuEST:

- open a shell or a terminal

- cd into QuEST download directory

- type "tar -zxf QuEST_2.1.tar.gz" (adjust the file name to match the release you downloaded)

- type "cd QuEST_2.1"

3. Configure for your platform

- type "./configure.pl" or "perl configure.pl"

4. Compile QuEST

- type "make"

If you get any error messages here, you are probably missing g++, or some files failed to compile for other reasons. You can email me (value@stanford.edu) if you have difficulty here.
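Before running configure.pl, it can help to confirm that the three tools named above are actually on your PATH. The following is a minimal sketch in Python, not part of the QuEST distribution; the function name `missing_tools` is made up for illustration.

```python
import shutil

# Tools the installation notes above list as required for building QuEST.
REQUIRED_TOOLS = ("make", "perl", "g++")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing build tools:", ", ".join(missing))
    else:
        print("make, perl and g++ were all found on the PATH.")
```

If g++ is reported missing, installing a compiler (Developer Tools on Mac OS) is the first thing to try before looking at other compile errors.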

Running QuEST on test data

1. Download GABP ChIP-Seq data from here, Jurkat RX_noIP data from here, and human hg18 genome table from here.

2. Unpack ChIP, control data and genome table

- type "gzip -d *.gz"

3. Check available disk space. You should have at least 30 GB available for storage of temporary files

- type "df -m ./". The available space, reported in megabytes, should exceed 30 000 (see red arrow)

4. Configure QuEST analysis

- type "~/QuEST_distributions/QuEST_2.1/generate_QuEST_parameters.pl -solexa_align_ChIP ./GABP.align_25.hg18 -solexa_align_RX_noIP ./Jurkat_RX_noIP.align_25.hg18 -gt ./genome_table -ap ./QuEST_analysis -ChIP_name GABP_Jurkat". Replace the path to the QuEST distribution if necessary.

You should now see something like this:


- QuEST will be running with visible progress and after some time you will see the following:

- type "y" and hit enter

- QuEST will run for a while and then you will see the following:

- type "1" and press enter

- You will be presented with the following question:

- type "2" and hit enter

- You will see the following screen:

- type "y" and press enter.

QuEST will now be running for some time. On an 8-core 2.4 GHz Linux server it takes about 25 minutes. You can monitor the progress by typing "top" in another terminal.
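The disk-space check and the long configuration command from the steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `free_megabytes` and `quest_parameters_command` are hypothetical, the paths are the ones used in this walkthrough, and the script only prints the command rather than running it, because generate_QuEST_parameters.pl asks questions interactively.

```python
import os
import shlex
import shutil

REQUIRED_FREE_MB = 30_000  # threshold quoted in the disk-space step above


def free_megabytes(path="."):
    """Free space at `path` in units of 1 MiB, comparable to `df -m`."""
    return shutil.disk_usage(path).free // (1024 * 1024)


def quest_parameters_command(quest_dir, chip_align, control_align,
                             genome_table, analysis_dir, chip_name):
    """Build the argument list for generate_QuEST_parameters.pl."""
    return [
        f"{quest_dir}/generate_QuEST_parameters.pl",
        "-solexa_align_ChIP", chip_align,
        "-solexa_align_RX_noIP", control_align,
        "-gt", genome_table,
        "-ap", analysis_dir,
        "-ChIP_name", chip_name,
    ]


if __name__ == "__main__":
    free_mb = free_megabytes(".")
    if free_mb < REQUIRED_FREE_MB:
        print(f"Only {free_mb} MB free; at least {REQUIRED_FREE_MB} MB is recommended.")

    command = quest_parameters_command(
        # Assumed install location; replace with your own QuEST directory.
        quest_dir=os.path.expanduser("~/QuEST_distributions/QuEST_2.1"),
        chip_align="./GABP.align_25.hg18",
        control_align="./Jurkat_RX_noIP.align_25.hg18",
        genome_table="./genome_table",
        analysis_dir="./QuEST_analysis",
        chip_name="GABP_Jurkat",
    )
    # Print a copy-pastable command line instead of executing it.
    print(shlex.join(command))
```

Keeping the arguments in a list makes it easy to swap in a different ChIP data set or analysis directory without retyping the whole command.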

Understanding QuEST results

QuEST output consists of a summary file, peak call files, and UCSC Genome Browser track files that you can use to visualize your data.

1. To see the summary of the analysis type "more ./QuEST_analysis/module_outputs/QuEST.out"

You can see the number of regions that are enriched (7170), peaks found within these regions (7953), peaks and regions that passed the quality metric criteria (7771 and 7037), and the FDR (1.08% and 1.2%).

2. To view individual peak and region summaries, type "more ./QuEST_analysis/calls/peak_caller.ChIP.out.accepted"

QuEST outputs entries by region (lines starting with "R-<region number>") followed by the peaks within them (lines starting with "P-<region number>-<peak number>"). Regions are separated by an empty line. Each line contains different statistics about the peak or region of interest.

Region fields:

1. R-<region number> (e.g. R-2 )

2. chromosome (e.g. chr11)

3. region_begin-region_end (e.g. 47556767-47557549)

4. ChIP:

5. maximum ChIP unnormalized score within this region (e.g. 54.5)

6. control:

7. control unnormalized score at the position within the region with the highest ChIP score (e.g. 0.29)

8. max_pos:

9. position within the region with the maximum score (e.g. 47557145)

10. ef:

11. Normalized enrichment fold at the maximum position within the region based on the score ratio (e.g. 309.574)

12. ChIP_tags:

13. Number of sequence reads in the ChIP data that fell within this region (e.g. 7071)

14. background tags:

15. Number of sequence tags from the background data that fell within this region (e.g. 69)

16. tag_ef:

17. Normalized enrichment fold based on the tag counts within the region (e.g. 112.346)

18. ps:

19. Peak shift metric within the region. Expected to be about half the library fragment size (e.g. 61 bp)

20. cor:

21. Correlation between density profiles on + and - strand at the distance given by the region peak shift (e.g. 0.385673)

22. -log10_qv:

23. Negative log base 10 of q-value obtained by Bonferroni correction of tag enrichment p-value (e.g. 3.90558e+06)

24. qv_rank:

25. Rank of q-value of this region compared to other regions (e.g. 3)
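The region field list above can be turned into a small parser. This is a sketch under assumptions, not a description of QuEST internals: it assumes fields are separated by whitespace, that the first three fields are positional, and that every label (including -log10_qv) ends with a colon and is followed by exactly one value. The sample line is assembled from the example values in the list, not copied from real output, and the function names are hypothetical.

```python
def parse_labelled_fields(tokens):
    """Pair each label (one or more tokens, the last ending in ':') with the next token."""
    fields = {}
    label_parts = []
    expecting_value = False
    for token in tokens:
        if expecting_value:
            fields[" ".join(label_parts)] = token
            label_parts = []
            expecting_value = False
        elif token.endswith(":"):
            label_parts.append(token[:-1])
            expecting_value = True
        else:
            # Part of a multi-word label such as "background tags:".
            label_parts.append(token)
    return fields


def parse_region_line(line):
    """Split a region line into its id, chromosome, coordinates and labelled values."""
    tokens = line.split()
    region_id, chromosome, span = tokens[:3]
    begin, end = (int(part) for part in span.split("-"))
    record = {"id": region_id, "chromosome": chromosome, "begin": begin, "end": end}
    record.update(parse_labelled_fields(tokens[3:]))  # values are kept as strings
    return record


# Synthetic line built from the example values in the field list (assumed layout).
SAMPLE_REGION = (
    "R-2 chr11 47556767-47557549 ChIP: 54.5 control: 0.29 max_pos: 47557145 "
    "ef: 309.574 ChIP_tags: 7071 background tags: 69 tag_ef: 112.346 "
    "ps: 61 cor: 0.385673 -log10_qv: 3.90558e+06 qv_rank: 3"
)

if __name__ == "__main__":
    region = parse_region_line(SAMPLE_REGION)
    print(region["chromosome"], region["begin"], region["end"], region["ef"])
```

Values are left as strings so that nothing is lost if a field turns out to carry a unit or other suffix; convert with `float()` or `int()` where you need numbers.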

Peak fields:

1. P-<region_number>-<peak number within the region> (e.g. P-2-1)

2. chromosome (e.g. chr11)

3. peak position (e.g. 47557145)

4. ChIP:

5. Unnormalized ChIP score at the position of the peak (e.g. 54.5)

6. control:

7. Unnormalized background score at the position of the ChIP peak (e.g. 0.19)

8. region:

9. <containing region coordinate begin>-<containing region coordinate end> (e.g. 47556767-47557549)

10. ef:

11. Normalized enrichment fold calculated from the ratio of ChIP and control scores (e.g. 309.574)

12. ps:

13. Peak shift metric for the peak. Expected to be about half the library fragment size (e.g. 52 bp)

14. cor:

15. Correlation between opposite strand enrichment profiles at the distance given by the peak shift above (e.g. 0.981378)

16. -log10_qv:

17. Negative log base 10 of the peak q-value obtained by Bonferroni correction of peak score p-value (e.g. 3.64579e+06)

18. qv_rank:

19. q-value based rank of the peak (e.g. 2)
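To work with the whole peak call file rather than a single line, the region and peak lines can be grouped by their prefixes. The sketch below assumes region lines start with "R-", peak lines start with "P-", and blank lines only separate regions; `group_calls` and `peak_positions` are illustrative names, and the two sample lines are shortened versions assembled from the example values above.

```python
def group_calls(lines):
    """Group lines into (region_line, [peak_lines]) pairs, in file order."""
    groups = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines only separate regions
        if line.startswith("R-"):
            groups.append((line, []))
        elif line.startswith("P-") and groups:
            groups[-1][1].append(line)
    return groups


def peak_positions(groups):
    """Return (peak_id, chromosome, position) for every peak in the groups."""
    positions = []
    for _region_line, peak_lines in groups:
        for peak_line in peak_lines:
            peak_id, chromosome, position = peak_line.split()[:3]
            positions.append((peak_id, chromosome, int(position)))
    return positions


# Shortened, synthetic lines based on the examples in the field lists.
SAMPLE_LINES = [
    "R-2 chr11 47556767-47557549 ChIP: 54.5 control: 0.29 max_pos: 47557145",
    "P-2-1 chr11 47557145 ChIP: 54.5 control: 0.19 region: 47556767-47557549",
    "",
]

if __name__ == "__main__":
    for peak_id, chromosome, position in peak_positions(group_calls(SAMPLE_LINES)):
        print(peak_id, chromosome, position)
```

Because `group_calls` accepts any iterable of lines, an open file handle for ./QuEST_analysis/calls/peak_caller.ChIP.out.accepted can be passed to it directly.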

Visualizing ChIP-Seq data with UCSC Genome Browser

ChIP-Seq data and QuEST analysis can be visualized using UCSC Genome Browser.

Let's investigate the position of the second peak, which was detected at chr11 47557145 (peak P-2-1 within region R-2, see below).

1. type "more ./QuEST_analysis/calls/peak_caller.ChIP.out.accepted" (see below)

We are interested in the second region containing peak "P-2-1" (see yellow arrow on the image below)

2. Open a web browser and type "genome.ucsc.edu" in the address field

3. Click on the "Genomes" tab

4. Navigate to hg18 genome assembly.

- choose "Mammal" for the clade, "Human" for the genome and "Mar. 2006" for the assembly.

You should see hg18 assembly choice indicated by the "About the Human Mar. 2006 (hg18) assembly" (see below)

4. Upload custom tracks

- click "add custom tracks" button

- when the File Upload menu comes up, navigate to "QuEST_analysis/tracks/wig_profiles/by_chr/ChIP_unnormalized" directory and select "chr11.wig.gz". Click on "Open" button of the menu. Then click on the "Submit" button of the genome browser.

You should see the following screen appear after a minute or so:

You can see that the "GABP_Jurkat_unnormalized_chr11" is now uploaded as a custom track into the genome browser

- click on the "go to genome browser" button

Usually you get taken to some arbitrary place in the genome, but the important part is that you can see that the track is present in your display (see red arrows)

5. Navigate to the locus of interest.

You can do this by copy-pasting the coordinate range from the region "R-2" into the Genome Browser's position/search field or by manually typing chr11:47,556,767-47,557,549 into the search field (don't forget the colon after chr11!). Then hit the "jump" button.

You should now see the locus of interest:

You can see here that the binding site is at the promoter of NDUFS3 and that the peak position (47557145) indicates the likely binding site with the score at the peak achieving 54.5 (same as indicated in the QuEST peak call file, see below).

- Use a zoom button to zoom out 3x:

6. Upload the peak calling track, bar graph and the sequencing reads:

- click on the "manage custom tracks" button under the browser main display

- click on "add custom tracks" button

- add QuEST calls track "QuEST_analysis/tracks/ChIP_calls.filtered.bed"

- add the sequence read data track "QuEST_analysis/tracks/data_bed_files/by_chr/ChIP/GABP_Jurkat.chr11.bed.gz"

- add the sequence read bedgraph display "QuEST_analysis/tracks/bed_graph/by_chr/ChIP"

- you should now see all these tracks being uploaded into the browser:

- click on the "go to the genome browser" button and navigate to the locus "chr11:47,555,984-47,558,332"

- this is slightly busy, so disable the raw data display by choosing the "hide" option of the "GABP_Jurkat_tags_chr11" track in the section under the main browser display area. Then hit the "refresh" button:

You should now see a more condensed data and analysis display:

7. Add control data to the display

- click on "manage custom tracks" button underneath the main browser display

- click on "add custom tracks" button in the "manage custom tracks" display page

- Upload control data for chr11: select file "QuEST_analysis/tracks/data_bed_files/by_chr/RX_noIP", click "Open" then the "Submit" button (see the screen shot below)

- Upload control data density track: click on "add custom tracks", select file "QuEST_analysis/tracks/wig_profiles/by_chr/background_unnormalized/chr11.wig.gz", hit "Open" and "Submit" button (see the screen shot below)

You should now see both tracks uploaded in the genome browser:

Click on the "go to genome browser" button and navigate again to the locus "chr11:47555984-47558332" of the genome browser. You should now see the control data displayed against the ChIP data at the same locus:

You can see that the control data shows some light enrichment at the position of the peak (this is not an artifact and does happen under some experimental conditions). Although the control density profile appears as high as the ChIP density profile, remember that by default the browser scales each track to the maximum wig value within the current view. If you look at the left side of the display, you can see that the ChIP data has a maximum of 54.5 while the control is only at 0.218. In fact, the normalized ratio of ChIP and control intensities at the position of the peak is over 300!
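The position strings typed into the browser during this walkthrough can also be generated from the coordinates in the peak call file. The sketch below uses hypothetical helper names. It also shows that the wider window used above, chr11:47,555,984-47,558,332, is what you get by tripling the width of region R-2 around its centre. Treating the browser's 3x zoom as a symmetric widening is an assumption made here for illustration.

```python
def ucsc_position(chromosome, begin, end):
    """Format a range as 'chrom:begin-end' with thousands separators."""
    return f"{chromosome}:{begin:,}-{end:,}"


def zoom_out(begin, end, factor=3):
    """Widen an inclusive range symmetrically so that it is `factor` times as wide."""
    width = end - begin + 1
    pad = width * (factor - 1) // 2
    return begin - pad, end + pad


if __name__ == "__main__":
    # Region R-2 from the peak call file.
    print(ucsc_position("chr11", 47556767, 47557549))
    # prints chr11:47,556,767-47,557,549

    wide_begin, wide_end = zoom_out(47556767, 47557549, factor=3)
    print(ucsc_position("chr11", wide_begin, wide_end))
    # prints chr11:47,555,984-47,558,332
```

The colon after the chromosome name is produced by the format string, which avoids the most common typing mistake when entering coordinates by hand.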

Notes:

1. QuEST outputs the data tracks both as whole-genome files and split by chromosome. If you have a large data set, it's best to upload only the chromosomes that you are interested in. The UCSC Genome Browser does not accept large files, so you may fail to upload the entire data set, leaving you with only the by-chromosome option.

2. UCSC Genome Browser sessions will expire after about 3 days, so if you want to keep the data alive on the browser, you have to "refresh" the current session every couple of days or so by doing something like zooming, browsing a bit, or simply hitting "refresh" button.
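When uploading by chromosome, it can be handy to list the per-chromosome files for one chromosome in one go. The following sketch builds the three per-chromosome file paths that appear in the walkthrough above. It assumes that files for other chromosomes follow the same naming pattern as the chr11 examples, which is not confirmed here, and `chromosome_tracks` is a made-up helper name.

```python
from pathlib import Path


def chromosome_tracks(analysis_dir, chip_name, chromosome):
    """Return the per-chromosome track files used in the walkthrough.

    Assumption: other chromosomes follow the chr11 naming pattern.
    """
    tracks = Path(analysis_dir) / "tracks"
    by_chr_wig = tracks / "wig_profiles" / "by_chr"
    return {
        "ChIP density": by_chr_wig / "ChIP_unnormalized" / f"{chromosome}.wig.gz",
        "control density": by_chr_wig / "background_unnormalized" / f"{chromosome}.wig.gz",
        "ChIP reads": tracks / "data_bed_files" / "by_chr" / "ChIP"
                      / f"{chip_name}.{chromosome}.bed.gz",
    }


if __name__ == "__main__":
    found = chromosome_tracks("QuEST_analysis", "GABP_Jurkat", "chr11")
    for label, path in found.items():
        # Report the size so you can judge whether the file is small enough to upload.
        size = f"{path.stat().st_size} bytes" if path.exists() else "not found"
        print(f"{label}: {path} ({size})")
```

Printing the file sizes gives a quick way to spot tracks that may be too large to upload before you try them in the browser.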

 

References:

QuEST:

Valouev* A, Johnson* DS, Sundquist A, Medina C, Anton E, Batzoglou S, Myers RM and Sidow A.
 “Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data.”
Nature Methods, 2008 Sep; 5(9):829-34 [ pdf ]

Last modified: Nov. 3, 2009