For each of these questions, show the commands you would use to answer the question. If the answer can be shown in fewer than 5 lines, show the answer as well.
Create a BED file representing all of the intervals in the genome that are NOT exonic.
What is the average distance from GWAS SNPs to the closest exon? (Hint - have a look at the closest tool.)
Count how many exons occur in each 500kb interval (“window”) in the human genome. What is the average count per window? (Hint - have a look at the makewindows tool.)
Are there any exons that are completely overlapped by an enhancer? If so, how many?
What fraction of the GWAS SNPs are exonic?
What fraction of the GWAS SNPs lie in either enhancers or promoters in the hESC data we have?
Create intervals representing the canonical 2bp splice sites on either side of each exon (don’t worry about excluding splice sites at the first or last exon). (Hint - have a look at the flank tool.)
Which hESC ChromHMM state (e.g., 11_Weak_Txn, 10_Txn_Elongation) covers the most base pairs in each of chromosome 19 and chromosome 8?
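Before reaching for the tools, it can help to see what two of the underlying interval operations compute. The following is a minimal pure-Python sketch on made-up toy data, not a solution to the puzzles: `complement` returns the parts of each chromosome not covered by any input interval (the idea behind the first question), and `make_windows` cuts each chromosome into fixed-width windows (the idea behind the window-counting question). The function names, the toy chromosome sizes, and the toy exon coordinates are all assumptions chosen for illustration; coordinates are 0-based and half-open, as in BED.

```python
def complement(intervals, chrom_sizes):
    """Return the regions of each chromosome NOT covered by `intervals`.

    intervals:   iterable of (chrom, start, end), 0-based half-open (BED style)
    chrom_sizes: dict mapping chromosome name -> chromosome length
    """
    # Group the intervals by chromosome.
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))

    gaps = []
    for chrom, size in chrom_sizes.items():
        cursor = 0  # first position not yet known to be covered
        for start, end in sorted(by_chrom.get(chrom, [])):
            if start > cursor:
                gaps.append((chrom, cursor, start))
            # Overlapping intervals only push the cursor forward.
            cursor = max(cursor, end)
        if cursor < size:
            gaps.append((chrom, cursor, size))
    return gaps


def make_windows(chrom_sizes, width):
    """Split each chromosome into consecutive windows of `width` bp.

    The last window on a chromosome is truncated at the chromosome end.
    """
    windows = []
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, width):
            windows.append((chrom, start, min(start + width, size)))
    return windows


# Toy data (hypothetical, for illustration only).
toy_sizes = {"chr1": 1000, "chr2": 500}
toy_exons = [("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 900, 1000)]

non_exonic = complement(toy_exons, toy_sizes)
windows = make_windows(toy_sizes, 400)

for chrom, start, end in non_exonic:
    print(chrom, start, end, sep="\t")
```

On real data, the genome file and the exon BED file play the roles of `toy_sizes` and `toy_exons`; the puzzles still ask for the corresponding commands rather than a reimplementation like this one.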