Download human_g1k_v37_decoy.fasta
The human_g1k_v37_decoy.fasta file is a specific version of the human reference genome used extensively by the 1000 Genomes Project and the Genome Analysis Toolkit (GATK). It is often referred to as "b37+decoy" or simply the "decoy" reference because it includes additional "decoy" sequences designed to improve the accuracy of read mapping.

Why Download the Decoy Version?

Standard human reference genomes (such as the basic hg19 or GRCh37) are incomplete: they often lack sequence data for highly repetitive regions or centromeres. When reads from a sequencing run actually belong to these missing regions, an aligner may map them incorrectly to real genes, causing false-positive variant calls. The human_g1k_v37_decoy version addresses this by including decoy sequences that give such reads a more appropriate place to align. In addition, the pseudo-autosomal regions (PAR) on chromosome Y are masked with "N" so that these regions are only treated as diploid on chromosome X.

Where to Download human_g1k_v37_decoy.fasta

GATK hosts these files on public cloud storage. For example, legacy versions can be found in the GATK Resource Bundle via Google Cloud Storage at gs://gatk-best-practices/somatic-b37/ or Amazon S3.

The official 1000 Genomes FTP provides the sequence files under the technical/reference directory.

For those requiring specific Ensembl annotations, the Ensembl GRCh37 Archive provides stable access to the build.

How to Use and Index the File

Before aligning reads, index the FASTA with bwa index. This creates five files (.amb, .ann, .bwt, .pac, .sa) required for the BWA-MEM aligner. To build the FASTA index (.fai) as well, run:

samtools faidx human_g1k_v37_decoy.fasta
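The indexing steps can be sketched as a small shell script. This is only an illustration under a few assumptions: the FASTA has already been downloaded and decompressed, bwa and samtools are on the PATH, and REF and the helper missing_indexes are hypothetical names introduced here, not part of any tool.

```shell
#!/bin/sh
# Sketch: build the BWA and samtools indexes for the reference,
# skipping any step whose output already exists.
# REF is an assumed local path; adjust it to where the FASTA was saved.
REF="${REF:-human_g1k_v37_decoy.fasta}"

# Hypothetical helper: print every expected index file that is still missing.
missing_indexes() {
    ref="$1"
    for ext in amb ann bwt pac sa fai; do
        [ -e "$ref.$ext" ] || echo "$ref.$ext"
    done
}

if [ -f "$REF" ]; then
    # bwa index writes .amb, .ann, .bwt, .pac and .sa next to the FASTA.
    [ -e "$REF.bwt" ] || bwa index "$REF"
    # samtools faidx writes the .fai index.
    [ -e "$REF.fai" ] || samtools faidx "$REF"
    # Report anything that is still absent (prints nothing on success).
    missing_indexes "$REF"
fi
```

If the script prints nothing after it finishes, all six index files sit next to the FASTA. Indexing a full human genome with bwa can take an hour or more, which is why the sketch skips steps whose output is already present.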