I'm currently trying to analyze a dataset. I'm new to bioinformatics and am trying to use BWA. However, as soon as I reach bwa mem, I keep running into the same error:
input --> mirues-macbook:sra ipmiruek$ bwa mem -t 8 Homo_sapiens.GRCh38.dna.chromosome.17.fa ERR3841737/ERR3841737_trimmed.fq.gz > ERR3841737/ERR3841737_mapped.sam
output --> [E::bwa_idx_load_from_disk] fail to locate the index files
I've already indexed the reference chromosome as follows:
bwa index Homo_sapiens.GRCh38.dna.chromosome.17.fa.gz
Is there anything I could do to fix this problem? Thank you.
I tried changing the dataset that I was using along with the corresponding reference chromosome but it still yielded the same result. Is this an issue with the code or with the dataset I'm working with?
CodePudding user response:
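It looks like you indexed a gzip-compressed FASTA file, but are supplying an index base (idxbase) without the .gz extension. By default, bwa index names its output files after the input file, so the index prefix here ends in .fa.gz. What you want is: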
$ bwa mem \
-t 8 \
Homo_sapiens.GRCh38.dna.chromosome.17.fa.gz \
ERR3841737/ERR3841737_trimmed.fq.gz \
> ERR3841737/ERR3841737_mapped.sam
Alternatively, gunzip the reference FASTA file and index it. For example:
$ gunzip Homo_sapiens.GRCh38.dna.chromosome.17.fa.gz
$ bwa index Homo_sapiens.GRCh38.dna.chromosome.17.fa
Note that BWA packs the reference sequences (into the .pac file), so you don't even need the FASTA file to run BWA MEM after it's been indexed.
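If you're unsure which prefix an index was built under, one quick check is to look for the five files that bwa index writes (.amb, .ann, .bwt, .pac, .sa). The following is a minimal sketch rather than anything shipped with BWA: check_bwa_index is a hypothetical helper name, and it only tests that the files exist, not that they are valid.

```shell
# Hypothetical helper (not part of BWA): report which index files
# are missing for a given index prefix (idxbase).
check_bwa_index() {
    prefix="$1"
    missing=0
    # bwa index writes these five files next to the prefix
    for ext in amb ann bwt pac sa; do
        if [ ! -f "${prefix}.${ext}" ]; then
            echo "missing: ${prefix}.${ext}"
            missing=1
        fi
    done
    # exit status 0 if all files are present, 1 otherwise
    return "$missing"
}
```

With the index built as in the question, calling check_bwa_index Homo_sapiens.GRCh38.dna.chromosome.17.fa would be expected to list all five files as missing, while the same call with the .fa.gz prefix should print nothing.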