How to Install BLAST on macOS and Download the Database
BLAST (Basic Local Alignment Search Tool) is a widely used tool for comparing nucleotide or protein sequences against databases. If you’re working on macOS and need to use any of its tools, this guide will help you install BLAST, download necessary databases, and run BLAST searches efficiently.
Installing
Manually
Installing BLAST manually gives you control over the installation process and allows you to install the latest version directly from NCBI.
Download the BLAST Package:
- Visit the NCBI BLAST download page: NCBI BLAST+ Download
- Download the latest macOS binary package
  - For Intel-based Macs, the file name typically looks like ncbi-blast-x.xx.x+-x64-macosx.tar.gz.
  - For Apple silicon Macs, the file name typically looks like ncbi-blast-x.xx.x+-aarch64-macosx.tar.gz.
  Here x.xx.x is the version. At the time of writing this guide, the latest version is 2.16.0.
cd ~/Downloads
curl -O https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.16.0+-x64-macosx.tar.gz
Note: Replace the URL with the correct one for your platform and for the version currently in the LATEST directory.
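If you'd rather not pick the file name by hand, the choice can be scripted. The sketch below assumes that uname -m reports arm64 on Apple silicon and x86_64 on Intel Macs, and that 2.16.0 is still the version in NCBI's LATEST directory. The variable names are my own, and the download itself is left as a comment so you can check the URL first.

```shell
# Assumption: 2.16.0 is the version currently in LATEST; change it if NCBI has moved on.
version="2.16.0"

# Map the machine architecture to the suffix NCBI uses in its file names.
case "$(uname -m)" in
  arm64|aarch64) arch="aarch64" ;;  # Apple silicon
  *)             arch="x64" ;;      # anything else is treated as Intel
esac

archive="ncbi-blast-${version}+-${arch}-macosx.tar.gz"
url="https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/${archive}"
echo "$url"

# To actually download it:
# curl -O "$url"
```

Note that a terminal running under Rosetta reports x86_64 even on Apple silicon, in which case this picks the Intel build.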
- Extract the Package:
Open Terminal and navigate to the download directory:
cd ~/Downloads
tar -zxvf ncbi-blast-2.16.0+-x64-macosx.tar.gz
- You can now use the BLAST binaries in the ncbi-blast-2.16.0+/bin directory.
cd ncbi-blast-2.16.0+/bin
./blastn -version
If everything is working correctly, you should see the BLAST version information.
blastn: 2.16.0+
Package: blast 2.16.0, build Jun 25 2024 11:53:51
- Making BLAST available anywhere
Right now, if you want to use BLAST, you have to be in the ncbi-blast-2.16.0+/bin directory. To make it accessible from anywhere, there are two options: either move the binaries to a directory your shell (terminal) already searches for binaries, or tell the shell to also look in the directory where BLAST is installed.
I normally prefer the second option because:
- It gives me more flexibility over where I install and manage third-party tools
- I can keep multiple versions of BLAST and switch between them with one command
The way to accomplish this is to add the directory where BLAST is installed to the PATH environment variable. PATH simply contains a list of directories that macOS searches for binaries.
To add a directory to this variable, first locate where BLAST is installed. On my setup, I keep such tools in ~/third_party/blast/<version>/bin, so if you move the extracted ncbi-blast-2.16.0+ directory to ~/third_party/blast/2.16.0, the full path for version 2.16.0 is ~/third_party/blast/2.16.0/bin:
export PATH="$HOME/third_party/blast/2.16.0/bin:$PATH"
Note: This applies only to your current shell session. If you open a new terminal, you’ll have to run the command again to make BLAST accessible in that session.
To make it persistent across sessions, add the command to ~/.bashrc or ~/.zshrc (zsh is the default shell on recent versions of macOS).
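Adding the line by hand works fine, but if you script your setup, it helps to make the step safe to run more than once. This is a minimal sketch assuming zsh and the ~/third_party layout from my setup. The variable names are mine, and bash users would point rc_file at ~/.bashrc instead.

```shell
# Assumption: zsh reads .zshrc from $ZDOTDIR if that is set, otherwise from $HOME.
rc_file="${ZDOTDIR:-$HOME}/.zshrc"

# Single quotes keep $HOME and $PATH literal, so they are expanded
# each time a new shell starts rather than once, right now.
line='export PATH="$HOME/third_party/blast/2.16.0/bin:$PATH"'

# Create the file if needed, then append the line only if it is not already present.
touch "$rc_file"
grep -qxF "$line" "$rc_file" || echo "$line" >> "$rc_file"
```

Open a new terminal (or run source ~/.zshrc) for the change to take effect.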
- Verify Installation:
blastn -version
You should see the BLAST version information.
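The point about switching between versions with one command can be sketched as a small shell function. use_blast is a hypothetical helper of my own, not something that ships with BLAST, and it assumes every version lives under ~/third_party/blast/<version>/bin.

```shell
# Hypothetical helper: put one BLAST version's bin directory at the front of PATH.
# Assumes the layout ~/third_party/blast/<version>/bin.
use_blast() {
  blast_root="$HOME/third_party/blast"
  # Remove any BLAST bin directory added earlier, so versions do not pile up.
  cleaned=$(printf '%s' "$PATH" | tr ':' '\n' | grep -v "^$blast_root/" | paste -sd ':' -)
  PATH="$blast_root/$1/bin:$cleaned"
  export PATH
}

use_blast 2.16.0
```

If you put the function in your ~/.zshrc, running use_blast with another version number is enough to switch for the current session.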
Via Homebrew
Homebrew simplifies software installation on macOS.
- Install Homebrew (if not already installed):
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- Install BLAST via Homebrew:
brew install blast
- Verify Installation:
blastn -version
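If you end up with both a manual install and a Homebrew install, it is worth checking which blastn your shell actually picks up. A small sketch using the shell's built-in command -v (the variable names are mine):

```shell
# Report which blastn the shell would run, if any.
if blastn_path=$(command -v blastn); then
  status="blastn found at: $blastn_path"
else
  status="blastn is not on PATH"
fi
echo "$status"
```

Whichever directory comes first in PATH wins, so the path printed here tells you which installation you are really using.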
Downloading the Database
BLAST requires databases to compare your query sequences against. NCBI’s preformatted databases are available from its FTP site at https://ftp.ncbi.nlm.nih.gov/blast/db/.
Choose a Database:
- Common databases:
  - nt: Nucleotide collection. It is used for nucleotide sequence searches, so if you have a DNA or RNA sequence and want to find similar nucleotide sequences, use the nt database.
  - nr: Non-redundant protein sequences. It is used for protein sequence searches, so if you have a protein sequence and want to find similar proteins, use the nr database.
- Specific organism databases
Create a Directory for Databases:
mkdir -p ~/blastdb
- Download Databases Using update_blastdb.pl:
You can download the database archives manually from the link above and then extract them. However, BLAST provides a script that does this automatically.
The script is called update_blastdb.pl and is included in the ncbi-blast-2.16.0+/bin directory. Since you added the whole bin directory to your PATH, you can run it from anywhere.
update_blastdb.pl --decompress nt ~/blastdb # for nt database
update_blastdb.pl --decompress nr ~/blastdb # for nr database
Note: Replace nt with the database of your choice.
The problem is that these databases are huge, and you probably shouldn’t download a full one to your Mac if you have another option. I normally keep a small database locally to test scripts, and once I’m confident in a script, I run it on a server where the full database is available.
- Download a partial database
You can download only one part of a database by specifying the part you want. For example, to download the first file of the nt or nr database, you can run the following command:
update_blastdb.pl --decompress nt.00 ~/blastdb # for nt database
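Or, using curl (shown here for nr), from inside the database directory so the files end up where BLAST will look for them:
cd ~/blastdb
curl -O https://ftp.ncbi.nlm.nih.gov/blast/db/nr.00.tar.gz
tar -zxvf nr.00.tar.gz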
Note: This file is 33 GB in size before decompressing, so make sure you have enough space on your Mac.
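Before starting a download of that size, you can check the free space on the disk that holds your database directory. This is a rough sketch: the 50 GB threshold is a made-up figure to leave headroom over the compressed size mentioned in the note above, and the variable names are mine.

```shell
db_dir="$HOME/blastdb"
required_gb=50   # hypothetical threshold; adjust to the database you want

mkdir -p "$db_dir"

# df -Pk prints one POSIX-format line per filesystem, in 1024-byte blocks;
# the fourth field of the second line is the available space.
free_kb=$(df -Pk "$db_dir" | awk 'NR==2 {print $4}')
free_gb=$((free_kb / 1024 / 1024))

if [ "$free_gb" -ge "$required_gb" ]; then
  echo "OK: ${free_gb} GB free in $db_dir"
else
  echo "Only ${free_gb} GB free in $db_dir, need about ${required_gb} GB"
fi
```

Keep in mind that the archive and its extracted contents both sit on disk until you delete the archive.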
- Set the BLASTDB Environment Variable: Just as the shell needs to know where to search for binaries, BLAST needs to know where to search for databases. Set the BLASTDB environment variable to the directory where you downloaded the databases.
export BLASTDB=~/blastdb
This also applies only to your current shell session. If you open a new terminal, you’ll have to run the command again to make BLASTDB visible to BLAST. To make it persistent across sessions, add the command to ~/.bashrc or ~/.zshrc.
- Verify Database Files:
ls ~/blastdb
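A plain ls can be noisy once several databases are in place. As a rough check, you can count the sequence files instead: nucleotide databases store sequences in files ending in .nsq and protein databases in files ending in .psq. The sketch assumes BLASTDB holds a single directory and falls back to ~/blastdb if it is unset.

```shell
# Assumption: BLASTDB is a single directory (not a colon-separated list).
db_dir="${BLASTDB:-$HOME/blastdb}"

# Count nucleotide (.nsq) and protein (.psq) sequence files at the top level.
nucl_count=$(find "$db_dir" -maxdepth 1 -name '*.nsq' 2>/dev/null | wc -l | tr -d ' ')
prot_count=$(find "$db_dir" -maxdepth 1 -name '*.psq' 2>/dev/null | wc -l | tr -d ' ')

echo "$db_dir: $nucl_count nucleotide volume(s), $prot_count protein volume(s)"
```

Zero for both usually means the archives were not extracted, or were extracted somewhere else.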
Running BLAST
With BLAST installed and databases downloaded, you can perform searches.
- Prepare Your Query Sequence File:
Save your sequence in FASTA format (e.g., query.fasta).
You can try the sequence below as a test.
>Seq1
ATGGGTAAGGAGGACAAGACTCACCTTAACGT
CGTCGTCATCGGCCACGTCGACTCTGGCAAGT
CGACCACTGTAAGTACAACCAACAGCGGGTTG
CTTATCTGCACTCGGAATCCGCCAAACCTGGC
AGGGTATCACCAAAACATCTTGCTAACTTTTG
ACAGACCGGTCACTTGATCTACCAGTGCGGTG
GTATCGACAAGCGAACCATCGAGAAGTTCGAG
AAGGTTAGTCAATATCCCTTCGATTACGCGCG
CTCCCATCGATTCCCACGATTCGCTCCCTCAC
TCGAAACACATCCATTACCCCGCTCGAGTCCG
AAAATTTTGCGGTGCGACCGTGATTTTTTCTG
GTGGGGTATCTTACCCCGCCACTCGAGTCACG
GATGCGCTTGCCCTGTTCCCACAAAACCTTAC
CACCCTGTCGCGCACTACATGTCTTGCAGTCA
CTAACCACTGGACAATAGGAAGCCGCCGAGCT
CGGAAAGGGTTCCTTCAAGTACGCCTGGGTTC
TTGACAAGCTCAAAGCCGAGCGTGAGCGTGGT
ATCACCATTGATATCGCTCTCTGGAAGTTCGA
GACTCCTCGCTACTATGTCACCGTCATTGGTA
TGTTGTCACCGTCTCACACTATCATGTATTCA
TCATGCTAACATCTCTCTCAGATGCCCCCGGT
CATCGTGATTTCATCAAGAACATGATC
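Copying a sequence out of a web page sometimes mangles it, so a quick sanity check on the file is cheap insurance. The sketch below writes just the first two lines of the test sequence to a scratch file (test_query.fasta is a name I made up so it does not overwrite your real query), then counts header lines and bases.

```shell
# Write a short FASTA file: one header line, two sequence lines of 32 bases each.
cat > test_query.fasta <<'EOF'
>Seq1
ATGGGTAAGGAGGACAAGACTCACCTTAACGT
CGTCGTCATCGGCCACGTCGACTCTGGCAAGT
EOF

# Header lines start with ">"; everything else is sequence.
n_seqs=$(grep -c '^>' test_query.fasta)
n_bases=$(grep -v '^>' test_query.fasta | tr -d '\n' | wc -c | tr -d ' ')

echo "$n_seqs sequence(s), $n_bases bases"
```

For this file it prints 1 sequence(s), 64 bases. Run the same two counting commands on your own query.fasta to confirm it has the number of records you expect.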
- Run a BLAST Search:
blastn -query query.fasta -db nt -out results.txt
- blastn: Nucleotide-nucleotide BLAST
- -query: Input file
- -db: Database name
- -out: Output file
- Customize Search Parameters (Optional):
blastn -query query.fasta -db nt -out results.txt -evalue 1e-5 -outfmt 6 -max_target_seqs 10
- -evalue: Expectation value threshold
- -outfmt 6: Tabular output format
- -max_target_seqs: Maximum number of aligned sequences to keep
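The tabular format is handy because it is easy to filter with standard tools. By default, -outfmt 6 writes twelve tab-separated columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The sketch below filters on percent identity. The two rows are made up for illustration and are not real BLAST output, and the file name is hypothetical.

```shell
# Made-up rows in the default outfmt 6 layout (NOT real BLAST results).
printf 'Seq1\tsubjA\t98.50\t200\t3\t0\t1\t200\t50\t249\t1e-95\t350\n'  > results_example.tsv
printf 'Seq1\tsubjB\t85.00\t150\t20\t2\t10\t159\t5\t154\t1e-30\t140\n' >> results_example.tsv

# Keep hits with at least 90% identity (column 3) and print
# subject id, percent identity, e-value and bit score.
hits=$(awk -F'\t' '$3 >= 90 {print $2, $3, $11, $12}' results_example.tsv)
echo "$hits"
```

Only the subjA row passes the 90% cutoff here. On real output, point the awk command at your results.txt instead.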
Note: if you get an error related to the database, work through these checks:
- Make sure the BLASTDB environment variable is set correctly.
- Make sure the database is downloaded and decompressed correctly.
- Make sure the database is in the correct directory.
- Run blastdbcmd -db nt -info to check whether the database is correctly installed.
- Make sure you are using the correct database name in the command.
- Make sure you are using the correct command:
  - blastn for nucleotide-nucleotide BLAST
  - blastp for protein-protein BLAST
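Some of these checks can be run in one go. This is a minimal sketch, not a full diagnostic: it only looks at the BLASTDB variable and at whether blastdbcmd can open the database. The db_name and problems variables are my own.

```shell
db_name="nt"   # the name you pass to -db
problems=0

# Is BLASTDB set, and does it point to an existing directory?
if [ -z "${BLASTDB:-}" ]; then
  echo "BLASTDB is not set"
  problems=$((problems + 1))
elif [ ! -d "$BLASTDB" ]; then
  echo "BLASTDB points to a missing directory: $BLASTDB"
  problems=$((problems + 1))
fi

# Is blastdbcmd available, and can it open the database?
if command -v blastdbcmd >/dev/null 2>&1; then
  if ! blastdbcmd -db "$db_name" -info >/dev/null 2>&1; then
    echo "blastdbcmd could not open database: $db_name"
    problems=$((problems + 1))
  fi
else
  echo "blastdbcmd is not on PATH"
  problems=$((problems + 1))
fi

echo "$problems problem(s) found"
```

If it reports zero problems and the search still fails, the command itself is the next thing to check, for example using blastn against a protein database.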
- View Results:
less results.txt
Conclusion
Installing BLAST on macOS is straightforward, whether you choose a manual installation or use Homebrew. With BLAST set up and databases downloaded, you’re ready to perform powerful sequence analyses directly from your Mac. In the next guide, I’ll go over how to interpret the results from BLAST.