Summary: PSI-BLAST Algorithm
1. Perform an initial alignment with BLAST using the BLOSUM62 substitution matrix.
2. Construct a multiple alignment from hits.
3. Prepare a position specific scoring matrix (PSSM).
4. Use the PSSM as the scoring matrix for another BLAST search against the database.
5. Repeat steps 2-4 until convergence, i.e. until a search finds no new sequences.
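The loop in steps 1-5 can be sketched in a few lines of Python. This is only an illustration of the control flow, not the NCBI implementation: the three callables (initial_search, build_pssm, profile_search) are hypothetical stand-ins supplied by the caller, hits are represented simply as sequence identifiers, and "convergence" is assumed to mean that a round returns no identifier that was not already in the previous hit set.

```python
from typing import Callable, FrozenSet, Iterable, Tuple, TypeVar

P = TypeVar("P")  # whatever object the caller uses to represent a PSSM


def psiblast_iterations(
    initial_search: Callable[[], Iterable[str]],     # hypothetical: step 1
    build_pssm: Callable[[FrozenSet[str]], P],       # hypothetical: steps 2-3
    profile_search: Callable[[P], Iterable[str]],    # hypothetical: step 4
    max_iterations: int = 5,
) -> Tuple[FrozenSet[str], int, bool]:
    """Return (final hit set, rounds performed, whether the search converged)."""
    hits = frozenset(initial_search())
    for iteration in range(1, max_iterations + 1):
        pssm = build_pssm(hits)
        new_hits = frozenset(profile_search(pssm))
        # Assumed convergence test: the profile search found nothing new.
        if new_hits <= hits:
            return hits, iteration, True
        hits = new_hits
    # Stopped because the iteration cap was reached, not because of convergence.
    return hits, max_iterations, False
```

The iteration cap is a design choice of this sketch: without it, a profile that keeps recruiting new sequences in every round would never terminate.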
Constructing a Position Specific Scoring Matrix (PSSM)
Dimension of a PSSM: lq × 20, where lq is the length of the query protein (one row per query position, one column per amino acid).
1. Run BLAST against the database (local alignment).
2. Collect database sequence segments with E-value below threshold (default is 0.01).
3. Remove similar sequences.
(a) Remove sequence segments identical to a query segment.
(b) Retain one copy for any rows that are >98% identical to one another.
4. Construct the multiple alignment block M from the remaining segments (the length of M equals lq).
(a) Ignore pairwise alignment columns that involve gap characters inserted into the query.
5. For each column C:
(a) Reduce M to MC (1 ≤ C ≤ query length)
i. Let R be the set of sequences with a residue in C.
ii. The columns of MC are those columns of M in which every sequence in R is represented, and the
rows of MC are exactly the sequences in R. Therefore, MC contains a subset of M's columns and rows
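A minimal Python sketch of the reduction from M to MC may make the row and column selection concrete. The representation is an assumption made for illustration: M is a list of equal-length strings with the query first, "-" marks a gap inside a pairwise alignment, and "." marks query positions that a database segment does not cover. A row is taken to be "represented" in a column when its character there is not "."; the function name and both markers are hypothetical. Column indices are 0-based here, unlike the 1-based C in the text.

```python
from typing import List, Tuple

GAP = "-"        # assumed marker: gap character inside an aligned segment
UNALIGNED = "."  # assumed marker: position outside the aligned segment


def reduce_alignment(M: List[str], c: int) -> Tuple[List[int], List[int], List[str]]:
    """Return (row indices R, kept column indices, rows of MC) for 0-based column c."""
    if len({len(row) for row in M}) != 1:
        raise ValueError("all rows of M must have the same length")
    # R: rows that have an actual residue (not a gap, not unaligned) in column c.
    R = [i for i, row in enumerate(M) if row[c] not in (GAP, UNALIGNED)]
    # Keep only the columns in which every row of R is represented.
    cols = [j for j in range(len(M[0])) if all(M[i][j] != UNALIGNED for i in R)]
    reduced = ["".join(M[i][j] for j in cols) for i in R]
    return R, cols, reduced


M = [
    "MKTAYIAK",  # query
    "MKSA-IA.",
    "..TAYLAK",
    "....YIAR",
]
rows, cols, MC = reduce_alignment(M, 4)
# rows == [0, 2, 3], cols == [4, 5, 6, 7], MC == ["YIAK", "YLAK", "YIAR"]
```

In the toy block, the second row has a gap in column 4 and so drops out of R, while the short fourth row restricts MC to the last four columns.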