VecFinder: Automated de novo identification and removal of vector and adapter sequence from genomic datasets
High-throughput Sanger sequencing requires DNA to be inserted into bacterial vectors for biological amplification. Adapter or linker oligonucleotides may also be attached to target DNA fragments to facilitate insertion into the vector. These vector and adapter sequences are sequenced along with the target, or insert, sequence and represent contamination that must be removed from the dataset prior to analysis. Such contamination can be removed by screening the dataset against the known sequences of the vector and adapter used to generate the data. However, for public or collaborator datasets, information about these contaminant sequences is often incorrect or absent, which results in incomplete screening. We have created VecFinder, software that identifies the vector and adapter sequences from the read sequences alone and subsequently removes them. This removes the dependence on library creators to provide the vector and adapter sequences used for the library. It also automates the previously manual task of identifying and screening the adapter or linker sequence, which can be tedious and time-consuming.
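The abstract does not describe how VecFinder works internally, so the following is only a minimal sketch of one way contaminant sequence could be found from reads alone. It assumes the contaminant sits at the very start of every read, which is a simplification. Inserts differ from read to read while vector or adapter sequence repeats, so a per-position consensus across reads holds up through the contaminant and breaks down where the insert begins. The function names `consensus_prefix` and `trim_prefix` and the parameters `min_fraction` and `max_mismatches` are hypothetical and are not taken from VecFinder.

```python
from collections import Counter


def consensus_prefix(reads, min_fraction=0.8):
    """Return the leading sequence shared by at least min_fraction of reads.

    Hypothetical helper: walks the reads column by column from the 5' end
    and stops at the first position where no single base is common enough.
    """
    if not reads:
        return ""
    consensus = []
    # zip(*reads) yields one column per position and stops at the shortest read.
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        if count / len(reads) < min_fraction:
            break
        consensus.append(base)
    return "".join(consensus)


def trim_prefix(read, prefix, max_mismatches=1):
    """Remove prefix from the start of read if it matches within max_mismatches.

    Hypothetical helper: compares position by position, with no indel handling.
    """
    if len(read) < len(prefix):
        return read
    mismatches = sum(a != b for a, b in zip(read, prefix))
    return read[len(prefix):] if mismatches <= max_mismatches else read


if __name__ == "__main__":
    # Toy data: an invented 8-base "adapter" followed by different inserts.
    adapter = "GATCCGAA"
    inserts = ["ACGTAC", "CTTGCA", "GGACTT", "TACCGG", "ATTGCA"]
    reads = [adapter + insert for insert in inserts]

    found = consensus_prefix(reads)
    cleaned = [trim_prefix(read, found) for read in reads]
    print(found)
    print(cleaned)
```

A real tool would also need to handle contaminant at the 3' end, sequencing errors that shift positions, and low-complexity inserts that happen to agree by chance. This sketch ignores all of these.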
- Research Organization: Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Sponsoring Organization: USDOE Office of Science (SC)
- DOE Contract Number: DE-AC02-05CH11231
- OSTI ID: 1170608
- Report Number(s): LBNL-6833E
- Resource Relation: Conference: The Biology of Genomes - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, May 8-12, 2007
- Country of Publication: United States
- Language: English