Trace Crawler SOFTWARE

RESOURCE

Abstract

The trace crawler is a tool for selective web crawling that archives web resources with well-defined boundaries. Specific web navigation steps (a trace) are formulated for families of web pages whose layout or HTML structure is similar but whose content differs, for example GitHub, SlideShare, and blogs. Each trace is recorded in a JSON file.
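
As an illustration of what recording a trace in JSON might look like, the following Python sketch serializes a list of navigation steps and reads it back. It is not taken from the trace crawler repository: the field names (`name`, `url_pattern`, `steps`, `action`, `selector`), the example URL, and the helper functions are all hypothetical, and the tool's actual trace schema may differ.

```python
import json

# Hypothetical trace for a family of pages that share one layout.
# All keys and values below are illustrative assumptions, not the
# trace crawler's real schema.
trace = {
    "name": "example-repository-page",
    "url_pattern": "https://example.org/*/repo",
    "steps": [
        {"action": "click", "selector": "a.tab-issues"},
        {"action": "follow_links", "selector": "a.issue-title"},
        {"action": "click_until_gone", "selector": "button.load-more"},
    ],
}


def trace_to_json(trace):
    """Serialize a trace dictionary to a JSON string."""
    return json.dumps(trace, indent=2)


def trace_from_json(text):
    """Parse a JSON string back into a trace dictionary."""
    return json.loads(text)


def selectors_in_order(trace):
    """Return the selectors in the order the steps would be replayed."""
    return [step["selector"] for step in trace["steps"]]


if __name__ == "__main__":
    text = trace_to_json(trace)
    print(text)
    print(selectors_in_order(trace_from_json(text)))
```

Because the steps refer to page structure rather than page content, one such file could in principle be reused across every page in the same family.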
Release Date: 2022-06-15
Project Type: Open Source, Publicly Available Repository
Software Type: Scientific
Licenses: BSD 3-clause "New" or "Revised" License
Sponsoring Org.:
Code ID: 74911
Site Accession Number: C22054
Research Org.: Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Country of Origin: United States

Citation Formats

Balakireva, Lyudmila, and Klein, Martin. Trace Crawler SOFTWARE. Computer Software. https://github.com/lanl/trace-crawler. USDOE. 15 Jun. 2022. Web. doi:10.11578/dc.20220615.2.
Balakireva, Lyudmila, & Klein, Martin. (2022, June 15). Trace Crawler SOFTWARE. [Computer software]. https://github.com/lanl/trace-crawler. https://doi.org/10.11578/dc.20220615.2.
Balakireva, Lyudmila, and Klein, Martin. "Trace Crawler SOFTWARE." Computer software. June 15, 2022. https://github.com/lanl/trace-crawler. https://doi.org/10.11578/dc.20220615.2.
@misc{ doecode_74911,
title = {Trace Crawler SOFTWARE},
author = {Balakireva, Lyudmila and Klein, Martin},
abstractNote = {The trace crawler is a tool for selective web crawling that archives web resources with well-defined boundaries. Specific web navigation steps (a trace) are formulated for families of web pages whose layout or HTML structure is similar but whose content differs, for example GitHub, SlideShare, and blogs. Each trace is recorded in a JSON file.},
doi = {10.11578/dc.20220615.2},
url = {https://doi.org/10.11578/dc.20220615.2},
howpublished = {[Computer Software] \url{https://doi.org/10.11578/dc.20220615.2}},
year = {2022},
month = {jun}
}