Learning Memento archive routing with Character-based Artificial Neural Networks
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
This white paper describes a series of tests performed to determine whether a neural network could learn patterns from a service that maintains a cache of routing decisions for discovering version information for discrete web-at-large URLs. Labeled training data was derived from a log file that records, for each archive, whether Mementos were available for a given URL. Training data sets were generated from this log file on a by-archive basis, making each one a binary classification problem. Each training data set consisted of equal numbers of hit and miss URLs, selected at random. The URLs were converted to a normalized numeric representation in which each integer in a URL training vector represents a character in that URL. The corresponding label indicates whether or not Mementos were available for that URL in the selected archive. The training data matrix and the label vector became the input to a neural network. A number of neural network architectures and hyperparameters were explored; however, the log entries themselves were used as-is, without any feature engineering beyond the aforementioned normalization.
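The data preparation described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the vocabulary (`VOCAB`), the fixed vector length (`MAX_LEN`), the scaling into the range 0 to 1, and the function names `encode_url` and `balanced_dataset` are all assumptions made for illustration, since the abstract does not specify the exact normalization or sampling procedure.

```python
import random
import string

# Assumed vocabulary: printable ASCII characters, numbered from 1.
# Index 0 is reserved for padding and for characters outside the vocabulary.
VOCAB = {ch: i + 1 for i, ch in enumerate(string.printable)}
MAX_LEN = 64  # assumed fixed length of every URL vector


def encode_url(url, max_len=MAX_LEN):
    """Turn a URL into a fixed-length vector with one number per character."""
    ids = [VOCAB.get(ch, 0) for ch in url[:max_len]]
    ids += [0] * (max_len - len(ids))  # right-pad short URLs with zeros
    # Assumed normalization: scale the integer codes into [0, 1].
    return [i / len(VOCAB) for i in ids]


def balanced_dataset(hits, misses, seed=0):
    """Build one per-archive data set with equal numbers of hit and miss URLs."""
    rng = random.Random(seed)
    n = min(len(hits), len(misses))
    rows = [(u, 1) for u in rng.sample(hits, n)]
    rows += [(u, 0) for u in rng.sample(misses, n)]
    rng.shuffle(rows)
    X = [encode_url(u) for u, _ in rows]  # training data matrix
    y = [label for _, label in rows]      # label vector (1 = Mementos available)
    return X, y


# Hypothetical URLs for a single archive.
hits = ["http://a.example/1", "http://a.example/2", "http://a.example/3"]
misses = ["http://b.example/x", "http://b.example/y"]
X, y = balanced_dataset(hits, misses)
```

Here `X` and `y` stand in for the training data matrix and label vector that would be passed to a classifier; repeating the procedure once per archive gives one binary classification problem per archive.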
- Research Organization:
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC52-06NA25396
- OSTI ID:
- 1477616
- Report Number(s):
- LA-UR-18-29608
- Country of Publication:
- United States
- Language:
- English
Similar Records
Hyper Parameter Tuning in Neural Optical Image Categorizer for the E-log (NOICE)
MABAL: a Novel Deep-Learning Architecture for Machine-Assisted Bone Age Labeling