OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Deactivation and decommissioning web log analysis using big data technology - 15710

Abstract

The D and D Knowledge Management Information Tool (D and D KM-IT) is a web-based knowledge management information tool built for the D and D user community. Big data refers to volumes of structured and unstructured data so large that they are difficult to process with traditional database techniques. Web logs are files generated automatically by every operation on a web site. Web log files generated by the D and D KM-IT will be processed with the Apache Hadoop framework to extract meaningful data. Hadoop is an open-source software framework for storing and processing big data in a distributed environment on large clusters of commodity hardware. The Hadoop framework consists of two main layers: the Hadoop Distributed File System (HDFS) and the execution engine (MapReduce), which provides automatic parallelization and distribution behind a clean, simple programming abstraction.

Implementation: Web log files from the D and D KM-IT server are fetched using Apache Flume and loaded into HDFS. Using the Apache HCatalog tool, the web log data is parsed and converted into a structured table format. A Pig program is developed to query the tables and to retrieve and store the significant information on HDFS. The extracted data is fed to business visualization tools such as Microsoft Excel for analysis and reporting.

Results: The reports generated through web log data processing will be analyzed to improve the usability and performance of the D and D KM-IT.

Conclusion/Future Work: User keyword information can be explored to add new trending topics to the D and D KM-IT application. The statistics generated can be compared with analysis reports from existing analytical tools.
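As a rough sketch of the Pig step described above, the script below loads a parsed web-log table registered in HCatalog and aggregates page hits for reporting. The table name (weblogs), its columns (request_uri, status), and the HDFS output path are hypothetical placeholders, not taken from the paper; running it requires Pig started with HCatalog support (pig -useHCatalog).

-- Minimal Pig Latin sketch of the query step (assumptions noted above).
-- 'weblogs' is a hypothetical HCatalog table of parsed log records with
-- chararray columns request_uri and status.
logs = LOAD 'weblogs' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Keep only successful requests.
ok = FILTER logs BY status == '200';

-- Count hits per requested page.
by_page = GROUP ok BY request_uri;
hits = FOREACH by_page GENERATE group AS request_uri, COUNT(ok) AS hit_count;

-- Store tab-separated results on HDFS for export to tools such as Excel.
STORE hits INTO '/kmit/reports/page_hits' USING PigStorage('\t');

The output is a small tab-separated file per reducer, which can be pulled off HDFS and opened directly in Excel for the analysis and reporting step the abstract describes.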

Authors:
Joshi, Santosh; Upadhyay, Himanshu; Lagos, Leonel [1]
  1. Applied Research Center, Florida International University (United States)
Publication Date:
July 2015
Research Org.:
WM Symposia, Inc., PO Box 27646, Tempe, AZ 85285-7646 (United States)
OSTI Identifier:
22824525
Report Number(s):
INIS-US-19-WM-15710
TRN: US19V1092069571
Resource Type:
Conference
Resource Relation:
Conference: WM2015: Annual Waste Management Symposium, Phoenix, AZ (United States), 15-19 Mar 2015; Other Information: Country of input: France; available online at: http://archive.wmsym.org/2015/index.html
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICAL METHODS AND COMPUTING; 12 MANAGEMENT OF RADIOACTIVE WASTES, AND NON-RADIOACTIVE WASTES FROM NUCLEAR FACILITIES; 96 KNOWLEDGE MANAGEMENT AND PRESERVATION; COMPUTER CODES; DATA PROCESSING; DATASETS; DEACTIVATION; DECOMMISSIONING; KNOWLEDGE MANAGEMENT; PROGRAMMING; STATISTICS

Citation Formats

Joshi, Santosh, Upadhyay, Himanshu, and Lagos, Leonel. Deactivation and decommissioning web log analysis using big data technology - 15710. United States: N. p., 2015. Web.
Joshi, Santosh, Upadhyay, Himanshu, & Lagos, Leonel. Deactivation and decommissioning web log analysis using big data technology - 15710. United States.
Joshi, Santosh, Upadhyay, Himanshu, and Lagos, Leonel. 2015. "Deactivation and decommissioning web log analysis using big data technology - 15710". United States.
@article{osti_22824525,
title = {Deactivation and decommissioning web log analysis using big data technology - 15710},
author = {Joshi, Santosh and Upadhyay, Himanshu and Lagos, Leonel},
abstractNote = {The D and D Knowledge Management Information Tool (D and D KM-IT) is a web-based knowledge management information tool built for the D and D user community. Big data refers to volumes of structured and unstructured data so large that they are difficult to process with traditional database techniques. Web logs are files generated automatically by every operation on a web site. Web log files generated by the D and D KM-IT will be processed with the Apache Hadoop framework to extract meaningful data. Hadoop is an open-source software framework for storing and processing big data in a distributed environment on large clusters of commodity hardware. The Hadoop framework consists of two main layers: the Hadoop Distributed File System (HDFS) and the execution engine (MapReduce), which provides automatic parallelization and distribution behind a clean, simple programming abstraction. Implementation: Web log files from the D and D KM-IT server are fetched using Apache Flume and loaded into HDFS. Using the Apache HCatalog tool, the web log data is parsed and converted into a structured table format. A Pig program is developed to query the tables and to retrieve and store the significant information on HDFS. The extracted data is fed to business visualization tools such as Microsoft Excel for analysis and reporting. Results: The reports generated through web log data processing will be analyzed to improve the usability and performance of the D and D KM-IT. Conclusion/Future Work: User keyword information can be explored to add new trending topics to the D and D KM-IT application. The statistics generated can be compared with analysis reports from existing analytical tools.},
url = {https://www.osti.gov/biblio/22824525},
place = {United States},
year = {2015},
month = {7}
}

Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.
