OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Document Set Differentiability Analyzer v. 0.1

Abstract

The software is a JMP Scripting Language (JSL) script designed to evaluate the differentiability of a set of documents that share some conceptual commonalities but are expected to describe substantially different, and thus differentiable, categories. The script imports the document set, a subset of which may be partitioned into an additions pool; the bulk of the documents form a basis pool. Text analysis is applied to the basis pool to extract a mathematical representation of its conceptual content, referred to as the document concept space. A bootstrapping approach is then applied to that representation to generate a large population of randomly designed virtual documents that could be written within the concept space, without actually writing the text of those documents. The Kolmogorov-Smirnov test is applied to determine whether the basis pool exhibits superior differentiation relative to these bootstrapped virtual documents. If an additions pool exists, its documents are added to the basis pool one at a time, choosing the best-differentiated remaining document at each step. In this manner, the impact of additional categories on overall document set differentiability can be assessed.

The software was developed to assess the differentiability of job description document sets. Differentiability is key to meaningful categorization, and poor job differentiation may have economic, ethical, and/or legal implications for an organization. Because job categories are used to assign market-based salaries, poorly differentiated job duties may set the stage for legal challenges if very similar jobs are paid differently depending on title, a circumstance that also invites economic waste. The software can be applied to ensure job description set differentiability, reducing legal, economic, and ethical risks to an organization and its people.
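The record does not publish the script's formulas, so the following Python sketch only illustrates the general shape of the procedure (concept space, bootstrapped virtual documents, Kolmogorov-Smirnov comparison) under several assumptions not stated in the source: each document is already reduced to a row of concept-space scores (for example, by a latent semantic analysis step), differentiation is measured by pairwise cosine similarity, "bootstrapping" means resampling each concept column independently with replacement, and the KS comparison is one-sided with an asymptotic p-value. All function names are hypothetical.

```python
import math
import random

# Hypothetical sketch; not the JSL script's actual implementation.
# Input: each document is a list of concept-space scores (same length for all).


def cosine(u, v):
    """Cosine similarity; zero vectors are treated as dissimilar (0.0)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pairwise_similarities(docs):
    """Cosine similarity for every unordered pair of documents in a set."""
    return [cosine(docs[i], docs[j])
            for i in range(len(docs)) for j in range(i + 1, len(docs))]


def bootstrap_virtual_docs(docs, n_virtual, rng):
    """Assumed bootstrap: resample each concept column with replacement.

    Every virtual document draws each concept score from the values observed
    in that column, so it stays inside the observed concept space and no
    text is ever written.
    """
    columns = list(zip(*docs))
    return [[rng.choice(col) for col in columns] for _ in range(n_virtual)]


def ks_one_sided(sample_a, sample_b):
    """Two-sample KS test of H1: sample_a tends to be smaller than sample_b.

    Returns (d_plus, p) with d_plus = max over x of F_a(x) - F_b(x) and the
    asymptotic approximation p ~= exp(-2 * n_e * d_plus**2), n_e = n*m/(n+m).
    """
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d_plus = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value <= x (handles ties).
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d_plus = max(d_plus, i / n - j / m)
    n_e = n * m / (n + m)
    return d_plus, min(1.0, math.exp(-2.0 * n_e * d_plus ** 2))


def differentiability_test(basis_docs, n_virtual=200, seed=0):
    """Compare real pairwise similarities against bootstrapped virtual ones.

    A small p suggests the basis pool is better differentiated (lower
    pairwise similarity) than randomly designed documents in the same space.
    """
    rng = random.Random(seed)
    real = pairwise_similarities(basis_docs)
    virtual_docs = bootstrap_virtual_docs(basis_docs, n_virtual, rng)
    return ks_one_sided(real, pairwise_similarities(virtual_docs))


if __name__ == "__main__":
    # Toy concept scores for four hypothetical job descriptions.
    basis = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.9], [0.5, 0.5, 0.1]]
    d_plus, p = differentiability_test(basis)
    print(f"D+ = {d_plus:.3f}, approximate p = {p:.3g}")
```

Because pairwise similarities within one set are not independent observations, the p-value from this sketch should be read as indicative rather than exact; a one-sided test is used here because the question is specifically whether the real documents are *more* differentiated than the virtual ones.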
Extracting the concept space into a mathematical representation enables identification of exceedingly similar documents. In the event of redundancy, two jobs may be collapsed into one. If, in the judgment of subject matter experts, the jobs are truly different, the highlighted conceptual similarities invite the inclusion of descriptive content that explicitly characterizes those differences. When the organization changes and additional job categories may be needed, the software enables evaluation of proposed additions to ensure that the resulting document set remains adequately differentiated.
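The redundancy check and the incremental evaluation of proposed additions could be sketched as follows, again in Python and again under assumptions the source does not state: cosine similarity in concept space as the similarity measure, a hypothetical 0.9 threshold for flagging near-duplicate jobs, and "best differentiated" taken to mean the candidate whose highest similarity to any document already in the set is lowest. The function names and threshold are illustrative only.

```python
import math

# Hypothetical sketch; names and the 0.9 threshold are illustrative assumptions.


def cosine(u, v):
    """Cosine similarity; zero vectors are treated as dissimilar (0.0)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similar_pairs(names, docs, threshold=0.9):
    """Return (name_i, name_j, similarity) for pairs at or above threshold,
    most similar first: candidates for merging or for sharper descriptions."""
    pairs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            s = cosine(docs[i], docs[j])
            if s >= threshold:
                pairs.append((names[i], names[j], s))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


def greedy_additions(basis, additions):
    """Add candidate documents one at a time, best-differentiated first.

    `basis` (non-empty) and `additions` map names to concept vectors. At each
    step, the remaining candidate whose highest similarity to the current set
    is lowest is added. Returns [(name, max_similarity_when_added), ...].
    """
    current = dict(basis)
    remaining = dict(additions)
    order = []
    while remaining:
        scores = {name: max(cosine(vec, other) for other in current.values())
                  for name, vec in remaining.items()}
        best = min(scores, key=scores.get)
        order.append((best, scores[best]))
        current[best] = remaining.pop(best)
    return order


if __name__ == "__main__":
    basis = {"Analyst": [1.0, 0.0, 0.0], "Engineer": [0.0, 1.0, 0.0]}
    additions = {"Analyst-Engineer": [1.0, 1.0, 0.0], "Technician": [0.0, 0.0, 1.0]}
    print(greedy_additions(basis, additions))
    print(similar_pairs(["A", "B", "C"], [[1, 0], [0.99, 0.1], [0, 1]]))
```

In the toy run, "Technician" is added first because it overlaps with nothing in the basis pool, while the hybrid "Analyst-Engineer" is added last with a similarity of about 0.71 to each existing job. In a full workflow the differentiability test would presumably be rerun after each addition; this sketch records only each candidate's maximum similarity at the moment it is added.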

Authors:
 Osborn, Thor D. [1]
  1. Sandia National Laboratories
Publication Date:
2017-02-08
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1344335
Report Number(s):
Document Set Differentiability Analyzer; 005165MLTPL00
SCR #2180
DOE Contract Number:
AC04-94AL85000
Resource Type:
Software
Software Revision:
00
Software Package Number:
005165
Software CPU:
MLTPL
Source Code Available:
Yes
Other Software Info:
The script and its associated document could be commercialized by an HR analytics or salary survey firm to gain a competitive advantage in risk management tools or in assigning respondent-provided data to categories.
Related Software:
As provided, the script runs within JMP Pro version 13; however, its functionality could readily be adapted to other statistical tools (e.g., R) or written directly in other programming languages.
Country of Publication:
United States

Citation Formats

Osborn, Thor D. Document Set Differentiability Analyzer v. 0.1. Computer software. Vers. 00. USDOE. 8 Feb. 2017. Web.
Osborn, Thor D. (2017, February 8). Document Set Differentiability Analyzer v. 0.1 (Version 00) [Computer software].
Osborn, Thor D. Document Set Differentiability Analyzer v. 0.1. Computer software. Version 00. February 8, 2017.
@misc{osti_1344335,
title = {Document Set Differentiability Analyzer v. 0.1, Version 00},
author = {Osborn, Thor D.},
abstractNote = {Software is a JMP Scripting Language (JSL) script designed to evaluate the differentiability of a set of documents that exhibit some conceptual commonalities but are expected to describe substantially different -- thus differentiable -- categories. The script imports the document set, a subset of which may be partitioned into an additions pool. The bulk of the documents form a basis pool. Text analysis is applied to the basis pool to extract a mathematical representation of its conceptual content, referred to as the document concept space. A bootstrapping approach is applied to that mathematical representation in order to generate a representation of a large population of randomly designed documents that could be written within the concept space, notably without actually writing the text of those documents. The Kolmogorov-Smirnov test is applied to determine whether the basis pool document set exhibits superior differentiation relative to the randomly designed virtual documents produced by bootstrapping. If an additions pool exists, the documents are incrementally added to the basis pool, choosing the best differentiated remaining document at each step. In this manner the impact of additional categories to overall document set differentiability may be assessed. The software was developed to assess the differentiability of job description document sets. Differentiability is key to meaningful categorization. Poor job differentiation may have economic, ethical, and/or legal implications for an organization. Job categories are used in the assignment of market-based salaries; consequently, poor differentiation of job duties may set the stage for legal challenges if very similar jobs pay differently depending on title, a circumstance that also invites economic waste. The software can be applied to ensure job description set differentiability, reducing legal, economic, and ethical risks to an organization and its people.
The extraction of the conceptual space to a mathematical representation enables identification of exceedingly similar documents. In the event of redundancy, two jobs may be collapsed into one. If in the judgment of the subject matter experts the jobs are truly different, the conceptual similarities are highlighted, inviting inclusion of appropriate descriptive content to explicitly characterize those differences. When additional job categories may be needed as the organization changes, the software enables evaluation of proposed additions to ensure that the resulting document set remains adequately differentiated.},
doi = {},
year = 2017,
month = 2,
}

Software:
To order this software, request consultation services, or receive further information, please fill out the following request.

Similar Records:
  • This document describes the software design for the Tank Monitor and Control System (TMACS). It captures the existing as-built design of TMACS as of November 1999 and will be used as a reference by the system maintainers who maintain and modify the TMACS functions as necessary. The heart of the TMACS system is the "point-processing" functionality, in which a sample value is received from the field sensors and then analyzed, logged, or alarmed as required. This Software Design Document focuses on the point-processing functions.
  • This document captures the existing as-built design of the Tank Monitor and Control System (TMACS) and will be used as a reference by the system maintainers who maintain and modify the TMACS functions as necessary. The heart of the TMACS system is the "point-processing" functionality, in which a sample value is received from the field sensors and then analyzed, logged, or alarmed as required. This Software Design Document focuses on the point-processing functions.
  • The Department of Energy Data Exchange Format (DOEDEF) Translator Module Pool A (hereafter referred to as the "pool") is intended to reduce the manpower required to rewrite the DOEDEF translator software already in existence throughout the Department of Energy Nuclear Weapons Complex (DOE NWC). Each portion of the DOEDEF exchange system, and therefore each site's DOEDEF translator software, must be rewritten during the transition from Phase 1.0 to Phase 1.5 of DOEDEF. By providing a pool of modules, some (if not all) of which can be used by multiple NWC sites, the effort required to move from Phase 1.0 to Phase 1.5 can be reduced.
  • This document serves as a guide and a tutorial for writing and evaluating Software Requirements Specifications (SRS). It describes the recommended content and qualities of a good Software Requirements Specification, and it presents a prototypical SRS organizational format for a large-scale scientific software product. This format can be scaled down appropriately for smaller products. The guide is intended to serve both the software specifier and the software developer: for the former, it is an aid to describing what is supposed to be in the product; for the latter, it is an aid to understanding what is desired. That is, this guide will serve as a common basis for understanding the content of a particular Software Requirements Specification.
  • This, then, is the current status of the project: since we made the switch to Intradoc, we are now treating the project as a document and image management system. In reality, it could be considered a document and content management system, since we can manage almost any file input to the system, such as video or audio. At present, however, we are concentrating on images. As mentioned above, my CRADA funding was only targeted at including thumbnails of images in Intradoc. We still had to modify Intradoc so that it would compress images submitted to the system. All processing of files submitted to Intradoc is handled in what is called the Document Refinery. Even though MrSID created thumbnails in the process of compressing an image, work needed to be done to build this capability into the Document Refinery. Therefore, we decided to contract the Intradoc Engineering Team to perform this custom development work. To make Intradoc even more capable of handling images, we have also contracted for customization of the Document Refinery to accept Adobe Photoshop and Illustrator files in their native format.

You may also reach us by email at: .

OSTI staff will begin to process an order for scientific and technical software once the payment and signed site license agreement are received. If the forms are not in order, OSTI will contact you. No further action will be taken until all required information and/or payment is received. Orders are usually processed within three to five business days.
