OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Document Set Differentiability Analyzer v. 0.1

Abstract

The software is a JMP Scripting Language (JSL) script designed to evaluate the differentiability of a set of documents that exhibit some conceptual commonalities but are expected to describe substantially different, and thus differentiable, categories. The script imports the document set, a subset of which may be partitioned into an additions pool; the bulk of the documents form a basis pool. Text analysis is applied to the basis pool to extract a mathematical representation of its conceptual content, referred to as the document concept space. A bootstrapping approach is applied to that representation to generate a large population of randomly designed virtual documents that could be written within the concept space, notably without actually writing the text of those documents. The Kolmogorov-Smirnov test is applied to determine whether the basis pool document set exhibits superior differentiation relative to the virtual documents produced by bootstrapping. If an additions pool exists, its documents are incrementally added to the basis pool, choosing the best-differentiated remaining document at each step. In this manner the impact of additional categories on overall document set differentiability may be assessed.
The software was developed to assess the differentiability of job description document sets. Differentiability is key to meaningful categorization. Poor job differentiation may have economic, ethical, and/or legal implications for an organization. Job categories are used in the assignment of market-based salaries; consequently, poor differentiation of job duties may set the stage for legal challenges if very similar jobs pay differently depending on title, a circumstance that also invites economic waste. The software can be applied to ensure job description set differentiability, reducing legal, economic, and ethical risks to an organization and its people.
The extraction of the conceptual space to a mathematical representation enables identification of exceedingly similar documents. In the event of redundancy, two jobs may be collapsed into one. If, in the judgment of the subject matter experts, the jobs are truly different, the conceptual similarities are highlighted, inviting inclusion of appropriate descriptive content to explicitly characterize those differences. When additional job categories are needed as the organization changes, the software enables evaluation of proposed additions to ensure that the resulting document set remains adequately differentiated.
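
The workflow described in the abstract (concept space, bootstrapped virtual documents, Kolmogorov-Smirnov comparison, stepwise additions) can be illustrated with a minimal Python sketch. This is not the JSL script itself, and the record does not state how the script measures differentiation or resamples the concept space. The sketch assumes each document is already a vector of concept loadings, measures differentiation as cosine distance to the nearest other document, and bootstraps each concept coordinate independently. All function names are hypothetical, and only the KS statistic is computed, not a p-value or a one-sided test.

```python
import math
import random
from bisect import bisect_right


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def nearest_neighbor_distances(vectors):
    """Assumed differentiation metric: for each document vector, the
    distance to its closest other document in the same set."""
    return [
        min(cosine_distance(v, w) for j, w in enumerate(vectors) if j != i)
        for i, v in enumerate(vectors)
    ]


def bootstrap_virtual_docs(basis, n_docs, rng):
    """Assumed bootstrap: build each virtual document by drawing every
    concept coordinate, with replacement, from the basis pool's values."""
    n_concepts = len(basis[0])
    return [
        [rng.choice(basis)[k] for k in range(n_concepts)]
        for _ in range(n_docs)
    ]


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the two empirical cumulative distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def best_addition(basis, additions):
    """Index of the candidate whose nearest basis document is farthest
    away, i.e. the best-differentiated remaining addition."""
    return max(
        range(len(additions)),
        key=lambda i: min(cosine_distance(additions[i], b) for b in basis),
    )


# Toy concept-space vectors (hypothetical data, three concepts).
rng = random.Random(0)
basis_pool = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.7, 0.7, 0.0]]
additions_pool = [[0.9, 0.1, 0.0], [0.0, 0.6, 0.8]]

virtual_docs = bootstrap_virtual_docs(basis_pool, 200, rng)
d_stat = ks_statistic(
    nearest_neighbor_distances(basis_pool),
    nearest_neighbor_distances(virtual_docs),
)
print("KS statistic, basis vs. virtual documents:", d_stat)

# Stepwise addition: move the best-differentiated candidate into the basis pool.
while additions_pool:
    pick = best_addition(basis_pool, additions_pool)
    basis_pool.append(additions_pool.pop(pick))
```

A large KS statistic only indicates that the two distance distributions differ. Judging whether the basis pool is better differentiated than the virtual population would additionally require checking the direction of the difference and a significance level, which this sketch leaves out.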

Authors:
Osborn, Thor D. [1]
  1. Sandia National Laboratories
Publication Date:
2017-02-08
Research Org.:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1344335
Report Number(s):
Document Set Differentiability Analyzer; 005165MLTPL00
SCR #2180
DOE Contract Number:  
AC04-94AL85000
Resource Type:
Software
Software Revision:
00
Software Package Number:
005165
Software CPU:
MLTPL
Source Code Available:
Yes
Other Software Info:
The script and associated document could be commercialized by an HR Analytics or salary survey firm to provide a competitive advantage in risk management tools or the composition of categorical assignments of data provided by respondents.
Related Software:
As provided the script runs within JMP Pro Version 13; however, the functional content could be readily adapted to other statistical tools (e.g., R) or directly written in other programming languages.
Country of Publication:
United States

Citation Formats

Osborn, Thor D. Document Set Differentiability Analyzer v. 0.1. Computer software. Vers. 00. USDOE. 8 Feb. 2017. Web.
Osborn, Thor D. (2017, February 8). Document Set Differentiability Analyzer v. 0.1 (Version 00) [Computer software].
Osborn, Thor D. Document Set Differentiability Analyzer v. 0.1. Computer software. Version 00. February 8, 2017.
@misc{osti_1344335,
title = {Document Set Differentiability Analyzer v. 0.1, Version 00},
author = {Osborn, Thor D.},
year = {2017},
month = {2}
}

Software:
To order this software, request consultation services, or receive further information, please fill out the following request.

OSTI staff will begin to process an order for scientific and technical software once the payment and signed site license agreement are received. If the forms are not in order, OSTI will contact you. No further action will be taken until all required information and/or payment is received. Orders are usually processed within three to five business days.