- CS109A Notes for Lecture 2/26/96 Rooted Trees
- Solutions to Chapter 1. Computer Science: The Mechanization
- CS109A Solutions for Class Quiz #1 February 8, 1995
- Anand Rajaraman Kosmix, Inc.
- Large-Scale File Systems and Map-Reduce
- Finding Similar Items A fundamental data-mining problem is to examine data for "similar" items. We
- Link Analysis One of the biggest changes in our lives in the decade following the turn of
- Clustering is the process of examining a collection of "points," and grouping the points into "clusters" according to some distance measure. The goal is that
- Advertising on the Web One of the big surprises of the 21st century has been the ability of all sorts of
- A-Priori Algorithm, 194, 195, 201 Accessible page, 169
- Computer Science: The Mechanization
- The power of computers comes from their ability to execute the same task, or different versions of the same task, repeatedly. In computing, the theme of iteration
- Combinatorics Probability
- There are many situations in which information has a hierarchical or nested structure like that found in family trees or organization charts. The abstraction that
- The set is the most fundamental data model of mathematics. Every concept in mathematics, from trees to real numbers, is expressible as a special kind of set.
- Regular Expressions A pattern is a set of objects with some recognizable property. One type of pattern
- Abstract data type 261, 307 See also Dictionary, Priority queue,
- 4.1 Review of Object-Oriented Concepts Before introducing object-oriented database models, let us review the major
- 10.3 Recursive Programming in Datalog While relational algebra can express many useful operations on relations, there
- CS109A Notes for Lecture 2/7/96 Analysis of Mergesort
- CS109A Notes for Lecture 2/14-16/96 Orders With Some Equivalent Items
- CS109A Notes for Lecture 2/28/96 Binary Trees
- CS109A Notes for Lecture 3/11/96 In ML notation, x is a substring of y if y = u^x^v
- CS109A Notes for Lecture 2/27/95 Defined by membership relation ∈.
- CS109A Notes for Lecture 3/1/95 Representing Sets
- CS109A Notes for Lecture 3/8/95 Properties of Binary Relations
- CS109A Notes for Lecture 3/13/95 Database Relations
- CS109A Notes for Lecture 3/17/95 Algebra of Relations
- CS109B Notes for Lecture 4/7/95 Connected Components
- CS109B Notes for Lecture 4/10/95 Depth-First Search
- CS109B Notes for Lecture 4/12/95 Single-Source Shortest Paths
- CS109B Notes for Lecture 4/24/95 Regular Expressions in UNIX
- CS109B Notes for Lecture 4/26/95 From RE's to Automata
- CS109B Notes for Lecture 5/19/95 Why Resolution?
- Handout #46 1995:May:26
- CS109A Notes for Lecture 2/21/96 Type Shorthands
- CS109A ML Notes for 3/6/96 Maintaining a State
- 1 What Is Data Mining? Originally, "data mining" was a statistician's term for overusing data to draw invalid inferences.
- 3 Low-Support, High-Correlation Mining We continue to assume a "market-basket" model for data, and we visualize the data as a boolean matrix,
- Evaluating the Web Hubs and Authorities
- 9 Sequence Matching Sequences are lists of values S = (x_1, x_2, ...
- 10 Mining Episodes In the episode model, the data is a history of events; each event has a type and a time of occurrence. An
- What is Database Theory? A collection of studies, often connected to the
- Semantics of Datalog With Local Stratification
- Conjunctive Queries Containment Mappings
- Information Integration Semistructured Data
- Using Views to Implement Datalog Programs
- CS109A Notes for Lecture 1/29/96 An exception is the only thing that a function can
- Mining Data Streams Most of the algorithms described in this book assume that we are mining a
- Data Mining In this introductory chapter we begin with the essence of data mining and a dis-
- CS109B Notes for Lecture 5/12/95 Notational Shift
- A system developed by Walter Sujansky for transporting queries from one medical DB to
- 6 Mining the Web 1. Dynamic itemset counting: Searching for interesting sets of items in a space too large ever to consider
- CS109A Quiz Monday, Oct. 29, 10--10:50 am
- 5 Web Search 1. Page rank, for discovering the most "important" pages on the Web, as used in Google.
- A COMPARISON BETWEEN DEDUCTIVE AND OBJECT-ORIENTED DATABASE SYSTEMS
- CS109B Quiz Monday, Feb. 11, 10--10:50 am
- CS109 PROJECT: POLYNOMIALS; MEASURING RUNNING TIME Due Friday, Feb. 28
- CS 109A Project #2 Due March 17, 1995
- 4 Query Flocks Goal: apply a-priori trick and other association-rule tricks to a more general class of complex queries.
- 6 Mining the Web 1. Dynamic itemset counting: Searching for interesting sets of items in a space too large ever to consider
- What We Know How to Do Given sources that are described by views over
- The Database Approach to Knowledge Representation Jeffrey D. Ullman
- Answering Queries Using Templates With Binding Patterns (Extended Abstract)
- CS109A Notes for Lecture 3/10/95 Why Study Infinite Sets?
- CS109B Notes for Lecture 5/3/95 Parse Trees
- CS109B ML Notes for the Week of 4/17/95 Record Types
- CS109A Notes for Lecture 2/12/96 Assignments With Replacements
- 4 Query Flocks Goal: apply a-priori trick and other association-rule tricks to a more general class of complex queries.
- CS109A Class Quiz Friday, Feb. 2, 1996, 3:15--4:05PM
- Answering Queries Using Limited External Query Processors
- CS109A Second Class Quiz Friday, March 3, 1995, 3:15--4:05PM
- Index Selection for OLAP Himanshu Gupta, Venky Harinarayan, Anand Rajaraman
- CS345A Midterm Solutions Problem 1: (40 points) Consider the following rules
- CS345A Notes for Lecture 5/4/94 The NAIL System
- CS109B Notes for Lecture 5/31/95 Essentially Boolean-valued functions with argu-
- Table of Contents
- CS109A Notes for Lecture 3/8/96 Data Structures
- CS109A Notes for Lecture 3/15/95 Primary Index
- Propositional In this chapter, we introduce propositional logic, an algebra whose original purpose,
- CS109A Notes for Lecture 1/31/96 Measuring the Running Time of Programs
- CS109B Notes for Lecture 4/17/95 NP-Complete Problems
- CS109A Programming Project #1: Happy Medians Due Friday, February 17, 1995
- Information Integration Using Logical Views Jeffrey D. Ullman
- CS109A Notes for Lecture 1/22/96 1. Keyword fun.
- Extended Conjunctive Queries Containment of Unions of CQ's
- Assigning an Appropriate Meaning Database Logic with Negation
- CS109A Quiz Monday, Nov. 26, 10--10:50 am
- CS109B PROJECT: A HARD GRAPH PROBLEM Due Friday, March 8
- CS109B Quiz Friday, May 3, 2:15--3:05 PM
- More Stream-Mining Counting How Many Elements
- CS109 PROJECT: MEASURING RUNNING TIME Due Wednesday, Dec. 5
- Implementing Data Cubes Efficiently Venky Harinarayan, Anand Rajaraman
- 5 Web Search 1. Page rank, for discovering the most "important" pages on the Web, as used in Google.
- Like trees, lists are among the most basic of data models used in computer programs. Lists are, in a sense, simple forms of trees, because one can think of a list as a binary
- CS109B Notes for Lecture 5/24/95 No, not Bill; we mean a circuit element that
- CS109B Class Quiz Monday, May 1, 1995, 2:15--3:05PM
- Finding Interesting Associations without Support Pruning Edith Cohen, Mayur Datar, Shinji Fujiwara, Aristides Gionis, Piotr Indyk
- CS109B Notes for Lecture 4/19/95 Systems often may be modeled by a finite set of
- CS109B Project 1: Decoding the Genome Due Monday, May 8, 1995
- CQ's With Negation General form of conjunctive query with negation
- A Query Translation Scheme for Rapid Implementation of Wrappers (Extended Version)
- CS345 Notes for Lecture 11/27/96 Developed by Raghu Ramakrishnan at Wisconsin.
- 3 Low-Support, High-Correlation Mining We continue to assume a "market-basket" model for data, and we visualize the data as a boolean matrix,
- ML has operators that apply to Boolean values. These operators are similar to the operators AND, OR, and NOT found in Pascal (&&, ||, and ! in C), but with
- CS109B PROJECT: A "NATURAL LANGUAGE" QUERY SYSTEM Due Friday, May 22
- Mediator-based integration. Unlike IM, there is no notion of universal
- CS109A Solutions for Class Quiz #2 March 8, 1995
- Searching for Solutions Careful Analysis of Expansions
- CS109B Notes for Lecture 4/14/95 Floyd's Algorithm
- CS109B ML Notes for the Week of 5/22/95 Goal: complete concealment of the values of a
- Errata in the First Printing The following people deserve our thanks for pointing out errors in the first printing
- Constraint Checking with Partial Information (Extended Abstract)
- CS109B Notes for Lecture 5/10/95 Models reasoning, mathematical proofs, hu-
- CS109A Notes for Lecture 2/23/96 Probability Space
- CS109A Notes for Lecture 2/9/96 Curried Functions
- 8 More About Clustering We continue our discussion of large-scale clustering algorithms, covering
- CS109A Notes for Lecture 1/24/96 Proving Recursive Programs Work
- A graph is, in a sense, nothing more than a binary relation. However, it has a powerful visualization as a set of points (called nodes) connected by lines (called
- 2 Association Rules and Frequent Itemsets The market-basket problem assumes we have some large number of items, e.g., "bread," "milk." Customers
- CS109B Project 2: Instant Insanity and Generalizations Due Wednesday, June 7, 1995
- 1 What Is Data Mining? Originally, "data mining" was a statistician's term for overusing data to draw invalid inferences.
- 9 Sequence Matching Sequences are lists of values S = (x_1, x_2, ..., x_k), although we shall often think of the same sequence as a
- CS109B Notes for Lecture 5/17/95 How to Prove Things
- The Role of Theory Today Jeffrey D. Ullman
- Dynamic Miss-Counting Algorithms: Finding Implication and Similarity Rules with Confidence Pruning
- 8 More About Clustering We continue our discussion of large-scale clustering algorithms, covering
- CS109B Notes for Lecture 5/5/95 Recursive-Descent Parsing
- CS109A Notes for Lecture 2/5/96 Programs with Function Calls
- CS109A Notes for Lecture 3/6/95 Cartesian Product
- Frequent Itemsets We turn in this chapter to one of the major families of techniques for character-
- CS109B Notes for Lecture 4/28/95 Why Grammars?
- 1 Clustering Given points in some space, often a high-dimensional space, group the points into a small number of
- CS109B Second Class Quiz Monday, May 22, 1995, 2:15--3:05PM
- Description of Patterns
- Recommendation Systems There is an extensive class of Web applications that involve predicting user
- Querying Semistructured Heterogeneous Information Dallan Quass, Anand Rajaraman, Yehoshua Sagiv, Jeffrey Ullman, Jennifer Widom
- 1 Clustering Given points in some space, often a high-dimensional space, group the points into a small number of
- The Relational One of the most important applications for computers is storing and managing
- CS109A Notes for Lecture 1/12/96 The Essence of Proof
- Review of Logical If-Then Rules h(X,...) :-a(Y,...) & b(Z,...) & ...
- CS109A Notes for Lecture 1/26/96 Running Time
- 19.2 View Serializability Recall our discussion in Section ?? of how our true goal in the design of a
- CS109A Notes for Lecture 3/4/96 Priority Queues
- CS109B Notes for Lecture 5/8/95 Expressive Power of Languages
- Mining Data Streams The Stream Model
- CS109B Notes for Lecture 6/7/95 Unsolvable Problems
- CS109A Notes for Lecture 1/19/96 Recursive De nition of Expressions
- Chapter 1. Computer Science: The Mechanization of Abstraction
- CS109A Class Quiz Friday, Feb. 3, 1995, 3:15--4:05PM
- Efficient Implementation of Data Cubes Via Materialized Views Jeffrey D. Ullman
- CS109A Notes for Lecture 3/15/96 Representing Strings
- Still More Stream-Mining Frequent Itemsets
- The Running Time of Programs
- This book was motivated by the desire we and others have had to further the evolution of the core course in computer science. Many departments across the country
- Low-Support, High-Correlation Finding Rare but Similar Items
- CS109B ML Notes for the Week of 5/8/95 ML's way of encapsulating concepts such as a data
- Computing Iceberg Queries Efficiently Min Fang, Narayanan Shivakumar, Hector Garcia-Molina, Rajeev Motwani, Jeffrey D. Ullman
- Querying Semistructured Heterogeneous Information
- Integrating Information by Outerjoins and Full Disjunctions
- CS109B Notes for Lecture 5/15/95 Tautologies
- CS109B Notes for Lecture 6/2/95 Why Interpretations?
- Paper number 146 Scalable Techniques for Mining Causal Structures
- A Survey of Research on Deductive Database Systems Raghu Ramakrishnan
- Database Research: Achievements and Opportunities Into the 21st Century
- Representative Objects: Concise Representations of Semistructured, Hierarchical Data
- We now turn our attention to a generalization of propositional logic, called "predicate," or "first-order," logic. Predicates are functions of zero or more variables that
- CS109B Notes for Lecture 4/5/95 Nodes + edges = undirected graph.
- More Clustering CURE Algorithm
- CS109A Notes for Lecture 1/10/96 Major Theme: Data Models
- CS109A ML Notes for the Week of 1/16/96 ML can be used as an interactive language. We
- CS345 Notes for Lecture 11/25/96 NAIL (Not Another Implementation of Logic) was
- Hash-Based Improvements to Park-Chen-Yu Algorithm
- 2 Association Rules and Frequent Itemsets The market-basket problem assumes we have some large number of items, e.g., "bread," "milk." Customers
- Using Logic Computer Components
- CS109B Notes for Lecture 6/5/95 Why Tautologies Again?
- CS109A Notes for Lecture 1/17/96 Simple Inductions
- Distance Measures Hierarchical Clustering
- CS345 - Data Mining Introductions
- CS109A Class Quiz Friday, March 1, 1996, 3:15--4:05PM
- Also distributed electronically is the Standard ML of New Jersey Manual by A. W. Appel, D. B. MacQueen, et al., AT&T Bell Laboratories, 1993. This
- CS109B Notes for Lecture 4/21/95 Nondeterministic Automata Looking for