Chapter
2.2.1 XPath and XQuery Data Model for Documents
2.2.2 The XQuery Model (Continued) and Sequences
2.2.3 Specifying Paths in a Tree: XPath
2.2.4 A First Glance at XQuery Expressions
2.3.1 Steps and Path Expressions
2.3.2 Evaluation of Path Expressions
2.3.3 Generalities on Axes and Node Tests
2.3.5 Node Tests and Abbreviations
Converting a Node Set to a String
2.4 FLWOR EXPRESSIONS IN XQUERY
2.4.1 Defining Variables: The for and let Clauses
2.4.2 Filtering: The Where Clause
2.4.4 Advanced Features of XQuery
2.5.1 A Relational View of an XML Tree
2.5.4 Expressiveness and First-Order Logic
2.5.5 Other XPath Fragments
Dynamic and Static Typing
3.2.2 Automata on Ranked Trees
3.2.4 Trees and Monadic Second-Order Logic
3.3 SCHEMA LANGUAGES FOR XML
3.3.1 Document Type Definitions
3.3.3 Other Schema Languages for XML
3.4.1 Graph Semistructured Data
Schema Inference and Static Typing
4.1 FRAGMENTING XML DOCUMENTS ON DISK
4.2.1 Region-Based Identifiers
4.2.2 Dewey-Based Identifiers
4.2.3 Structural Identifiers and Updates
4.3 XML QUERY EVALUATION TECHNIQUES
4.3.2 Optimizing Structural Join Queries
4.3.3 Holistic Twig Joins
5 Putting into Practice: Managing an XML Database with EXIST
5.3 GETTING STARTED WITH EXIST
5.4 RUNNING XPATH AND XQUERY QUERIES WITH THE SANDBOX
5.4.3 Complement: XPath and XQuery Operators and Functions
5.5 PROGRAMMING WITH EXIST
5.5.1 Using the XML:DB API with EXIST
5.5.2 Accessing EXIST with Web Services
5.6.2 Shakespeare Opera Omnia
6 Putting into Practice: Tree Pattern Evaluation Using SAX
6.1 TREE-PATTERN DIALECTS
6.3 EXTENSIONS TO RICHER TREE PATTERNS
PART 2: Web Data Semantics and Integration
7 Ontologies, RDF, and OWL
7.2 ONTOLOGIES BY EXAMPLE
7.3.1 Web Resources, URI, Namespaces
Expressing Class Disjointness Constraints
Class and Property Equivalence
Intentional Class Definitions
Class and Property Equivalence
7.4 ONTOLOGIES AND (DESCRIPTION) LOGICS
7.4.1 Preliminaries: The DL Jargon
Reasoning Problems Considered in DLs
7.4.2 ALC: The Prototypical DL
7.4.3 Simple DLs for Which Reasoning Is Polynomial
7.4.4 The DL-LITE Family: A Good Trade-off
8 Querying Data Through Ontologies
8.2 QUERYING RDF DATA: NOTATION AND SEMANTICS
8.3 QUERYING THROUGH RDFS ONTOLOGIES
8.4 ANSWERING QUERIES THROUGH DL-LITE ONTOLOGIES
8.4.2 Consistency Checking
8.4.3 Answer Set Evaluation
8.4.4 Impact of Combining DL-LITE_R and DL-LITE_F on Query Answering
9.2 CONTAINMENT OF CONJUNCTIVE QUERIES
9.3 GLOBAL-AS-VIEW MEDIATION
9.4 LOCAL-AS-VIEW MEDIATION
9.4.1 The Bucket Algorithm
Construction of Candidate Rewritings
9.4.2 The Minicon Algorithm
First Step of Minicon: Creation of MCDs
Second Step of Minicon: Combination of the MCDs
9.4.3 The Inverse Rules Algorithm
9.5 ONTOLOGY-BASED MEDIATORS
9.5.1 Adding Functionality Constraints
9.5.2 Query Rewriting Using Views in DL-LITE
9.6 PEER-TO-PEER DATA MANAGEMENT SYSTEMS
9.6.1 Answering Queries Using GLAV Mappings Is Undecidable
Reduction from a Decision Problem B to a Decision Problem B'
The Dependency Implication Problem
Undecidability of the GLAV Query Answering Problem
9.6.2 Decentralized DL-LITE
10 Putting into Practice: Wrappers and Data Extraction with XSLT
10.1 EXTRACTING DATA FROM WEB PAGES
11 Putting into Practice: Ontologies in Practice (by Fabian M. Suchanek)
11.1 EXPLORING AND INSTALLING YAGO
11.3 WEB ACCESS TO ONTOLOGIES
12 Putting into Practice: Mashups with YAHOO! PIPES and XProc
12.1 YAHOO! PIPES: A GRAPHICAL MASHUP EDITOR
12.2 XPROC: AN XML PIPELINE LANGUAGE
PART 3: Building Web Scale Applications
13.2.2 Text Preprocessing
13.3 WEB INFORMATION RETRIEVAL
Content of Inverted Lists
Assessing Document Relevance
13.3.2 Answering Keyword Queries
Ranked Queries: Basic Algorithm
Fagin’s Threshold Algorithm
13.3.3 Large-Scale Indexing with Inverted Files
Performance of Inverted Files
Building and Updating an Inverted File
Indexing Dynamic Collections
Compression of Inverted Lists
13.3.5 Beyond Classical IR
13.4.4 Discovering Communities on the Web
13.5 HOT TOPICS IN WEB SEARCH
The Deep Web and Information Extraction
14 An Introduction to Distributed Systems
14.1 BASICS OF DISTRIBUTED SYSTEMS
14.1.1 Networking Infrastructures
14.1.2 Performance of a Distributed Storage System
14.1.3 Data Replication and Consistency
14.2.2 Distributed Transactions
14.3 REQUIRED PROPERTIES OF A DISTRIBUTED SYSTEM
14.3.5 Putting Everything Together: The CAP Theorem
14.4 PARTICULARITIES OF P2P NETWORKS
14.5 CASE STUDY: A DISTRIBUTED FILE SYSTEM FOR VERY LARGE FILES
14.5.1 Large-Scale File System
15 Distributed Access Structures
15.1 HASH-BASED STRUCTURES
Location of the Hash Directory
15.1.1 Distributed Linear Hashing
Distributed Linear Hashing
Reducing Maintenance Cost by Lazy Adjustment
Details on the LH* Algorithms
15.1.2 Consistent Hashing
Distributing Data with Consistent Hashing
15.2 DISTRIBUTED INDEXING: SEARCH TREES
15.2.3 Case Study: BIGTABLE
Adjustment of the Client Image
16 Distributed Computing with MAPREDUCE and PIG
16.1.2 The Programming Environment
16.1.3 MAPREDUCE Internals
16.2.4 Using MAPREDUCE to Optimize PIG Programs
17 Putting into Practice: Full-Text Indexing with LUCENE (by Nicolas Travers)
17.1 PRELIMINARY: A LUCENE SANDBOX
17.2 INDEXING PLAIN TEXT WITH LUCENE -- A FULL EXAMPLE
17.2.4 Searching the Index
17.2.5 LUCENE Querying Syntax
17.3 PUT IT INTO PRACTICE!
17.3.1 Indexing a Directory Content
17.3.2 Web Site Indexing (Project)
17.4 LUCENE - TUNING THE SCORING (PROJECT)
18 Putting into Practice: Recommendation Methodologies (by Alban Galland)
18.1 INTRODUCTION TO RECOMMENDATION SYSTEMS
18.4 GENERATING SOME RECOMMENDATIONS
18.4.1 Global Recommendation
18.4.2 User-Based Collaborative Filtering
18.4.3 Item-Based Collaborative Filtering
18.5.2 The Probabilistic Way
18.5.3 Improving Recommendation
19 Putting into Practice: Large-Scale Data Management with HADOOP
19.1 INSTALLING AND RUNNING HADOOP
19.2 RUNNING MAPREDUCE JOBS
19.4 RUNNING IN CLUSTER MODE (OPTIONAL)
19.4.1 Configuring HADOOP in Cluster Mode
19.4.2 Starting, Stopping, and Managing HADOOP
20 Putting into Practice: COUCHDB, a JSON Semistructured Database
20.1 INTRODUCTION TO THE COUCHDB DOCUMENT DATABASE
20.1.1 JSON, a Lightweight Semistructured Format
Complex Values: Objects and Arrays
20.1.2 COUCHDB, Architecture, and Principles
20.1.3 Preliminaries: Set Up Your COUCHDB Environment
20.1.7 Distribution Strategies: Master–Master, Master–Slave, and Shared–Nothing
Shared-Nothing Architecture
20.2 PUTTING COUCHDB INTO PRACTICE!
20.2.2 Project: Build a Distributed Bibliographic Database with COUCHDB