Welcome to Debtech International

Onsite Seminar

Introduction to Big Data

NOSQL Data Modeling

NOSQL stands for Not Only SQL, or sometimes simply Not SQL. RDBMSs have shown poor performance on data-intensive applications such as indexing a large number of documents, serving pages on high-traffic websites, handling the volumes of social networking data and delivering streaming media. NOSQL databases therefore address new kinds of problems and have different requirements, and many large-scale social networking and other applications are better suited to them. Nevertheless, data modeling has fundamental principles that are essential for all data models. Most NOSQL developers advocate abandoning data modeling and its principles because NOSQL is allegedly schema-less, but experience with XML and other so-called schema-less models has confirmed that data modeling is still critical.

This course defines fundamental and immutable data modeling principles. It then shows how to apply these principles to different forms of NOSQL database and file management systems. It addresses the scalability issues inherent in a NOSQL solution and how these affect data modeling. It explains the reality versus the hype of NOSQL databases and dispels prevalent misunderstandings of relational DBMSs.

If you want to understand how to model data in any NOSQL environment, then this is the course for you. It discusses data modeling that is appropriate for parallel relational (such as Teradata), inverted column relational (such as Sybase IQ), ordered key column (such as Cassandra), key value (such as Amazon’s Dynamo), graph (such as Neo4j) and document (such as MongoDB) stores. As in everything, balance is important. This course teaches a balanced way to do data modeling in any of these new NOSQL environments without throwing out the baby with the bathwater.

What You Will Learn

  • What NOSQL data stores are
  • The steps and structures of data modeling for NOSQL
  • How to apply data modeling that is appropriate to NOSQL
  • How to optimize NOSQL data stores
  • Case study exercises will be used to build skills in applying data modeling to NOSQL

Course Outline

NOSQL Movement

  • Shortcomings of past solutions
  • Context of NOSQL
  • Definition of NOSQL
  • Characteristics of NOSQL
  • Client applications for NOSQL
  • Hadoop

NOSQL database data models

  • Parallel relational (such as Teradata, Vertica)
  • Inverted column relational (such as Sybase IQ)
  • Ordered key column (such as Apache HBase, Apache Cassandra)
  • Key value (such as Amazon’s Dynamo, Oracle Coherence, Redis, Kyoto Cabinet)
  • Graph (such as Neo4j, FlockDB)
  • Document (such as MongoDB, CouchDB)
  • Full text search engines (such as Apache Lucene, Apache Solr)
  • Comparison of major DBMS types

Introduction to Data Modeling and Databases

Review of Data Modeling Principles and Patterns

NOSQL data modeling objects

  • Column families
  • Column keys (column name)
  • Column value
  • Supercolumns
  • Supercolumn key (supercolumn name)
  • Functional dependency
  • Cardinality
  • Relevant models
  • Class Words
  • Meaning and purpose
  • Extra significance in NOSQL (usually ignored)
  • Entity composition
  • Attribute composition
  • Data retrieval

Information Gathering Methods for NOSQL Modeling

  • NOSQL as a vertical and hierarchical slice of data
  • The information model (as distinct from data model)
    • Definition
    • Understanding business and reporting rules
    • Defining queries
    • Analyzing queries
    • Organizing queries into NOSQL structures
    • The effect of query analysis on the data model

Advanced Techniques in NOSQL Data Modeling

  • Data Structure Optimization
    • Factors influencing optimization
    • Importance of understanding the data usage
    • Three ways to optimize data
  • Denormalization strategies
    • Application-applied joins and ETL joins
    • Indexing
    • Index table
    • Composite key index
    • Composite key aggregation
    • Inverted indexes
  • Hierarchy modeling
    • Tree aggregation
    • Self-referencing structures
    • Nested sets with pros and cons
    • Adjacency lists with pros and cons
    • Descendant/ancestor structures
    • Materialized views
  • Specialized modeling
    • Flattening nested structures
    • Proximity queries
    • Graph processing
  • Misconceptions about Relational DBMSs
    • Debunking denormalization
    • Relational rules
  • Referential integrity
  • ACID principles
    • Atomic, consistent, isolated, durable
    • Immediate integrity
    • Enforcing immediate integrity
  • CAP principles
    • Consistency, availability, partition tolerance
    • Pick two
    • Eventual consistency
  • Using views
    • To ensure integrity
    • To simplify a complex data model
    • To insulate data
    • For efficient SQL

Who Should Attend

Experienced data modeling professionals interested in learning the application of data modeling to the NOSQL environment.

NOSQL professionals needing to know how to apply data modeling.

3 days

Course Format
Lecture, group discussion and exercises

Tom Haughey

To request a quote for this in-house seminar
Please call (561) 218-4752 or email info@debtechint.com
