Database Overview


What is a database?

For what purpose?

Typical application

Features of a database system

  • Control data redundancy and inconsistency
    • Unnecessary duplication of data can cause problems
      • Space wastage: putting a student's postal address with every occurence of student's name
      • Update cost: if an address changes, we have to find every occurence of old address and update it
    • Some duplication is necessary.
    • Data must be consistent at all times: what happens if update crashes before completion?

  • Efficient data access
    • Don't scan whole dbase to answer a query.
    • For large amounts of data, searching can be time-consuming
    • Access efficiency measured differently in DBMS:
      => Goal: minimize disk access time.

  • Concurrent access
    • Example: Registrar's office has many employees that need to access database simultaneously
    • Need to coordinate actions of different users carefully
      • User 1: Delete "Smith took CS177"
      • User 2: "List Smith's classes"

  • Security
    • Not every user should be able to access all the data (students able to add/delete grades)
    • Control of access to data is needed without having to duplicate data

  • Integrity constraints
    • Grades need to be in the set {A+,...,F}
    • GPA's cannot be negative
    • A student cannot take a non existent course
    • DBMS should enforce integrity constraints

  • Backup and recovery
    • Data must be systematically backed up
    • Recovery in system crashes
    • Inserts and updates cannot occur partially (either complete it or don't do it at all)

  • Metadata
    • Data about the data (data types, interrelationships, integrity constraints etc).
    • Directory and catalog information.

  • Data and program independence
    • Data should be stored in a program independent fashion
      • Changes in program should not affect access to data
      • Data reentry should not be needed

  • User and programmer interface
    • Query language
    • GUI
    • Programmer API
    • Features that allow data to be re­organized for efficient access

What is a data model?

  • Data Model: An agreed upon method for abstractly describing the logical organization of data

  • Analogy: architectural blueprints
    • Blueprints are standardized: any builder can work on a plan drawn by any architect.
      => conventional symbols for windows doors plumbing etc.
    • However, blueprint standards may differ across countries.

  • A Data Model is like a blueprint standard:
    • An agreed upon way to describe how a dbase is to be organized.
    • Permits actual implementation to be independent of this description.
      => blueprint does not specify building materials (cement or brick).
    • A data model can be implemented in various ways.

  • Example: sets
    • Sets are described using notation like
      A = {17, 234, 88}
      B = {45, 88, 129, 564}
    • Operations are defined on sets (union, intersection etc).
    • However, sets can be implemented in many ways:
      1. Arrays
      2. Linked lists
      3. Various algorithms for union, intersection etc

  • Summary: A Data Model describes the logical organization of data along with operations that manipulate the data

Databases: Then and Now

  • Ancient data storage: stone tablets, scrolls, paper, log books etc

  • Early days of computing: storage on cards, paper tape

  • 1950's:
    • First commercial uses of computers
    • Census database
    • Customer database for IBM

  • 1960's:
    • IBM and others develop first few DBMS
    • Based on files, hierarchical and network data models
    • Nascent formalization of databases

  • 1970's:
    • Relational database model (E.F.Codd)
    • Significant research in sorting, searching, physical implementation
    • Query languages: SQL, QUEL, QBE
    • Several popular databases: System R (IBM),Ingres (Berkeley)
    • Transaction processing

  • 1980's:
    • Databases for PC's
    • Many additional features (GUIs, additional power)
    • Research on distributed databases, OO databases
    • Today's database giants: Oracle, Sybase, Informix
    • Preliminary research on non-traditional databases:
      => geographic and spatial systems, image databases, scientific databases

  • 1990's:
    • Geographic Information Systems: GIS, spatial databases
      => maps, terrain data
    • Image databases
      => Images, video
    • Web interfaces
    • Financial, accounting systems
    • Scientific databases, bioinformatics

People associated with DBMS

  • Database administrator
  • Database application programmer
  • End user
  • DBMS system programmer
  • Database researcher

Relation Data Model

  • Recall: a data model is a convention for describing the logical organization of data
  • Other data models: hierarchical and network models
  • Both the hierarchical and network models need a detailed understanding of the structure of the data to answer queries
  • Goals of relational model
    • To describe a logical structure simple enough to minimize structural or navigational information.
    • To create a mathematically precise framework for the logical representation of data.

Definitions

  • Consider this airline database example:

    • The data in the DBMS is a collection of tables.
    • Each table has named columns and data in the rows.
    • Another word for table: relation.

  • Other names for a row: a record or a tuple.
    Example: "Alice, 555-55-5555, F101"

  • Other names for columns: fields, attributes.

What is a key?

  • A key is a group of columns in a table such that: each row has a unique value in those columns
    Example: SSN in the PASSENGER table.

  • Keys are useful in searching and removing duplication.

  • Primary key: opt for one key among several possible keys.

  • Foreign key: a column in one table that is a primary key of another table.
    Example: FLIGHT# in PASSENGER table.

Constraints

  • Domain constraints: proper typing of values (remember the example where GPA cannot be negative).
  • Key constraint 1: Every relation (or table) should have a key
  • Key constraint 2: Primary keys can't have null (empty) values
  • Foreign key constraint: A table shouldn't have a foreign key value that doesn't exist in the foreign relation

Operations on tables

  • Update
    • Insert a record in the table
      Example: Insert "Mary, 999-99-9999, F123" in the first table
    • Delete a record in the first table
      Example: Delete all rows with "NAME=John"
    • Modify an existing record in the table
      Example: Modify "Alice, 555-55-5555, F313" only changes the flight number field for Alice's record

  • Retrieval
    • Selection operator
      • Applies to a single table and results in a new table
      • Need to specify row-selection criteria
      • Results in a new table
      • Can specify condition as a boolean expression
        Example: "Select all records from first table where NAME=Bob" returns

  • Projection operator
    • Applies to a single table
    • Need to specify attributes list
    • Results in a new table containing only the attributes specified
      Example: "Project NAME and SSN from PASSENGER table"

  • Union operator
    • Applies to two tables that are union-compatible
    • Results in a new table containing rows from both tables (with duplicates removed)

  • Intersection and difference operators
    • Applies to two tables that are union-compatible
    • Intersection: rows common to both tables
    • Difference: rows in first table that are not in the second

  • Join operator
    • Applies to two tables
    • Resulting table contains rows that are "joined" from one row each of the two tables.
    • Two rows are "joined" based on the join condition.
      Example: Join PASSENGER and FLIGHT using FLIGHT#

Exercise 1: What is the result of the query "List the names of all the customers arriving at LAX airport". Show how to combine join and projection to get the result.


SQL: A Language for Relational Databases

  • Formal query languagues: Relational algebra and relational calculus
  • Real world query languages: SQL, QUEL, QBE (among others)
  • Most commonly used: SQL (pronounced either "see-kwell" or "ess-cue-ell"
  • SQL: originally called SEQUEL (Structured English QUEry Language), 1974-1976, IBM
  • ANSI SQL standard, 1986
  • SQL2 standard, 1992 - also called SQL-92 (current work on SQL3)
  • SQL:
    • Lets you create and delete tables
    • Lets you specify queries on existing tables
    • Lets you specify domain, key and foreign constraints
    • Has support for security, transaction management and remote access

Basic SQL Query

  • What is an SQL "program"?
    • SQL code can be typed interactively into an interpreter
    • SQL code can reside in text files and be compiled (s most high level languages are)
    • SQL statements can be input from within other programming languages

  • Basic form of an SQL query statement:
    select <attribute list>
    from <relation list>
    [where <condition>];

  • NOTE:
    • The SQL keywords above are: select, from, where
    • select and from clauses are required
      => where clause is optional
    • There are other optional clauses (such as group by)

    • Example:
      select NAME, SSN
      from PASSENGER
      where PASSENGER.FLIGHT# = 'F313';
      • This expresses the query: "Find the names and SSN's of all passengers whose flight number is F313"
      • The result:

    • The select in SQL corresponds to the project operator
    • The where clause corresponds to the select operator
    • The from clause specifies the tables which the query applies to.

  • Another example:
    • Query: "List the names and SSN's of all the passengers flying in flight F123"
    • This is how a join is done in SQL:
      select NAME, SSN
      from PASSENGER, FLIGHT
      where PASSENGER.FLIGHT# = FLIGHT.FLIGHT#
      and PASSENGER.FLIGHT# = 'F123';

  • How to "read" an SQL statement:
    • Think of computing the cross-product of the relations in the from clause
    • Then, apply the condition in the where clause to each tuple in the cross-product to select tuples
    • Finally, project the attributes in the select clause to get the result relation
    • Observe: the join condition is explicitly specified in the where clause
    • NOTE: in an actual database, query optimization will try to prevent expensive cross product computations

Bioinformatics Computing

The narrow view of bioinformatics:

  • Sequence comparisons (alignment) and searches.
  • Phylogenetics

Computing skills required for the narrow view:

  • Basic skills: programming, data structures, algorithms
  • Probability (Basic probability, Markov chains).
  • Discrete Optimization: combinatorial problems, dynamic programming, string searching
  • Specialized algorithms: clustering, discrimination.
The broader view of bioinformatics:
  • Sequence comparisons (alignment) and searches.
  • Phylogenetics
  • Protein folding computations
  • Graphics and visualization
  • Drug design
  • Chemical pathway process simulation

Additional computing skills for the broader view:

  • Systems: databases, distributed computing, parallel computing
  • Graphics/User Interfaces
  • Scientific computing