Lecture 1

Relational database and database management system


Abstract

In lecture 1 we explain the fundamental concepts of databases. In particular we present the relational data model. Modern commercial databases are based on this model. We have also enclosed some information on the client-server architecture which is used in database applications. At the end of the lecture you will find a brief summary of the history of databases.

None of the exercises for this lecture require using any software.


Database

Here are some general facts that drive the whole domain of databases.

  1. The data is an asset of the company just like the others (employees, machines, stock). The data must be managed just like the other assets.
  2. The information is the data with semantics (defined by the meaning of the data in the company).
  3. The management of data is realized by the information system that caters for the need for information inside the company.
  4. The database has become the standard means to structure the process of data management.
  5. The database is a part of the information system of the company.

Why is this lecture important and interesting?

Information system

Some aspects of an information system highlight its implementation nature:


Relational data model

Relation data model was invented by Edgar Codd and presented for the first time in a paper in 1970. From the mathematical point o view a database is a set of relations. This is why we talk about the relational data model and relational databases. Mathematicians define a relation as a subset of the Cartesian product of some sets. A relation may be represented as a two-dimensional table that consists of rows and columns. We assume that:

  1. The number of columns is fixed.
  2. Each column is assigned a name and a domain.
  3. There is exactly one atomic value at the intersection of a row and a column. This value belongs to the domain of this column.
  4. A row represents a single chunk of information, e.g. a person.
  5. We take no account of the order of rows (records) and columns (fields of records).

Table Teachers

TeacherId FirstName LastName Title
2137 Lech Banachowski Prof.
3245 Krzysztof Stencel Dr.
8976 John Smith Mr.

Table Lectures

LectureName Code TeacherId
Databases DBA 2137
Information systems INS 3245
Internet technologies INT 3245
Object-oriented programming OOP 8976
Expert systems EXS 2137

Column TeacherId of table Lectures is specific, because:

  1. Its values do not describe properties of lectures.
  2. It represents the relationship between the lecture and a teacher, whose data is stored in a separate table. In order to retrieve this data we have to use the identifier and find appropriate row in the related table.
  3. Therefore, it is important that this identifier uniquely determines the teacher. In the relational data model there is no other means of setting the identity of a row. You can do it by means of the stored values only.


Primary key and unique keys

  1. In every table there must be a unique identifier. We call it the primary key It is a subset (may be a singleton) of the columns of the table such that the values in these columns uniquely identify the row.
  2. A unique key (or alternative key or simply key) has the same property as the primary key. However, there is only one primary key, whereas the number of unique keys may be grater than one.

The primary key of the table Lectures is the column Code. The column LectureName is a unique key.

The primary key of the table Teacher is the column TeacherId. The column SecondName need not be a key.

Please answer the following simple question:

How many primary keys may a table have?

Foreign key

A foreign key is the set of one or more columns whose values are also the values of the primary key or a unique key of the related table (it can be the same table). The values of these columns are interpreted as pointers to the rows of the related table.

Column TeacherId of table Lectures is a foreign key that references column TeacherId of table Teachers.

For example number 2137 in the row of table Lectures that describes lecture "Databases" is a reference to the row of table Teachers that stores information on the teacher named "Banachowski". This relationship reads:

Lecture "Databases" is taught by Lech Banachowski.

Please answer the following simple question:

How many foreign keys may a table have?

Null = lack of value

Domains of all columns are extended by the special value called Null. The meaning of Null is the lack of values. It could be:

All comparisons with Null yield Null as the result (expression Null=Null also evaluates Null). It concerns almost all operations that have Null as the argument - their value is Null.

Therefore, Null is the third logical value besides True and False. Here are the truth tables for three major logical connectives:

Disjunction (OR)

OR      True    False     Null    
True True True True
False True False Null
Null True Null Null

Conjunction (AND)

AND      True    False     Null    
True True False Null
False False False False
Null Null False Null

Negation (NOT)

NOT  True  False  Null 
  False True Null

It is time for a simple question:

Is it possible that a logical expression is true, when one if its arguments is Null?

Predicates Is Null and Is Not Null

They allow recognizing null values.

(X Is Null) evaluates to True, when X is Null
(X Is Null) evaluates to False, when X is not Null
However, be careful, because
(X = Null) always evaluates to Null (even for X being Null)
Here is another simple question:

What is the logical value of expression (Null=Null) Is Null?

Integrity constraints

Integrity constraints are conditions that must be fulfilled by data in a database, e.g.

Again a simple question:

Which popular data model does not enforce referential integrity?

View

A view is a virtual table to be used by the users, e.g.

Lectures-Teachers

LectureName Teacher
Databases Banachowski
Information systems Stencel
Internet technologies Stencel
Object-oriented programming Smith
Expert systems Banachowski

The content of a usual view is calculated from the base tables at each usage. It is not stored in the database. However, sometimes we need to save this content in the database. Then we use this frozen image and do not recompute it again and again. Such view is called materialized.


Index

An index is an additional data structure that accelerates access to the data in a table, when you search rows with particular values in a set of columns. For example, the index built for column LastName accelerates the search for the teacher with a given last name. The purpose of an index in a database is similar to that of the index in a book.


Levels of a relational database

The same data of a database may be viewed and processed in many different ways. The point of view depends on the level of abstraction. The basic levels of abstraction are:

  1. The user level consists of the views for users.
  2. The logical (conceptual) level consists of views, tables and indexes.
  3. The physical level consists of files with data and indexes.
Views are defined on the logical level and used on the user level. Indexes are defined on the logical level and used on the physical level. Tables are defined on the logical level and used on the physical level and the user level.

The user level is intended mainly for the end users. The logical level is used by the system data administrator, while the database administrator (shortly dba) works with the physical level. Of course, the designer of the database defines and works with all three levels.

Each level is somewhat independent. For example, you can change the location of the data on the disk as well as the storage parameters without any impact on the logical structure of the tables. Such a possibility is called the physical data independence. You can also change tables without any change of the applications, provided these are based on views and not directly tables. This feature is called the logical data independence.

Consider our sample database. Its application can consist in displaying the information on who delivers lectures, i.e. displaying the content of a view Lectures-Teachers. In the future it may turn out that some lectures are given by more than one teacher. The schema of the database with two tables Teachers and Lectures is no longer sufficient. A new associative table is required. It will store the relationship between teachers and lectures. The definition of view Lectures-Teachers must be changed as well. All these changes apply to the logical level of the database. The application itself is based on this view and thus it need not be changed. It is an example of the meaning of the logical data independence.

The physical data independence becomes important, when the disk is already full and we have to buy a new one. New rows will be stored on the new disk, but neither the schemata of the tables nor the applications have to be changed.


Catalog

The catalog (or data dictionary or metadata) is the collection of tables and views that describe the schema of the database. In other words, the catalog contains definitions of all the objects of all the three levels of the database. The name and the data type of a column are examples of metadata.

It is important to use the relational data model for the catalog. This ensures that the metadata are stored and processed the same way as the data.


Client-server architecture

Database applications usually consists of at least two parts:

the client side
that operates on the user's workstation,
the server side
that operates on the computer that runs the database server, i.e. the database itself and the database management system (DBMS).
The functions performed at the server side are:
  1. storing data and facilitating access to it,
  2. executing statements of the database language (usually it is SQL that will be presented during lecture 9),
  3. enforcing data integrity,
  4. managing the resources of the database, e.g. the users' accounts.
The functions performed at the client side are:
  1. interacting with the user (running the user interface),
  2. explaining the state of the computation to the user (it includes also errors and exceptional events),
  3. accepting users' queries and then executing them or sending them to the server as SQL statements.

History of databases

1951
Univac presented the magnetic tape as a means to store data. (Before that punched cards were used).

1956
IBM introduced the magnetic hard drive.
1961
Integrated Data Store IDS (Charles Bachman, General Electric) was the first DBMS. It was the rise of the network data model.
1965-70
Information Management System IMS (IBM) - the hierarchical data model.
1970
Edgar Codd (1924-2003), IBM –, the relational data model.

1971
CODASYL, the standard for the network data model.
1976
Peter Chen - the entity-relationship model (ERD, ERM). There is no standard for this model so far (in 2004).
The beginning of the seventies
People at IBM research lab in San Jose develop languge Sequel. It was the prototype of SQL. You can still meet "old" people who pronounce SQL as "Sequel". Now you know why they do it.
1973
The first relational database management system (System R developed by IBM).
1979
Relational Software (later rebranded to Oracle) marketed the first commercial version of a relational database management system.
1987
The first standard of SQL (ISO).
The eighties
Reserch on deductive and object-oriented databases.
1997
The standard for object-oriented databases: ODMG 2.0.
Since the nineties till now
Databases have been extended by new aspects like multi-tier architectures, distribution, integration, parallel computation, Internet, data warehouses, OLAP, multimedia, databases of documents (also XML), GIS (Geographical Information Systems), ERP (Enterprise Resource Planning), MRP (Management Resource Planning) - packets like SAP, Baan, Oracle, PeopleSoft, Siebel, CRM (Customer Relationship Management).

Summary

During this lecture we explained the fundamental notions that apply to databases. We also emphasized the importance of databases for information systems. In particular, we presented the primary data model of modern databases. We call it the relational data model. We described the following concepts of this model: table, key, primary key, foreign key, view, pseudo-value Null and integrity constraints.


Dictionary

foreign key
A set of one or more columns whose values are also values of the primary key or a unique key of the related table (it can be the same table). The values of these columns are interpreted as pointers to the rows of the related table.
integrity constraint
A correctness condition for the data in a database.
key
A subset (may be a singleton) of the columns of a table such that the values in these columns uniquely identify the row of this table.
Null
The pseudo-value which means that the data is missing.
primary key
The distinguished key which is used to identify objects.
referential integrity constraint
An integrity constraint which states that each value of a foreign key is either Null or occurs also in the apropriate column of the associated primary (or unique) key.
table
A two-dimensional structure which consists of rows and columns. At the intersection of a row and a column there is only one atomic data item. A row stores a record of data on an object (e.g. a person or a company) or a relationship between objects. Each column contains a set of atomic data items which describe one of the attributes of an object (e.g. the name of the company or the last name of a person).
unique key
A key that is not primary.
view
A virtual table created for users. It is defined on the logical level and used on the user level. If it is physically saved in the form of a relational table, we call it a materialized view.

Exercises

  1. Give an example of two tables connected with the relationship primary key-foreign key. Define their primary keys. Do they have unique (alternative) keys? Define at least one foreign key. Draw both tables on paper and enter some rows into them. Remember not to break the referential integrity.
  2. Reflect on the meaning of data and information in your life. Do you use any systems to store your data? Can you call these systems databases?

Page prepared in Polish by Lech Banachowski and translated into English by Krzysztof Stencel.