Relational database

Lecture 1

Relational database and database management system

Abstract

In lecture 1 we explain the fundamental concepts of databases. In particular we present the relational data model. Modern commercial databases are based on this model. We have also enclosed some information on the client-server architecture which is used in database applications. At the end of the lecture you will find a brief summary of the history of databases.

None of the exercises for this lecture require using any software.

Database

Here are some general facts that drive the whole domain of databases.

The data is an asset of the company just like the others (employees, machines, stock). The data must be managed just like the other assets.
The information is the data with semantics (defined by the meaning of the data in the company).
The management of data is realized by the information system that caters for the need for information inside the company.
The database has become the standard means to structure the process of data management.
The database is a part of the information system of the company.

Why is this lecture important and interesting?

Information is ubiquitous:
- from simple services in web browsers
- to scientific applications.
There are more and more collections of data varying in the degree of complexity and size.
- electronic libraries, multimedia databases, interactive video, the exploration of the human genome, space exploration (e.g. NASA).
The development of a database requires expertise in many areas of information technology:
- operating systems, information theory, artificial intelligence, programming languages, multimedia, software engineering.
Database systems constitute one of the largest and most active parts of the market.

Information system

Some aspects of an information system highlight its implementation nature:

The database application is considered, when we want to pay closer attention to the software and to emphasize the remarkable role of the database.
The computer system is considered, when we want to pay closer attention to all computer-related artifacts, i.e. both hardware and software.

Relational data model

Relation data model was invented by Edgar Codd and presented for the first time in a paper in 1970. From the mathematical point o view a database is a set of relations. This is why we talk about the relational data model and relational databases. Mathematicians define a relation as a subset of the Cartesian product of some sets. A relation may be represented as a two-dimensional table that consists of rows and columns. We assume that:

The number of columns is fixed.
Each column is assigned a name and a domain.
There is exactly one atomic value at the intersection of a row and a column. This value belongs to the domain of this column.
A row represents a single chunk of information, e.g. a person.
We take no account of the order of rows (records) and columns (fields of records).

Table Teachers

TeacherId	FirstName	LastName	Title
2137	Lech	Banachowski	Prof.
3245	Krzysztof	Stencel	Dr.
8976	John	Smith	Mr.

Table Lectures

LectureName	Code	TeacherId
Databases	DBA	2137
Information systems	INS	3245
Internet technologies	INT	3245
Object-oriented programming	OOP	8976
Expert systems	EXS	2137

Column TeacherId of table Lectures is specific, because:

Its values do not describe properties of lectures.
It represents the relationship between the lecture and a teacher, whose data is stored in a separate table. In order to retrieve this data we have to use the identifier and find appropriate row in the related table.
Therefore, it is important that this identifier uniquely determines the teacher. In the relational data model there is no other means of setting the identity of a row. You can do it by means of the stored values only.

Primary key and unique keys

In every table there must be a unique identifier. We call it the primary key It is a subset (may be a singleton) of the columns of the table such that the values in these columns uniquely identify the row.
A unique key (or alternative key or simply key) has the same property as the primary key. However, there is only one primary key, whereas the number of unique keys may be grater than one.

The primary key of the table Lectures is the column Code. The column LectureName is a unique key.

The primary key of the table Teacher is the column TeacherId. The column SecondName need not be a key.

Please answer the following simple question:

How many primary keys may a table have?

Foreign key

A foreign key is the set of one or more columns whose values are also the values of the primary key or a unique key of the related table (it can be the same table). The values of these columns are interpreted as pointers to the rows of the related table.

Column TeacherId of table Lectures is a foreign key that references column TeacherId of table Teachers.

For example number 2137 in the row of table Lectures that describes lecture "Databases" is a reference to the row of table Teachers that stores information on the teacher named "Banachowski". This relationship reads:

Lecture "Databases" is taught by Lech Banachowski.

Please answer the following simple question:

How many foreign keys may a table have?

Null = lack of value

Domains of all columns are extended by the special value called Null. The meaning of Null is the lack of values. It could be:

temporary or
permanent (Null is something else than the empty string or zero).

All comparisons with Null yield Null as the result (expression Null=Null also evaluates Null). It concerns almost all operations that have Null as the argument - their value is Null.

Therefore, Null is the third logical value besides True and False. Here are the truth tables for three major logical connectives:

Disjunction (`OR`)

OR	True	False	Null
True	True	True	True
False	True	False	Null
Null	True	Null	Null

Conjunction (`AND`)

AND	True	False	Null
True	True	False	Null
False	False	False	False
Null	Null	False	Null

Negation (`NOT`)

NOT	True	False	Null
	False	True	Null

It is time for a simple question:

Is it possible that a logical expression is true, when one if its arguments is Null?

Predicates `Is Null` and `Is Not Null`

They allow recognizing null values.

(X Is Null) evaluates to True, when X is Null
(X Is Null) evaluates to False, when X is not Null

However, be careful, because

(X = Null) always evaluates to Null (even for X being Null)

Here is another simple question:

What is the logical value of expression (Null=Null) Is Null?

Integrity constraints

Integrity constraints are conditions that must be fulfilled by data in a database, e.g.

a constraint for a single column:
```
0 < Age < 140
```
a constraint for several columns:
```
birthDate < hireDate
```
the uniqueness (the primary keys, unique keys),
NOT NULL constraint,
the referential integrity: each value of a foreign key is either Null or occurs also in the appropriate column of the associated primary (or unique) key.

a more complex business rule that can be programmed in a more sophisticated language, e.g.

The sum of the salaries of all employees of a department
equals
the wage fund of this department.

Again a simple question:

Which popular data model does not enforce referential integrity?

View

A view is a virtual table to be used by the users, e.g.

Lectures-Teachers

LectureName	Teacher
Databases	Banachowski
Information systems	Stencel
Internet technologies	Stencel
Object-oriented programming	Smith
Expert systems	Banachowski

The content of a usual view is calculated from the base tables at each usage. It is not stored in the database. However, sometimes we need to save this content in the database. Then we use this frozen image and do not recompute it again and again. Such view is called materialized.

Index

An index is an additional data structure that accelerates access to the data in a table, when you search rows with particular values in a set of columns. For example, the index built for column LastName accelerates the search for the teacher with a given last name. The purpose of an index in a database is similar to that of the index in a book.

Levels of a relational database

The same data of a database may be viewed and processed in many different ways. The point of view depends on the level of abstraction. The basic levels of abstraction are:

The user level consists of the views for users.
The logical (conceptual) level consists of views, tables and indexes.
The physical level consists of files with data and indexes.

Views are defined on the logical level and used on the user level. Indexes are defined on the logical level and used on the physical level. Tables are defined on the logical level and used on the physical level and the user level.

The user level is intended mainly for the end users. The logical level is used by the system data administrator, while the database administrator (shortly dba) works with the physical level. Of course, the designer of the database defines and works with all three levels.

Each level is somewhat independent. For example, you can change the location of the data on the disk as well as the storage parameters without any impact on the logical structure of the tables. Such a possibility is called the physical data independence. You can also change tables without any change of the applications, provided these are based on views and not directly tables. This feature is called the logical data independence.

Consider our sample database. Its application can consist in displaying the information on who delivers lectures, i.e. displaying the content of a view Lectures-Teachers. In the future it may turn out that some lectures are given by more than one teacher. The schema of the database with two tables Teachers and Lectures is no longer sufficient. A new associative table is required. It will store the relationship between teachers and lectures. The definition of view Lectures-Teachers must be changed as well. All these changes apply to the logical level of the database. The application itself is based on this view and thus it need not be changed. It is an example of the meaning of the logical data independence.

The physical data independence becomes important, when the disk is already full and we have to buy a new one. New rows will be stored on the new disk, but neither the schemata of the tables nor the applications have to be changed.

Catalog

The catalog (or data dictionary or metadata) is the collection of tables and views that describe the schema of the database. In other words, the catalog contains definitions of all the objects of all the three levels of the database. The name and the data type of a column are examples of metadata.

It is important to use the relational data model for the catalog. This ensures that the metadata are stored and processed the same way as the data.

Client-server architecture

Database applications usually consists of at least two parts:

the client side: that operates on the user's workstation,
the server side: that operates on the computer that runs the database server, i.e. the database itself and the database management system (DBMS).

The functions performed at the server side are:

storing data and facilitating access to it,
executing statements of the database language (usually it is SQL that will be presented during lecture 9),
enforcing data integrity,
managing the resources of the database, e.g. the users' accounts.

The functions performed at the client side are:

interacting with the user (running the user interface),
explaining the state of the computation to the user (it includes also errors and exceptional events),
accepting users' queries and then executing them or sending them to the server as SQL statements.

History of databases

1951

Univac presented the magnetic tape as a means to store data. (Before that punched cards were used).

1956

IBM introduced the magnetic hard drive.

1961

Integrated Data Store IDS (Charles Bachman, General Electric) was the first DBMS. It was the rise of the network data model.

1965-70

Information Management System IMS (IBM) - the hierarchical data model.

1970

Edgar Codd (1924-2003), IBM –, the relational data model.

1971

CODASYL, the standard for the network data model.

1976

Peter Chen - the entity-relationship model (ERD, ERM). There is no standard for this model so far (in 2004).

The beginning of the seventies

People at IBM research lab in San Jose develop languge Sequel. It was the prototype of SQL. You can still meet "old" people who pronounce SQL as "Sequel". Now you know why they do it.

1973

The first relational database management system (System R developed by IBM).

1979

Relational Software (later rebranded to Oracle) marketed the first commercial version of a relational database management system.

1987

The first standard of SQL (ISO).

Next versions of standard SQL (by ANSI/ISO): 1989, 1992 SQL2, 1999 SQL3 (the object-relational model).
SQL4 is developed.

The eighties

Reserch on deductive and object-oriented databases.

1997

The standard for object-oriented databases: ODMG 2.0.

Since the nineties till now

Databases have been extended by new aspects like multi-tier architectures, distribution, integration, parallel computation, Internet, data warehouses, OLAP, multimedia, databases of documents (also XML), GIS (Geographical Information Systems), ERP (Enterprise Resource Planning), MRP (Management Resource Planning) - packets like SAP, Baan, Oracle, PeopleSoft, Siebel, CRM (Customer Relationship Management).

Summary

During this lecture we explained the fundamental notions that apply to databases. We also emphasized the importance of databases for information systems. In particular, we presented the primary data model of modern databases. We call it the relational data model. We described the following concepts of this model: table, key, primary key, foreign key, view, pseudo-value Null and integrity constraints.

Dictionary

foreign key: A set of one or more columns whose values are also values of the primary key or a unique key of the related table (it can be the same table). The values of these columns are interpreted as pointers to the rows of the related table.
integrity constraint: A correctness condition for the data in a database.
key: A subset (may be a singleton) of the columns of a table such that the values in these columns uniquely identify the row of this table.
Null: The pseudo-value which means that the data is missing.
primary key: The distinguished key which is used to identify objects.
referential integrity constraint: An integrity constraint which states that each value of a foreign key is either Null or occurs also in the apropriate column of the associated primary (or unique) key.
table: A two-dimensional structure which consists of rows and columns. At the intersection of a row and a column there is only one atomic data item. A row stores a record of data on an object (e.g. a person or a company) or a relationship between objects. Each column contains a set of atomic data items which describe one of the attributes of an object (e.g. the name of the company or the last name of a person).
unique key: A key that is not primary.
view: A virtual table created for users. It is defined on the logical level and used on the user level. If it is physically saved in the form of a relational table, we call it a materialized view.

Exercises

Give an example of two tables connected with the relationship primary key-foreign key. Define their primary keys. Do they have unique (alternative) keys? Define at least one foreign key. Draw both tables on paper and enter some rows into them. Remember not to break the referential integrity.
Reflect on the meaning of data and information in your life. Do you use any systems to store your data? Can you call these systems databases?

Page prepared in Polish by Lech Banachowski and translated into English by Krzysztof Stencel.

Lecture 1