Normalization


Normalization is the process of organizing data in a database. This includes creating tables and establishing relationships between those tables according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency.

Redundant data wastes disk space and creates maintenance problems. If data that exists in more than one place must be changed, the data must be changed in exactly the same way in all locations. For example, if a StudentName field is in several tables, then a spelling correction to a student's name will have to be made in each of those tables. Alternatively, if a Student table contains the StudentID and the StudentName fields, then other tables needing access to student information can link to the Student table. Any necessary changes to the students' data need only occur in one place.

When a database is normalized, all of the tables describe or are concerned with a limited part of the overall schema. Normalizing a database accomplishes several things:
  • Arranges data into logical groupings or subschemas so that each group describes some small part of the whole story
  • Ensures that each column of each table clearly defines and describes its data content and that there is no ambiguity about the data that is stored
  • Minimizes the amount of duplicate data that could potentially be stored in a database
  • Organizes the data so that when modifications are necessary, the change can be made in only one place
  • Builds a database that functions quickly and efficiently without compromising the integrity of the data in storage
There are a few rules for database normalization. Each rule is called a "normal form." If the first rule is observed, the database is said to be in "first normal form." If the first three rules are observed, the database is considered to be in "third normal form." Although other levels of normalization are possible, third normal form is considered the highest level necessary for most applications.

First Normal Form

  • Identify each set of related data with a primary key


  • A primary key is a uniquely identifying field (or fields) that directly identifies each value of the remaining fields in a given record.

  • No multi-valued attributes (fields that contain more than one value in each row)


  • For example, if a student is in more than one course, this is the wrong way to store the data - there are several course numbers in the Course field:

    StudentIDCourse
    11111420,330,470
    22222230,120,150


  • No repeating groups in individual tables (fields that represent multiple items)


  • Do not use multiple fields in a single table to store similar data. For example, to track which students have which courses, a record may contain fields for Course1, Course2, and Course3:

    StudentIDCourse1Course2Course3
    11111420330470
    22222230120150


    This becomes a problem if you would need to add a fourth course. Adding yet another field will not solve the problem - that would require the developer to modify the program and table structure, and does not readily allow a dynamic number of courses.
Instead, place all course information in a separate table called Courses, then link Students to Courses with an ID number key. This table is now in First Normal Form:

StudentIDCourse
11111420
11111330
11111470
22222230
22222120
22222150


Second Normal Form

  • In First Normal Form
  • Separate tables for sets of values that apply to multiple records
Second Normal form applies only to tables with compound keys (containing more than one field). If the primary key is a compound key, then each record should be directly related to each of the components of the key.

For example, in our Student table, the StudentID/Course combination is the Primary Key. The Room field does not depend at all on the StudentID field. In this case, the Room field should be put into a separate table. To put a table in Second Normal Form, these related tables are linked with a foreign key.

StudentID Course Room
11111 420 221
11111 330 136
11111 470 201
22222 230 410
22222 120 313
22222 150 526

In Second Normal form, this table becomes two linked tables:

StudentID Course
11111 420
11111 330
11111 470
22222 230
22222 120
22222 150
Course Room
420 221
330 136
470 201
230 410
120 313
150 526

















Third Normal Form

  • In Second Normal Form
  • Eliminate fields that do not depend on the key
Values in a record that are not part of that record's key do not belong in the table. In general, any time the contents of a group of fields may apply to more than a single record in the table, consider placing those fields in a separate table.

Keep in mind that adhering to the third normal form, while theoretically desirable, is not always practical. In a typical Customers table, in order to eliminate all possible interfield dependencies, there must be separate tables for cities, ZIP codes, sales representatives, customer classes, and any other factor that may be duplicated in multiple records. In theory, normalization is worth pursing. However, many small tables may degrade performance or exceed open file and memory capacities.

It may be more feasible to apply third normal form only to data that changes frequently. If some dependent fields remain, design your application to require the user to verify all related fields when any one is changed.

Other Normalization Forms

Other normalizations forms, including Boyce Codd Normal Form, fourth normal form, and fifth normal form do exist, but are rarely used in production databases. Disregarding these rules may result in less than theoretically "perfect" database design, but should not affect functionality.

Well-normalized databases are complex to program against. Application programmers who deal with creating front-end applications usually have to use multi-table joins, subqueries, or views as data sources, since the necessary data is contained in several normalized tables. This can make the application run very slowly and/or make finding information difficult.

As with many formal rules and specifications, real world scenarios do not always allow for perfect compliance. In general, normalization requires additional tables and some customers find this cumbersome. If you decide to violate one of the first three rules of normalization, make sure that your application anticipates any problems that could occur, such as redundant data and inconsistent dependencies.
 
PrintPrint CloseClose

Copyright © 2001 - 2024 MJ Technologies, LLC.  All rights reserved.