Database normalization
Database normalization is a technique to organize the contents
of databases. Without normalization, database systems can be inaccurate, slow,
and inefficient. The community of database professionals developed a series of
guidelines for the normalization of databases. Each 'level' of normalization is
referred to as a form, and there are 5 forms, total. First normal form is the
lowest level of normalization, up to fifth normal form, which is the highest
level of normalization.
- First Normal Form (1NF): The most basic level of data normalization, first normal form requires the elimination of all duplicate columns in a table, and also requires the creation of separate tables for related data, and identification of each table with a primary key attribute.
- Second Normal Form (2NF): Meets all the
requirements of first normal form, and creates relationships between
tables using foreign keys.
- Third Normal Form (3NF): Meets all the
requirements of second and first normal forms, and removes all columns
that are not dependent upon the primary key. Third normal form also
removes all derived attributes, such as age.
- Fourth Normal Form (4NF): Fourth normal
form adds one additional requirement, which is the removal of any
multi-valued dependencies in relationships.
- Fifth Normal Form (5NF): Fifth normal
form is a rarer form of normalization, in which case join dependencies are
implied by candidate keys (possibly primary key values).
In the reality of database development, getting to 3NF is the
most important jump. 4NF and 5NF are a bit more of a luxury (and sometimes a
nuisance) in database development, and are rarely seen in practice. If you're
struggling with the concepts, or remembering the first three forms, there is a
simple relation. "The key, the whole key, and nothing but the key.",
which relates to 1NF, 2NF, and 3NF.
The Benefits of Normalization
Now, without venturing too far into database theory, let's
simply focus on the benefits of normalization. As the data progresses through
the normalization forms, it becomes cleaner, better organized, and faster. Now,
with a small database that has only 5 tables and 100 rows of data, this won't
be readily apparent. However, as the database grows, the effects of
normalization will become much more apparent with regards to speed and maintaining
data integrity. However, there are some situations in which normalization
doesn't make sense, such as when normalizing the data will create excessively
complex queries required to return the data.