What are the rules of first normal form?
First Normal Form (1NF)
If a table is not properly normalized and contains redundant data, it becomes difficult to maintain and update the database without risking data loss. Redundant data also wastes storage space, and insertion, update, and deletion anomalies are common when a database is not normalized.
Normalization is the process of minimizing redundancy in a relation or set of relations. Redundancy in a relation can cause insertion, deletion, and update anomalies, so reducing it prevents those problems. Normal forms are the rules used to eliminate or reduce redundancy in database tables.
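There are various levels of normalization, including first (1NF), second (2NF), and third (3NF) normal form, Boyce-Codd normal form (BCNF), and fourth (4NF) and fifth (5NF) normal form.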
In this article, we will discuss First Normal Form (1NF).
First Normal Form (1NF):
A relation that contains a composite or multi-valued attribute violates first normal form. Conversely, a relation is in first normal form if it contains no composite or multi-valued attributes, that is, if every attribute in the relation is single-valued.
A table is in 1NF if and only if:
Consider the examples given below.
Example-1:
Relation STUDENT in table 1 is not in 1NF because of the multi-valued attribute STUD_PHONE. Its decomposition into 1NF is shown in table 2.
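The decomposition described above can be sketched in Python. This is a minimal illustration, not the article's own method. Because tables 1 and 2 are not reproduced here, the student names and phone numbers below are hypothetical, and `decompose_multivalued` is an illustrative helper name. Each phone number in a student's list becomes its own row.

```python
# Hypothetical STUDENT rows: the original tables 1 and 2 are not reproduced here,
# so these names and phone numbers are made up for illustration.
student = [
    {"STUD_NO": 1, "STUD_NAME": "Asha", "STUD_PHONE": ["555-0101", "555-0102"]},
    {"STUD_NO": 2, "STUD_NAME": "Ravi", "STUD_PHONE": ["555-0199"]},
]


def decompose_multivalued(rows, attr):
    """Return one row per value of the multi-valued attribute `attr`."""
    result = []
    for row in rows:
        for value in row[attr]:
            # Copy the other attributes unchanged and keep a single value for `attr`.
            flat = {k: v for k, v in row.items() if k != attr}
            flat[attr] = value
            result.append(flat)
    return result


student_1nf = decompose_multivalued(student, "STUD_PHONE")
for row in student_1nf:
    print(row)
```

Student 1 now appears in two rows, one per phone number, and every STUD_PHONE value is a single string.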
Example-2:
In the above table, Course is a multi-valued attribute, so the table is not in 1NF.
The table below is in 1NF because it has no multi-valued attribute:
Note: A database design is considered bad if it is not even in First Normal Form (1NF).
Description of the database normalization basics
Original KB number: 283878
This article explains database normalization terminology for beginners. A basic understanding of this terminology is helpful when discussing the design of a relational database.
Description of normalization
Normalization is the process of organizing data in a database. This includes creating tables and establishing relationships between those tables according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency.
Redundant data wastes disk space and creates maintenance problems. If data that exists in more than one place must be changed, the data must be changed in exactly the same way in all locations. A customer address change is much easier to implement if that data is stored only in the Customers table and nowhere else in the database.
What is an "inconsistent dependency"? While it is intuitive for a user to look in the Customers table for the address of a particular customer, it may not make sense to look there for the salary of the employee who calls on that customer. The employee’s salary is related to, or dependent on, the employee and thus should be moved to the Employees table. Inconsistent dependencies can make data difficult to access because the path to find the data may be missing or broken.
There are a few rules for database normalization. Each rule is called a "normal form." If the first rule is observed, the database is said to be in "first normal form." If the first three rules are observed, the database is considered to be in "third normal form." Although other levels of normalization are possible, third normal form is considered the highest level necessary for most applications.
As with many formal rules and specifications, real world scenarios do not always allow for perfect compliance. In general, normalization requires additional tables and some customers find this cumbersome. If you decide to violate one of the first three rules of normalization, make sure that your application anticipates any problems that could occur, such as redundant data and inconsistent dependencies.
The following descriptions include examples.
First normal form
Do not use multiple fields in a single table to store similar data. For example, to track an inventory item that may come from two possible sources, an inventory record may contain fields for Vendor Code 1 and Vendor Code 2.
What happens when you add a third vendor? Adding a field is not the answer; it requires program and table modifications and does not smoothly accommodate a dynamic number of vendors. Instead, place all vendor information in a separate table called Vendors, then link inventory to vendors with an item number key, or vendors to inventory with a vendor code key.
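A minimal Python sketch of this advice, assuming a hypothetical inventory layout with `vendor_code_1` and `vendor_code_2` columns (these names and values are not from the article). The helper moves the repeated columns into a separate item-to-vendor link table. Details about each vendor would live in the Vendors table, which the sketch leaves out.

```python
# Hypothetical inventory records that use repeated columns for similar data.
inventory_wide = [
    {"item_no": "A100", "vendor_code_1": "V01", "vendor_code_2": "V07"},
    {"item_no": "B200", "vendor_code_1": "V03", "vendor_code_2": None},
]


def to_item_vendors(rows):
    """Move every vendor_code_N column into (item_no, vendor_code) link rows."""
    links = []
    for row in rows:
        for column, value in row.items():
            # Skip empty vendor slots; they are an artifact of the fixed-width layout.
            if column.startswith("vendor_code_") and value is not None:
                links.append((row["item_no"], value))
    return links


item_vendors = to_item_vendors(inventory_wide)
# A third vendor for A100 is just another row; no table change is needed.
item_vendors.append(("A100", "V09"))
print(item_vendors)
```

With the link table, the number of vendors per item is no longer fixed by the table design.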
Second normal form
Records should not depend on anything other than a table’s primary key (a compound key, if necessary). For example, consider a customer’s address in an accounting system. The address is needed by the Customers table, but also by the Orders, Shipping, Invoices, Accounts Receivable, and Collections tables. Instead of storing the customer’s address as a separate entry in each of these tables, store it in one place, either in the Customers table or in a separate Addresses table.
Third normal form
Values in a record that are not part of that record’s key do not belong in the table. In general, anytime the contents of a group of fields may apply to more than a single record in the table, consider placing those fields in a separate table.
For example, in an Employee Recruitment table, a candidate’s university name and address may be included. But you need a complete list of universities for group mailings. If university information is stored in the Candidates table, there is no way to list universities with no current candidates. Create a separate Universities table and link it to the Candidates table with a university code key.
EXCEPTION: Adhering to the third normal form, while theoretically desirable, is not always practical. If you have a Customers table and you want to eliminate all possible interfield dependencies, you must create separate tables for cities, ZIP codes, sales representatives, customer classes, and any other factor that may be duplicated in multiple records. In theory, normalization is worth pursuing. However, many small tables may degrade performance or exceed open file and memory capacities.
It may be more feasible to apply third normal form only to data that changes frequently. If some dependent fields remain, design your application to require the user to verify all related fields when any one is changed.
Other normalization forms
Boyce-Codd normal form (BCNF), which is a stricter version of third normal form, and the fourth and fifth normal forms do exist, but they are rarely considered in practical design. Disregarding these rules may result in less than perfect database design, but should not affect functionality.
Normalizing an example table
These steps demonstrate the process of normalizing a fictitious student table.
Student# | Advisor | Adv-Room | Class1 | Class2 | Class3 |
---|---|---|---|---|---|
1022 | Jones | 412 | 101-07 | 143-01 | 159-02 |
4123 | Smith | 216 | 101-07 | 143-01 | 179-04 |
First normal form: No repeating groups
Tables should have only two dimensions. Since one student has several classes, these classes should be listed in a separate table. Fields Class1, Class2, and Class3 in the above records are indications of design trouble.
Spreadsheets often use the third dimension, but tables should not. Another way to look at this problem is as a one-to-many relationship: do not put the one side and the many side in the same table. Instead, create another table in first normal form by eliminating the repeating group (Class#), as shown below:
Student# | Advisor | Adv-Room | Class# |
---|---|---|---|
1022 | Jones | 412 | 101-07 |
1022 | Jones | 412 | 143-01 |
1022 | Jones | 412 | 159-02 |
4123 | Smith | 216 | 101-07 |
4123 | Smith | 216 | 143-01 |
4123 | Smith | 216 | 179-04 |
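The step from the wide table to the 1NF table can be sketched in Python using the two student rows shown above. The function name `eliminate_repeating_group` is illustrative. The sketch emits one row per non-empty Class column.

```python
# Rows of the unnormalized student table: Student#, Advisor, Adv-Room, Class1..Class3.
students_unnormalized = [
    ("1022", "Jones", "412", "101-07", "143-01", "159-02"),
    ("4123", "Smith", "216", "101-07", "143-01", "179-04"),
]


def eliminate_repeating_group(rows):
    """Turn the Class1..ClassN columns into one (Student#, Advisor, Adv-Room, Class#) row each."""
    result = []
    for student_no, advisor, adv_room, *classes in rows:
        for class_no in classes:
            if class_no:  # skip empty class slots
                result.append((student_no, advisor, adv_room, class_no))
    return result


students_1nf = eliminate_repeating_group(students_unnormalized)
for row in students_1nf:
    print(" | ".join(row))
```

The printed rows match the first normal form table above.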
Second normal form: Eliminate redundant data
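Note the multiple Class# values for each Student# value in the above table. Class# is not functionally dependent on Student#, so Student# alone cannot be the primary key. The key must be the combination of Student# and Class#. Advisor and Adv-Room, however, depend only on Student#, which is just part of that key, so the table is not in second normal form.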
The following tables demonstrate second normal form:
Student# | Advisor | Adv-Room |
---|---|---|
1022 | Jones | 412 |
4123 | Smith | 216 |
Student# | Class# |
---|---|
1022 | 101-07 |
1022 | 143-01 |
1022 | 159-02 |
4123 | 101-07 |
4123 | 143-01 |
4123 | 179-04 |
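This split can be expressed as two projections of the 1NF table with duplicate rows removed. A minimal Python sketch using the rows above, where `project` and `student_classes` are hypothetical names:

```python
# The 1NF table: (Student#, Advisor, Adv-Room, Class#).
students_1nf = [
    ("1022", "Jones", "412", "101-07"),
    ("1022", "Jones", "412", "143-01"),
    ("1022", "Jones", "412", "159-02"),
    ("4123", "Smith", "216", "101-07"),
    ("4123", "Smith", "216", "143-01"),
    ("4123", "Smith", "216", "179-04"),
]


def project(rows, indexes):
    """Keep only the given column positions and drop duplicate rows, preserving order."""
    seen = set()
    result = []
    for row in rows:
        projected = tuple(row[i] for i in indexes)
        if projected not in seen:
            seen.add(projected)
            result.append(projected)
    return result


students = project(students_1nf, (0, 1, 2))      # Student#, Advisor, Adv-Room
student_classes = project(students_1nf, (0, 3))  # Student#, Class#
print(students)
print(student_classes)
```

The advisor data now appears once per student instead of once per class.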
Third normal form: Eliminate data not dependent on key
In the last example, Adv-Room (the advisor’s office number) is functionally dependent on the Advisor attribute, which is not the key. The solution is to move that attribute from the Students table to a separate Faculty table, where each advisor’s room is recorded once.
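A minimal Python sketch of the third normal form step, using the Students rows from the second normal form example. The dictionary `faculty` is a hypothetical stand-in for the Faculty table. The loop raises an error if an advisor were recorded with two different rooms, because the split is only valid when Advisor determines Adv-Room.

```python
# Students rows from the second normal form step: (Student#, Advisor, Adv-Room).
students_2nf = [("1022", "Jones", "412"), ("4123", "Smith", "216")]

# Students keeps only the reference to the advisor.
students_3nf = [(student_no, advisor) for student_no, advisor, _ in students_2nf]

# Hypothetical Faculty structure: each advisor's room is recorded once.
faculty = {}
for _, advisor, adv_room in students_2nf:
    # The split is only valid if Advisor -> Adv-Room holds, so reject conflicting rooms.
    if faculty.setdefault(advisor, adv_room) != adv_room:
        raise ValueError(f"{advisor} appears with more than one room")

print(students_3nf)
print(faculty)
```

If an advisor moves to a new office, only that advisor's single entry in `faculty` changes.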
First Normal Form
By Yashi Goyal
What is the First Normal Form?
Before studying First Normal Form, one must know what normalization is and why it is done. In general terms, normalization is the technique of organizing data in a database to reduce insertion, deletion, and update anomalies and to remove data redundancy. The process divides larger tables into smaller ones and links them through primary key and foreign key relationships. Duplicate, unnormalized data not only consumes extra storage but also makes inserting, deleting, and updating rows harder to manage as the amount of data grows. It is therefore very important to normalize the tables when designing the database for any application.
First Normal Form, written as 1NF, sets the fundamental rules of data normalization and is the first form applied when normalizing the data in tables. It sets basic principles that every table needs to fulfill. Some of the principles are given below:
How Does First Normal Form Work?
According to the main principles of 1NF mentioned above,
Table Employee (before 1NF)
Because Emp_address holds a lot of address data for a single employee, the above table can be decomposed into the two tables below to bring it into 1NF:
Table1: Employee_details (After 1NF)
Emp_id | Emp_name | Emp_age |
---|---|---|
101 | Raghu | 25 |
102 | Rakesh | 28 |
103 | Rahul | 45 |
Table2: Employee_address (After 1NF)
In 1NF, each column must hold atomic (indivisible) values. For example, an employee may have handled several projects so far, recorded in the Emp_projects column. To keep a record of all of that employee's projects, each project should be stored in a separate row holding a single value, instead of listing the projects in one field separated by commas.
Table: Emp_projects (Before 1NF)
Emp_id | Emp_years_of_experience | Emp_dept | Emp_projects |
---|---|---|---|
101 | 3 | IT | abc,jkl |
102 | 2 | IT | bcd |
103 | 5 | Accounts | Abc, cfg,xyz, hjk |
Table: Emp_projects (After 1NF)
Emp_id | Emp_years_of_experience | Emp_dept | Emp_projects |
---|---|---|---|
101 | 3 | IT | abc |
101 | 3 | IT | jkl |
102 | 2 | IT | bcd |
103 | 5 | Accounts | Abc |
103 | 5 | Accounts | cfg |
103 | 5 | Accounts | xyz |
103 | 5 | Accounts | hjk |
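The conversion from the comma-separated Emp_projects column to one project per row can be sketched in Python with the data above. The original values have inconsistent spacing (`Abc, cfg,xyz, hjk`), so the sketch trims each value after splitting. The helper name `split_list_column` is illustrative.

```python
# The Emp_projects table before 1NF: (Emp_id, years of experience, Emp_dept, projects).
emp_projects_unnormalized = [
    (101, 3, "IT", "abc,jkl"),
    (102, 2, "IT", "bcd"),
    (103, 5, "Accounts", "Abc, cfg,xyz, hjk"),
]


def split_list_column(rows, sep=","):
    """Emit one row per listed project, trimming stray spaces around each value."""
    result = []
    for emp_id, years, dept, projects in rows:
        for project in projects.split(sep):
            project = project.strip()
            if project:  # ignore empty entries such as a trailing comma
                result.append((emp_id, years, dept, project))
    return result


emp_projects_1nf = split_list_column(emp_projects_unnormalized)
for row in emp_projects_1nf:
    print(row)
```

The seven resulting rows correspond to the table after 1NF. Letter case is preserved, so `Abc` and `abc` remain distinct values, as in the original data.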
The Emp_projects table above still has many repeating values, so it can be broken down further into two tables to reduce repetition:
There should not be repeating values in the table. Repeating values consume extra storage, slow down searches and updates, and make the database harder to maintain. For example, the Emp_projects table above repeats the values of Emp_id, Emp_years_of_experience, and Emp_dept unnecessarily, so a new table needs to be created to reduce that repetition.
Table1:
Emp_id | Emp_years_of_experience | Emp_dept |
---|---|---|
101 | 3 | IT |
102 | 2 | IT |
103 | 5 | Accounts |
Table2:
Emp_id | Emp_projects |
---|---|
101 | abc |
101 | jkl |
102 | bcd |
103 | Abc |
103 | cfg |
103 | xyz |
103 | hjk |
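One way to check that splitting the table into Table1 and Table2 loses no information is to join them back on Emp_id and compare the result with the 1NF table. Below is a minimal Python sketch using the rows above. The function name is hypothetical, and the lookup relies on Emp_id identifying exactly one row in Table1.

```python
# Table1 and Table2 from above.
employees = [(101, 3, "IT"), (102, 2, "IT"), (103, 5, "Accounts")]
employee_projects = [
    (101, "abc"), (101, "jkl"), (102, "bcd"),
    (103, "Abc"), (103, "cfg"), (103, "xyz"), (103, "hjk"),
]

# The 1NF table before the split: (Emp_id, years, Emp_dept, Emp_projects).
emp_projects_1nf = [
    (101, 3, "IT", "abc"), (101, 3, "IT", "jkl"), (102, 2, "IT", "bcd"),
    (103, 5, "Accounts", "Abc"), (103, 5, "Accounts", "cfg"),
    (103, 5, "Accounts", "xyz"), (103, 5, "Accounts", "hjk"),
]


def join_on_emp_id(details, projects):
    """Natural join of Table1 and Table2 on Emp_id.

    Building a dict is safe only because Emp_id identifies one row in Table1.
    """
    by_id = {emp_id: (years, dept) for emp_id, years, dept in details}
    return [(emp_id, *by_id[emp_id], project) for emp_id, project in projects]


rejoined = join_on_emp_id(employees, employee_projects)
print(rejoined == emp_projects_1nf)  # True: the split loses no information
```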
Advantages of First Normal Form
Some of the advantages of First Normal Form (1NF) are given below:
Conclusion
When creating tables for any application, it is very important to normalize them from the start, because normalization helps eliminate insertion, deletion, and update anomalies. It also reduces future maintenance cost and time. Fewer null values and less redundant data make the database more compact. Normalization produces more, smaller tables, which makes the data easier and more efficient to maintain. Combined with appropriate indexes and keys, it can also improve the performance of searching and sorting, and 1NF is the foundation for all of this.
Normalization in Relational Databases: First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF)
Agnieszka is a Chief Content Officer at Vertabelo. Before coming to Vertabelo, she worked as a Java programmer. She has a PhD in mathematics and over 10 years of experience in teaching mathematics and computer science at the University of Warsaw. In her free time, she enjoys reading a good book, going mountain hiking and practicing yoga.
What is database normalization? What are the different normal forms, and what do they do? Find out in this article.
Normalization in relational databases is a design process that minimizes data redundancy and avoids update anomalies. Basically, you want each piece of information to be stored exactly once; if the information changes, you only have to update it in one place.
The theory of normal forms gives rigorous meaning to these informal concepts. There are many normal forms. In this article, we’ll review the most basic: the first normal form (1NF), the second normal form (2NF), and the third normal form (3NF).
There are normal forms higher than 3NF, but in practice you usually normalize your database to the third normal form or to the Boyce-Codd normal form, which we won’t cover here.
So, what is this theory of normal forms? It deals with the mathematical construct of relations (which are a little bit different from relational database tables). The normalization process consists of modifying the design through different stages, going from an unnormalized set of relations (tables), to the first normal form, then to the second normal form, and then to the third normal form.
Don’t worry if this sounds complicated; I promise it will get clearer as we go through each step. Let’s start with 1NF – the first step.
First Normal Form (1NF)
A relation is in first normal form (1NF) if (and only if):
In practice, 1NF means that you should not have lists or other composite structures as attribute values. Below is an example of a relation that does not satisfy 1NF criteria:
student | courses |
---|---|
Jane Smith | Databases, Mathematics |
John Lipinsky | English Literature, Databases |
Dave Beyer | English Literature, Mathematics |
This relation is not in 1NF because the courses attribute has multiple values. Jane Smith is assigned to two courses (Databases and Mathematics), and they are stored in one field as a comma-separated list. This list can be broken down into smaller elements (i.e. course subjects: databases as one element, mathematics as another), so it’s not an atomic value.
To transform this relation to the first normal form, we should store each course subject as a single value, so that each student-course assignment is a separate tuple:
student | course |
---|---|
Jane Smith | Databases |
Jane Smith | Mathematics |
John Lipinsky | English Literature |
John Lipinsky | Databases |
Dave Beyer | English Literature |
Dave Beyer | Mathematics |
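A rough way to test the 1NF condition in code is to look for attribute values that are themselves collections. This is only a structural heuristic, and it assumes the course lists are stored as Python lists. Atomicity ultimately depends on how the data is used, and a comma-separated string such as "Databases, Mathematics" would pass this check. The function names are illustrative.

```python
# The non-1NF relation above, with each course list stored as a Python list.
enrollments = [
    ("Jane Smith", ["Databases", "Mathematics"]),
    ("John Lipinsky", ["English Literature", "Databases"]),
    ("Dave Beyer", ["English Literature", "Mathematics"]),
]


def is_atomic(value):
    """Treat collections as non-atomic and everything else as atomic (a heuristic)."""
    return not isinstance(value, (list, tuple, set, frozenset, dict))


def in_first_normal_form(rows):
    """True if no attribute value in any row is a collection."""
    return all(is_atomic(value) for row in rows for value in row)


def flatten(rows):
    """One (student, course) tuple per student-course assignment."""
    return [(student, course) for student, courses in rows for course in courses]


print(in_first_normal_form(enrollments))     # False
student_course = flatten(enrollments)
print(in_first_normal_form(student_course))  # True
```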
If you’re interested in reading more about the first normal form, I recommend the article What Is the Actual Definition of First Normal Form? by my colleague Konrad Zdanowski.
Second Normal Form (2NF)
A relation is in second normal form (2NF) if and only if:
What does this mean? If the value of attribute A is determined by the value of attribute S, then A is functionally dependent on S. For example, your age is functionally dependent on your date of birth. For more on functional dependencies, see this article.
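The definition can be turned into a small check. A functional dependency X → Y holds in a set of rows if no two rows agree on X but differ on Y. A sample of rows can only disprove a dependency, not prove it, because a dependency is a rule about all valid data. Below is an illustrative sketch. The sample rows are assembled from the bike parts example later in this article, and `fd_holds` is a hypothetical helper name.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on every attribute in `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Sample rows assembled from the bike parts example later in this article.
rows = [
    {"part": "Saddle", "supplier": "Bikeraft", "supplier_country": "USA"},
    {"part": "Brake lever", "supplier": "Tripebike", "supplier_country": "Italy"},
    {"part": "Top tube", "supplier": "UpBike", "supplier_country": "Canada"},
    {"part": "Saddle", "supplier": "Tripebike", "supplier_country": "Italy"},
]

print(fd_holds(rows, ["supplier"], ["supplier_country"]))  # True
print(fd_holds(rows, ["part"], ["supplier"]))              # False: Saddle has two suppliers
```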
Let’s go back to the idea of candidate keys and non-prime attributes. What are they?
Informally, the second normal form states that all attributes must depend on the entire candidate key.
Let’s see an example of a relation that does not satisfy 2NF. The underlined attributes are the candidate key.
Bike parts warehouse
Why doesn’t this satisfy 2NF? The set {part, supplier} is the only candidate key of this relation. The value of supplier country is functionally dependent on supplier. Supplier country is not part of the candidate key, so it is a non-prime attribute, and it is functionally dependent on only part of the candidate key, not on the entire candidate key {part, supplier}.
To transform this relation into 2NF, we need to split it into two relations: Bike parts (with the attributes part, supplier, and quantity) and Suppliers (with the attributes supplier and supplier country). This would look as follows:
part | supplier | quantity |
---|---|---|
Saddle | Bikeraft | 10 |
Brake lever | Tripebike | 5 |
Top tube | UpBike | 3 |
Saddle | Tripebike | 8 |
The relation Bike parts is in 2NF because, as before, the quantity attribute depends on the pair supplier and part.
supplier | supplier country |
---|---|
Bikeraft | USA |
Tripebike | Italy |
UpBike | Canada |
The Suppliers relation is in 2NF because supplier country is functionally dependent on supplier, which is the candidate key of this relation.
Let’s see one more example of a non-2NF relation.
Student course fees
The following relation does not satisfy 2NF. The set {student, course} is the relation’s candidate key, but the value of course fee is functionally dependent on course alone. Course fee is a non-prime attribute that is functionally dependent on only part of the candidate key.
To transform this into 2NF, we again split it into two relations: Student courses (with the attributes student, course, and grade) and Courses (with the attributes course and course fee). Thus, we avoid the partial dependency in the non-2NF relation above.
student | course | grade |
---|---|---|
Alison Brown | Databases | A |
Jason Liu | Mathematics | B |
Mariah Hill | Databases | B+ |
course | course fee |
---|---|
Databases | $100 |
Mathematics | $150 |
Why not try verifying for yourself that these relations are indeed 2NF?
Note that the 2NF partial dependency rule only kicks in if your relation has a composite candidate key (i.e. one that consists of multiple attributes). A relation in 1NF whose candidate keys each consist of a single attribute is therefore automatically in 2NF.
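Partial dependencies can be found mechanically with the standard attribute-closure algorithm. For each proper subset of the candidate key, compute every attribute it determines and report the non-prime ones. Below is a minimal Python sketch for the Student course fees example, with the functional dependencies taken from the text. It assumes the given key is the only candidate key, which keeps the definition of prime attributes simple. The function names are illustrative.

```python
from itertools import combinations

# Functional dependencies of the Student course fees relation, as (lhs, rhs) pairs.
FDS = [
    ({"student", "course"}, {"grade"}),
    ({"course"}, {"course fee"}),
]
ATTRIBUTES = {"student", "course", "grade", "course fee"}


def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """Non-prime attributes determined by a proper subset of `key`.

    Simplification: `key` is assumed to be the only candidate key.
    """
    non_prime = attributes - key
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(set(subset), fds) & non_prime):
                found.append((subset, attr))
    return found


print(partial_dependencies({"student", "course"}, ATTRIBUTES, FDS))
# [(('course',), 'course fee')]
```

Running the same check on Student courses (student, course, grade) with only the first dependency returns an empty list, which is consistent with that relation being in 2NF.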
Third Normal Form (3NF)
A relation is in third normal form (3NF) if and only if:
In other words, non-prime attributes must be functionally dependent on the key(s), but they must not depend on another non-prime attribute. 3NF non-prime attributes depend on “nothing but the key”.
Let’s see a non-3NF relation:
This relation does not satisfy 3NF. The only candidate key in this relation is order_id. The value of customer email is functionally dependent on the customer attribute, which is a non-prime attribute. Thus, the relation violates 3NF.
Once again, we split this into two relations: Orders (with the attributes order_id, date, and customer) and Customers (with the attributes customer and customer email):
order_id | date | customer |
---|---|---|
1/2020 | 2020-01-15 | Jason White |
2/2020 | 2020-01-16 | Mary Smith |
3/2020 | 2020-01-17 | Jacob Albertson |
4/2020 | 2020-01-18 | Bob Dickinson |
customer | customer email |
---|---|
Jason White | white@example.com |
Mary Smith | msmith@mailinator.com |
Jacob Albertson | jasobal@example.com |
Bob Dickinson | bob@fakemail.com |
Orders is in 3NF because the date and customer attributes do not violate the rule of 3NF; their values depend on the order_id number. Customers is in 3NF because customer email is functionally dependent on customer, which is the candidate key of this relation. In both cases, all non-prime attributes depend on the candidate key.
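The 3NF condition can be checked with the same closure idea. For every functional dependency X → A, either X must be a superkey or A must be a prime attribute. Below is a minimal Python sketch for the orders example. The caller supplies the prime attributes, and only the listed dependencies are checked (a complete check would also consider every dependency they imply). The function names are illustrative.

```python
def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(attributes, prime, fds):
    """(X, A) pairs where X -> A breaks 3NF: A is non-prime, not in X, and X is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= attributes
        for attr in sorted(rhs - lhs):
            if not is_superkey and attr not in prime:
                violations.append((tuple(sorted(lhs)), attr))
    return violations


# The non-3NF orders relation: order_id is the only candidate key.
ORDER_ATTRS = {"order_id", "date", "customer", "customer email"}
ORDER_FDS = [
    ({"order_id"}, {"date", "customer"}),
    ({"customer"}, {"customer email"}),
]
print(third_nf_violations(ORDER_ATTRS, {"order_id"}, ORDER_FDS))
# [(('customer',), 'customer email')]

# After the split, each relation is checked with its own dependency.
print(third_nf_violations({"order_id", "date", "customer"}, {"order_id"}, ORDER_FDS[:1]))  # []
print(third_nf_violations({"customer", "customer email"}, {"customer"}, ORDER_FDS[1:]))    # []
```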
Let’s see one more non-3NF example.
This relation does not satisfy 3NF. The only candidate key in this relation is {course, year}, but the value of teacher date of birth is functionally dependent on teacher, a non-prime attribute. This violates 3NF.
Guess how we’ll transform this into 3NF? That’s right; we split the relation. Courses gets the attributes course, year, and teacher; Teachers gets the attributes teacher and teacher date of birth:
course | year | teacher |
---|---|---|
Databases | 2019 | Chris Cape |
Mathematics | 2019 | Daniel Parr |
Databases | 2020 | Jennifer Clock |
teacher | teacher date of birth |
---|---|
Chris Cape | 1974-10-12 |
Daniel Parr | 1985-05-17 |
Jennifer Clock | 1990-06-09 |
Try verifying that these relations are indeed in 3NF for yourself. How would you explain the changes made?
Database Normalization: Summary
First, second, and third normal forms are the basic normal forms in database normalization:
Normalization of Database
Database normalization is a technique for organizing the data in a database. It is a systematic approach to decomposing tables to eliminate data redundancy (repetition) and undesirable characteristics such as insertion, update, and deletion anomalies. It is a multi-step process that puts data into tabular form and removes duplicated data from the relation tables.
Normalization is mainly used for two purposes:
Problems Without Normalization
If a table is not properly normalized and has data redundancy, it will not only eat up extra memory space but will also make the database difficult to handle and update without facing data loss. Insertion, update, and deletion anomalies are very frequent if a database is not normalized. To understand these anomalies, let us take the example of a Student table.
rollno | name | branch | hod | office_tel |
---|---|---|---|---|
401 | Akon | CSE | Mr. X | 53337 |
402 | Bkon | CSE | Mr. X | 53337 |
403 | Ckon | CSE | Mr. X | 53337 |
404 | Dkon | CSE | Mr. X | 53337 |
Insertion Anomaly
Suppose a new student is admitted. Until the student opts for a branch, the student's data cannot be inserted unless we set the branch information to NULL.
Also, if we have to insert data for 100 students of the same branch, the branch information will be repeated for all 100 students.
These scenarios are insertion anomalies.
Update Anomaly
What if Mr. X leaves the college, or is no longer the HOD of the computer science department? In that case, all the student records will have to be updated, and if we miss any record by mistake, it will lead to data inconsistency. This is an update anomaly.
Deletion Anomaly
In our Student table, two different kinds of information are kept together: student information and branch information. Hence, if student records are deleted at the end of the academic year, the branch information is lost as well. This is a deletion anomaly.
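The update and deletion anomalies can be illustrated in a few lines of Python. The first function changes the HOD in the unnormalized Student table above and must touch every CSE row. The normalized alternative stores branch details once in a separate `branches` structure. That layout and the new HOD name `Mr. Y` are hypothetical.

```python
# Student table from above: branch details are repeated in every row.
students = [
    {"rollno": 401, "name": "Akon", "branch": "CSE", "hod": "Mr. X", "office_tel": "53337"},
    {"rollno": 402, "name": "Bkon", "branch": "CSE", "hod": "Mr. X", "office_tel": "53337"},
    {"rollno": 403, "name": "Ckon", "branch": "CSE", "hod": "Mr. X", "office_tel": "53337"},
    {"rollno": 404, "name": "Dkon", "branch": "CSE", "hod": "Mr. X", "office_tel": "53337"},
]


def change_hod_unnormalized(rows, branch, new_hod):
    """Rewrite the HOD in every row of `branch`; skipping any row would leave inconsistent data."""
    touched = 0
    for row in rows:
        if row["branch"] == branch:
            row["hod"] = new_hod
            touched += 1
    return touched


# Hypothetical normalized layout: branch facts are stored once, students keep only the branch code.
branches = {"CSE": {"hod": "Mr. X", "office_tel": "53337"}}
students_normalized = [
    {"rollno": r["rollno"], "name": r["name"], "branch": r["branch"]} for r in students
]

rows_touched = change_hod_unnormalized(students, "CSE", "Mr. Y")  # update anomaly: 4 rows
branches["CSE"]["hod"] = "Mr. Y"                                  # normalized: 1 update

# Deleting every student at the end of the year no longer deletes the branch facts.
students_normalized.clear()
print(rows_touched, branches["CSE"])
```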
Normalization Rule
Normalization rules are divided into the following normal forms:
First Normal Form (1NF)
For a table to be in the First Normal Form, it should follow the following 4 rules:
Second Normal Form (2NF)
For a table to be in the Second Normal Form,
Third Normal Form (3NF)
A table is said to be in the Third Normal Form when,
Boyce and Codd Normal Form (BCNF)
Boyce-Codd Normal Form is a stricter version of Third Normal Form. It deals with a certain type of anomaly that 3NF does not handle. A 3NF table that does not have multiple overlapping candidate keys is also in BCNF. For a table to be in BCNF, it must be in 3NF, and for every non-trivial functional dependency X → Y, X must be a superkey.
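BCNF removes the 3NF exception for prime attributes: for every non-trivial functional dependency X → Y, X must be a superkey. Below is a minimal Python sketch using a hypothetical relation (student, subject, teacher) in which each teacher teaches exactly one subject. Its candidate keys {student, subject} and {student, teacher} overlap. The dependency teacher → subject is allowed by 3NF because subject is prime, but it violates BCNF. The function names are illustrative.

```python
def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Non-trivial dependencies X -> Y whose left-hand side X is not a superkey."""
    return [
        (tuple(sorted(lhs)), tuple(sorted(rhs)))
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= attributes
    ]


# Hypothetical relation: each teacher teaches exactly one subject.
ATTRS = {"student", "subject", "teacher"}
FDS = [
    ({"student", "subject"}, {"teacher"}),
    ({"teacher"}, {"subject"}),
]
print(bcnf_violations(ATTRS, FDS))  # [(('teacher',), ('subject',))]

# Moving (teacher, subject) into its own relation removes the violation there.
print(bcnf_violations({"teacher", "subject"}, [({"teacher"}, {"subject"})]))  # []
```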
Fourth Normal Form (4NF)
A table is said to be in the Fourth Normal Form when,