What Is Data Integrity?

A state where all data values stored in the database are correct. If an incorrect data value is stored in a database, the database is said to have lost data integrity.

The data in the database is input from the outside, and due to various reasons, the input of the data may result in invalid input or error information. Ensure that the data entered meets the requirements and becomes
Data integrity is divided into the following three categories.
(1) Domain integrity: refers to the validity of the input of a column, whether it is allowed to be null. The methods to enforce domain integrity are: restriction type (by setting the data type of the column), format (by CHECK constraints and rules), or the range of possible values (by FOREIGN KEY constraints, CHECK constraints, DEFAULT definitions, NOT NULL definitions and rules ). For example, students' test scores must be between 0 and 100, and their gender can only be "male" or "female". (2) Entity integrity: refers to ensuring that all rows in the table are unique. Entity integrity requires that all rows in the table have a unique identifier. This unique identifier may be a column or a combination of columns, called the primary key. That is, the primary key in the table must take a unique value across all rows. The methods to enforce entity integrity are: index, UNIQUE constraint, PRIMARY KEY constraint or IDENTITY attribute. For example, the value of sno (student number) in the student table must be unique. It uniquely identifies the student represented by the corresponding record. Duplicate student numbers are illegal. The student's name cannot be used as the primary key, because it is entirely possible for two students to have the same name and last name.
(3) Referential integrity: It refers to guaranteeing the referential relationship between the primary keyword (referenced table) and external keywords (reference table). It involves maintaining the consistency of two or more tables of data. The foreign key value associates the record in the referenced table that contains the foreign key with the record in the referenced table whose primary key matches the foreign key. When entering, changing, or deleting records, referential integrity maintains the defined relationships between the tables, ensuring that key values are consistent across all tables. This consistency requirement ensures that non-existent values are not referenced. If a key value is changed, all references to that key value must be changed consistently throughout the database. Referential integrity is based on the relationship between the foreign key and the primary key. For example, the course number of a student studying a course must be a valid course number. The foreign key cno (course number) in the score table (score table) will refer to the primary key cno (course number) in the course table (course table) to achieve data integrity.
Domain integrity, entity integrity, and referential integrity are implemented on columns, rows, and tables, respectively. Data integrity can be implemented at any time, but when implementing data integrity on a table with existing data, the system must first check whether the data in the table meets the implemented integrity, and only the data in the table meets the implemented integrity Only then can data integrity be implemented successfully. [1]
There are four types of integrity constraints: entity integrity constraints, referential integrity constraints, functional dependency constraints, and statistical constraints.
Entity Integrity Entity integrity means that all primary attributes (ie, attributes of the primary code) in a relationship cannot take null values. The so-called "null value" is the value of "don't know" or "meaningless". If the main attribute takes a null value, it indicates an unidentifiable entity, which is inconsistent with the real-world application environment, so this entity must not be a complete entity.
Referential integrity constraint Referential integrity constraint refers to the value of the foreign code in the reference relationship or the null value (each attribute of the foreign code is a null value) or the primary code value of a tuple in the referenced relationship. An example is given below. There are two modes of relationship: student (student number, name, gender, major number, age) and major (professional number, major name). There are several issues to be aware of when implementing referential integrity: Can foreign codes accept null values? Because whether the outer code is empty depends on the application environment. If there are two relationship modes: elective (student number, good course, grade), students (student number, name, gender, age, department), in the elective relationship The outer code "student number" cannot be empty. If it is empty, it means that a student who does not know the student number has taken a certain course, which is not consistent with the school's application environment. Consideration when deleting the original group of the referenced relationship. Sometimes it is necessary to delete a certain original group of the referenced relationship, and there are several original groups in the referenced relationship whose outer code values correspond to the primary coded values in the deleted referenced relationship. All foreign code values in the relationship are deleted with the tuple corresponding to the primary code value of the tuple to be deleted in the referenced relationship. If the reference relationship is also the referenced relationship of another relationship, this cascading deletion should be cascaded. It should be cascaded. Considerations when modifying the master code in the referenced relationship. When encountering this situation, the method is similar to deleting the tuple of the referenced relationship, and cascading modification is required. At the same time, the primary code value in the referenced relationship is modified, and the corresponding outer code value in the referenced relationship is modified in the same way.
Function Dependency Constraints Most function dependency constraints are implicit in the structure of relational patterns, especially relational patterns with a higher degree of standardization (such as 3NF or BCNS). The pattern maintains functional dependencies. In practical applications, in order not to separate information too much, it is generally not possible to excessively pursue standardization. In this way, there can be some functions between the fields of the relationship to be explicitly expressed.
A system constraint is a constraint relationship between the value of a field and the statistical value of multiple tuples. For example, the salary of the manager of this department must not be higher than 5 times the average salary of the employees of this department. The average salary of employees is a statistical calculation. In many cases, statistical data can often be made public, while individual data is kept secret, but individual data values can be inferred from the statistical data, so certain precautions must be taken to prevent data leakage.
The methods for implementing integrity constraints vary by category. Integrity constraints can be divided into two categories: static constraints and dynamic constraints.
Static constraints Static constraints are constraints on the state of the database. They can be divided into fixed constraints, implicit constraints, and explicit constraints. [2]
Fixed constraints are inherent constraints of the data model. If the attributes of the relationship are atomic, the constraints of the INF are satisfied. Inherent constraints have been considered in the implementation of the DBMS.
Implicit constraints are constraints that are implicit in the data schema. They are generally described in data definition language (DDL) statements and stored in the data directory. For example, domain integrity constraints, entity integrity, and referential integrity constraints are all specified by corresponding DDL statements.
Show constraints Inherent constraints, implicit constraints are the most basic constraints, but not all constraints can be summarized. Data integrity is diverse and depends on the semantics and application of the data. These constraints are only explicitly stated, so they are called explicit constraints. There are generally three ways to explain explicit constraints: Use procedures to describe constraints. This method leaves the specification and verification of the constraint to the application, and procedures can be inserted into the application to verify that the database update violates a given constraint, and if the constraint is violated, the transaction is rolled back. The process of checking constraints is generally written in a general high-level programming language, which can express a variety of constraints. This is a common approach. Use language to explain constraints. Assertions are logical conditions that the state of the database must meet. Database integrity constraints can be viewed as a collection of assertions. To express constraints, the DBMS must provide an assertion specification language. Constraints are expressed by triggers. A trigger is a software mechanism whose function is equivalent to WHENEVERTHEN, that is, once a given condition is established, the system triggers a corresponding action. Triggers can be used to express constraints, to violate constraints as a condition, and to deal with violation violations as actions. Actions are not limited to rolling back transactions, but can also give users a message or process. After defining a batch of triggers in the system, the state of the database will be monitored, and once an update that violates the constraint occurs, a corresponding action will be triggered.
Dynamic Constraints Dynamic constraints are not constraints on the state of the database, but are constraints that the database should abide by when it changes from one state to another. For example, when updating the employee table, attributes such as wages and seniority generally only increase and do not decrease. Dynamic constraints are generally also explicitly stated. Among the above constraints, inherent constraints must be implemented, implicit constraints are basically implemented or partially implemented in most modern DBMSs, and explicit and dynamic constraints are implemented only in individual DBMSs.

IN OTHER LANGUAGES

Was this article helpful? Thanks for the feedback Thanks for the feedback

How can we help? How can we help?