What Is Indexation?

An index is a structure that keeps the values of one or more columns of a database table in sorted order, so that the database can locate specific rows quickly. For example, if you want to find an employee by last name, an index lets the database find the matching rows much faster than searching every row in the table.
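To make the idea concrete, here is a toy sketch in Python. The employee rows and the function names are made up for illustration, and a sorted list searched with bisect only stands in for a real index (databases typically use B-trees or similar structures), but the principle is the same: sorted data can be searched without touching every row.

```python
import bisect

# Hypothetical employee table: (row_id, last_name) rows in insertion order.
rows = [(0, "Smith"), (1, "Adams"), (2, "Nguyen"), (3, "Baker"), (4, "Smith")]

def full_scan(rows, last_name):
    """Check every row; the work grows linearly with the table size."""
    return [row_id for row_id, name in rows if name == last_name]

# A toy "index": (last_name, row_id) pairs kept in sorted order.
index = sorted((name, row_id) for row_id, name in rows)

def index_lookup(index, last_name):
    """Binary-search the sorted pairs, then read the run of matching entries."""
    # (last_name,) sorts before every (last_name, row_id) pair,
    # so bisect_left lands on the first entry for that name.
    start = bisect.bisect_left(index, (last_name,))
    matches = []
    for name, row_id in index[start:]:
        if name != last_name:
            break
        matches.append(row_id)
    return matches

print(full_scan(rows, "Smith"))
print(index_lookup(index, "Smith"))
```

Both calls return the same row IDs, but the second one inspects only the entries around the match instead of the whole table.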

  • Search key. One or more fields of a record, combined in any order. It does not have to identify a record uniquely.
  • Index entry. The element that the index is built from. There are generally several ways to choose index entries when building an index.
  • Record ID. A unique identifier for each record in storage.
Indexing can greatly improve the speed of retrieving the required information from the database, and it also makes the server more efficient at handling search requests [1].
Although indexes do much to improve retrieval efficiency, they also have shortcomings [1].
The most common case is to build an index on the fields that appear in the WHERE clause. Take the following table as an example:
CREATE TABLE mytable (
    id serial primary key,
    category_id int default 0 not null,
    user_id int default 0 not null,
    adddate int default 0 not null
);

If you use queries similar to the following:
SELECT * FROM mytable WHERE category_id = 1;
The most direct response is to build a simple index for category_id:
CREATE INDEX mytable_categoryid ON mytable (category_id);
What if there is more than one selection condition? For example:
SELECT * FROM mytable WHERE category_id = 1 AND user_id = 2;
Your first reaction may be to create another index on user_id, but that is not the best approach. Instead, create a single multi-column index:
CREATE INDEX mytable_categoryid_userid ON mytable (category_id, user_id);
Notice the naming convention? The index is named "tablename_field1name_field2name". You will soon see why.
Indexes now exist on the appropriate fields, but I am still a bit uneasy: will the database really use them? The only way to know is to test. In most databases this is easy, just use the EXPLAIN command:
EXPLAIN
SELECT * FROM mytable
WHERE category_id = 1 AND user_id = 2;
This is what Postgres 7.1 returns (exactly as I expected):
NOTICE: QUERY PLAN:
Index Scan using mytable_categoryid_userid on mytable (cost=0.00..2.02 rows=1 width=16)
EXPLAIN

The above is the Postgres output. You can see that the database used an index when running the query (a good start), and that it used the second index created. This is where the naming convention pays off: you can tell right away that the appropriate index is in use.
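If you have no Postgres server at hand, the same kind of check can be tried with SQLite through Python's sqlite3 module. This is only a sketch under that assumption: SQLite is swapped in for Postgres, its command is EXPLAIN QUERY PLAN rather than EXPLAIN, the column types are adapted to SQLite, and query_plan is a helper name invented here. The exact wording of the plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (
        id INTEGER PRIMARY KEY,
        category_id INT NOT NULL DEFAULT 0,
        user_id INT NOT NULL DEFAULT 0,
        adddate INT NOT NULL DEFAULT 0
    );
    CREATE INDEX mytable_categoryid_userid ON mytable (category_id, user_id);
""")

def query_plan(conn, sql):
    """Return the readable 'detail' column of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan = query_plan(
    conn,
    "SELECT * FROM mytable WHERE category_id = 1 AND user_id = 2",
)
for line in plan:
    print(line)  # look for the index name in the output
```

As with Postgres, the descriptive index name makes it easy to see at a glance which index the planner picked.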
Next, something a little more complicated: what if there is an ORDER BY clause? Believe it or not, most databases benefit from an index when handling ORDER BY.
SELECT * FROM mytable
WHERE category_id = 1 AND user_id = 2
ORDER BY adddate DESC;
It's simple: just as you index the fields in the WHERE clause, also index the field in the ORDER BY clause:
CREATE INDEX mytable_categoryid_userid_adddate ON mytable (category_id, user_id, adddate);
Note: "mytable_categoryid_userid_adddate" will be truncated to "mytable_categoryid_userid_addda"
CREATE
EXPLAIN SELECT * FROM mytable
WHERE category_id = 1 AND user_id = 2
ORDER BY adddate DESC;
NOTICE: QUERY PLAN:
Sort (cost=2.03..2.03 rows=1 width=16)
  -> Index Scan using mytable_categoryid_userid_addda on mytable (cost=0.00..2.02 rows=1 width=16)
EXPLAIN

Looking at the EXPLAIN output, the database performed a sort step that we did not want. Now you know where the performance is lost. It seems I was a bit too optimistic about the database's own optimizer, so let's give it a little more of a hint.
To skip the sort step, no additional index is needed; just change the query slightly. With the Postgres used here, the extra hint is to add the fields from the WHERE clause to the ORDER BY clause. This is purely a technicality, because no sorting actually takes place on the other two fields, but with them added Postgres knows what it should do.
EXPLAIN SELECT * FROM mytable
WHERE category_id = 1 AND user_id = 2
ORDER BY category_id DESC, user_id DESC, adddate DESC;
NOTICE: QUERY PLAN:
Index Scan Backward using mytable_categoryid_userid_addda on mytable (cost=0.00..2.02 rows=1 width=16)
EXPLAIN

Now the database uses the expected index, and it is smart enough to know that it can read the index backward, thereby avoiding any sorting.
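Whether a separate sort step appears depends on the database and its version, so it is worth checking on your own system. The sketch below again substitutes SQLite (via Python's sqlite3) for Postgres, which is an assumption made so that it runs anywhere. The helpers plan_with_index and needs_sort and the index name mytable_idx are invented for this example. It compares the plan for the two-column index with the plan for the three-column index.

```python
import sqlite3

QUERY = """
    SELECT * FROM mytable
    WHERE category_id = 1 AND user_id = 2
    ORDER BY adddate DESC
"""

def plan_with_index(columns):
    """Build a fresh table with one index on `columns` and return the plan."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE mytable (
            id INTEGER PRIMARY KEY,
            category_id INT NOT NULL DEFAULT 0,
            user_id INT NOT NULL DEFAULT 0,
            adddate INT NOT NULL DEFAULT 0
        )
    """)
    conn.execute(f"CREATE INDEX mytable_idx ON mytable ({columns})")
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    conn.close()
    return [row[3] for row in rows]

def needs_sort(plan):
    """SQLite reports an explicit sort as 'USE TEMP B-TREE FOR ORDER BY'."""
    return any("TEMP B-TREE" in line for line in plan)

two_col = plan_with_index("category_id, user_id")
three_col = plan_with_index("category_id, user_id, adddate")

print("two-column index needs a sort:  ", needs_sort(two_col))
print("three-column index needs a sort:", needs_sort(three_col))
```

In SQLite the three-column index is enough to drop the sort step without rewriting the ORDER BY clause; other databases may behave differently, which is exactly why checking the plan matters.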
This may seem like a lot of detail, but if the database is very large and handles millions of page requests a day, the benefit is considerable. What about more complex queries, such as joins across several tables, especially when the fields in the WHERE clause come from more than one table? Such queries are best avoided where possible, because the database has to combine the contents of the tables and then discard the rows that do not match, which can be very expensive.
If you can't avoid it, look at each table being joined, build indexes using the strategy above, and then use the EXPLAIN command to verify that the expected indexes are used. If they are, fine. If not, you might create temporary tables to join together, with appropriate indexes on them.
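The sketch below shows what such a check on a join might look like. It is illustrative only: it uses SQLite through Python's sqlite3 rather than Postgres, and the users table, its country column, and the users_country index are hypothetical names that do not appear in the original example. The plan is printed before and after the indexes are created, so you can see whether a full table scan remains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        country TEXT NOT NULL DEFAULT ''
    );
    CREATE TABLE mytable (
        id INTEGER PRIMARY KEY,
        category_id INT NOT NULL DEFAULT 0,
        user_id INT NOT NULL DEFAULT 0,
        adddate INT NOT NULL DEFAULT 0
    );
""")

# The WHERE clause uses fields from both tables.
JOIN_QUERY = """
    SELECT m.* FROM mytable AS m
    JOIN users AS u ON u.id = m.user_id
    WHERE m.category_id = 1 AND u.country = 'NZ'
"""

def query_plan(conn, sql):
    """Return the readable 'detail' column of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = query_plan(conn, JOIN_QUERY)

# Index each table on the fields it contributes to the WHERE and JOIN clauses.
conn.executescript("""
    CREATE INDEX mytable_categoryid_userid ON mytable (category_id, user_id);
    CREATE INDEX users_country ON users (country);
""")
after = query_plan(conn, JOIN_QUERY)

print("before:", before)
print("after: ", after)
```

A plan line that starts with SCAN means SQLite reads the whole table; once both tables are indexed, every step should be a SEARCH instead.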
Note that creating too many indexes slows down updates and inserts, because the database has to update every index as well. For a table that is updated and inserted into frequently, there is no need to build a separate index for a rarely used WHERE clause. For small tables the cost of sorting is low, so an extra index is not needed either.
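A rough way to see this cost is to load the same rows into a table with and without indexes. The sketch below assumes SQLite via Python's sqlite3, and the load helper, the row counts, and the index names are all invented for illustration. Timings depend on the machine, so treat them as indicative only; the page count is a steadier sign that every index adds data the database must write on each insert.

```python
import sqlite3
import time

def load(index_columns, n_rows=5000):
    """Insert n_rows into a fresh table; return (seconds, database pages)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE mytable (
            id INTEGER PRIMARY KEY,
            category_id INT NOT NULL DEFAULT 0,
            user_id INT NOT NULL DEFAULT 0,
            adddate INT NOT NULL DEFAULT 0
        )
    """)
    for i, columns in enumerate(index_columns):
        conn.execute(f"CREATE INDEX mytable_idx{i} ON mytable ({columns})")
    rows = [(i % 50, i % 1000, i) for i in range(n_rows)]
    start = time.perf_counter()
    conn.executemany(
        "INSERT INTO mytable (category_id, user_id, adddate) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    elapsed = time.perf_counter() - start
    pages = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.close()
    return elapsed, pages

plain_time, plain_pages = load([])
indexed_time, indexed_pages = load([
    "category_id",
    "category_id, user_id",
    "category_id, user_id, adddate",
])

print(f"no indexes:    {plain_time:.4f} s, {plain_pages} pages")
print(f"three indexes: {indexed_time:.4f} s, {indexed_pages} pages")
```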
The above covers only the basics; there is much more to learn. EXPLAIN alone cannot tell you whether an approach is optimal. Every database has its own optimizer, which may not be perfect but does compare which access method is faster when planning a query. In some cases using an index is not faster, for example when the index is stored in non-contiguous space, which increases the disk reads required. Which approach is best should therefore be tested in the actual production environment.
At the beginning, if the table is not large, there is no need for an index. My advice is to add indexes when they are needed. You can also use commands that optimize the table; MySQL, for example, has OPTIMIZE TABLE.
