Full text indexing is for columns (char, varchar or text) where we have long strings or document and we need to search with in that text for some words and filter data based on those words. Omitting any words that are called stopwords.
We can define FULLTEXT while creating new table or adding into the existing table using the ALTER TABLE or we can use CREATE INDEX statement.
We can search on the FULLTEXT indexed columns using the MATCH(column_name) AGAINST(text_to_search).
Design Indexing
MySQL FULLTEXT indexes have an inverted index strategy. Inverted indexes accumulate a list of words, and to each word, a list of documents that the word looks in. To care proximity search, position info for each word is also kept, as a byte offset.
Table Full Text Indexing
Here is how you create the FULLTEXT index table.
CREATE TABLE `ft_tutorial` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(200) DEFAULT NULL,
`description` text,
PRIMARY KEY (`id`),
FULLTEXT KEY `title` (`title`,`description`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The first six tables denote the inverted index and are mentioned to as supporting index tables. When inbound documents are tokenized, the separate words are inserted into the index tables along with location info and the related Document ID . The words are entirely sorted and separated among the six index tables based on the character set sort weight of the words first character.
The inverted index is divided into six additional index tables to backing parallel index creation. By default, two processes tokenize, sort, and insert words and linked data into the index tables. The number of threads is configurable using the innodb_ft_sort_pll_degree optimal. Study increasing the number of threads when creation FULLTEXT indexes on big tables.
Secondary index table terms are prefixed with FTS_ and postfixes with INDEX_*. Each index table is linked with the indexed table by a hex value in the index table name that tells the table_id of the indexed table. For sample, the table_id of the ft_search/ft_tutorial table is 1058, for which the hex value is 0x422. As shown in the previous example, the “422” hex value looks in the names of index tables that are linked with the test/opening_lines table.
A hex value instead of the index_id of the FULLTEXT index also looks in auxiliary index table names. For instance, in the auxiliary table name ft_search/FTS_0000000000000422_0000000000000948_INDEX_1, the hex value 948 has a decimal value of 457. The index clear on the title, description table (title) can be identified by querying the INFORMATION_SCHEMA.INNODB_SYS_INDEXES table for this value (422).
The other index tables exposed in the former sample are discussed to as shared index tables and are used for deletion organization and loading the inner state of FULLTEXT indexes.
Cache Full Text Indexing
When a text line is presented, it is tokenized, and the distinct words and related data are inserted into the FULLTEXT index. This procedure, even for small text strings, could result in many small add-ons into the extra index tables, making concurrent access to these tables a point of clash. To evade this difficult, InnoDB uses a FULLTEXT index cache to for the period cache index table insertions for freshly injected rows. This in-memory cache structure holds additions till the cache is complete and then flushes them to disk
FTS_DOC_ID and Full Text Indexing
InnoDB uses a single document identifier mentioned to as a Document ID (DOC_ID) to map words in the full-text index to document records where the word looks. The mapping needs an FTS_DOC_ID column on the indexed table. If an FTS_DOC_ID column is not defined, InnoDB automatically adds an unseen FTS_DOC_ID column when the full-text index is formed.
CREATE TABLE `ft_tutorial` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(200) DEFAULT NULL,
`description` text,
PRIMARY KEY (`id`),
FULLTEXT KEY `title` (`title`,`description`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Deletion Handling
Erasing a record that has a full-text index column could effect in many small removals in the auxiliary index tables, making simultaneous access to these tables a point of dispute. To avoid this badly-behaved, the Article ID (DOC_ID) of a deleted article is logged in a different FTS_*_DELETED table whenever a record is erased from an indexed table, and the indexed record leftovers in the full-text index. Before returning query outcomes, material in the FTS_*_DELETED table is used to filter out erased Article IDs. The advantage of this design is that removals are fast and reasonable. The disadvantage is that the size of the index is not directly compact after deleting archives. To eliminate full-text index entries for deleted archives, run OPTIMIZE TABLE on the indexed table with innodb_optimize_fulltext_only=ON to reconstruct the full-text index.
Transaction Handling
InnoDB FULLTEXT indexes have different operation handling features due its reserving and batch handing out behavior. Updates and additions on a FULLTEXT index are treated at operation commit time, which means that a FULLTEXT hunt can only see dedicated data.
Monitoring Full Text Indexes
We can display the full text dealing out of innodb table indexes by inquiring the following tables in INFORMATION_SCHEMA INNODB_FT_CONFIG
INNODB_FT_INDEX_TABLE
INNODB_FT_INDEX_CACHE
INNODB_FT_DEFAULT_STOPWORD
INNODB_FT_DELETED
INNODB_FT_BEING_DELETED
INNODB_SYS_INDEXES
INNODB_SYS_TABLES
Comments