MySQL JSON datatype with generated virtual columns and indexing
Updated: May 15, 2019
MySQL has supported a native JSON data type since version 5.7.8. The native JSON type lets you store JSON documents more efficiently than the plain JSON text used in earlier versions.
MySQL stores JSON documents in an internal binary format that allows quick read access to document elements. This format is structured so the server can look up values inside a JSON document directly by key or array index, which is very efficient.
The storage required for a JSON document is roughly the same as for LONGBLOB or LONGTEXT data.
To define a column whose data type is JSON, you give JSON as the column's type in the table definition (column_name JSON).
Note that in MySQL 5.7 a JSON column cannot have a default value. A JSON column also cannot be indexed directly. Instead, you can create an index on a generated column that contains values extracted from the JSON column. When you query data from the JSON column, the MySQL optimizer looks for compatible indexes on generated columns that match the JSON expressions in the query.
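The following is a minimal sketch of both rules. It uses a hypothetical table json_demo with a hypothetical "user" key and assumes MySQL 5.7.13 or later, which is the release that added the ->> operator. The JSON column gets no DEFAULT clause. The index goes on a generated column that extracts a value, not on the JSON column itself.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE json_demo (
    id           INT UNSIGNED NOT NULL PRIMARY KEY,
    payload      JSON NOT NULL,               -- no DEFAULT allowed on JSON in 5.7
    -- Generated column holding the unquoted value of the "user" key
    user_virtual VARCHAR(30) GENERATED ALWAYS AS (payload ->> '$.user') VIRTUAL,
    INDEX user_idx (user_virtual)             -- index the extracted value instead
);

-- Indexing the JSON column itself is rejected by MySQL:
-- CREATE INDEX payload_idx ON json_demo (payload);

INSERT INTO json_demo (id, payload) VALUES (1, '{"user": "sally"}');

-- This lookup can use user_idx.
EXPLAIN SELECT id FROM json_demo WHERE user_virtual = 'sally';
```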
MySQL JSON data type example
The dataset contains a list of players with the following elements: a player ID, a name, and the games played (Battlefield, Crazy Tennis, and Puzzler).
Battlefield contains the player's favorite weapon, their current rank, and the level of that rank. Crazy Tennis contains the number of games won and lost, and Puzzler contains the time it took the player to solve the game. Let's start with our initial table structure.
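To make that structure concrete, here is a sketch of one player document held in a user variable and navigated with JSON_EXTRACT. The key names (games_played, weapon, rank, level, won, lost, time) and all values are assumptions made for illustration, since the dataset itself is not reproduced here. Note that a key containing a space, such as "Crazy Tennis", has to be double-quoted inside the JSON path.

```sql
-- Hypothetical player document; key names and values are illustrative only.
SET @player = '{
  "id": 1,
  "name": "Sally",
  "games_played": {
    "Battlefield":  {"weapon": "sniper rifle", "rank": "Sergeant V", "level": 20},
    "Crazy Tennis": {"won": 4, "lost": 1},
    "Puzzler":      {"time": 7}
  }
}';

-- Returns 1 because the string is well-formed JSON.
SELECT JSON_VALID(@player);

-- Returns 20, 4 and 7; keys with spaces are quoted in the path.
SELECT JSON_EXTRACT(@player, '$.games_played.Battlefield.level')  AS level,
       JSON_EXTRACT(@player, '$.games_played."Crazy Tennis".won') AS won,
       JSON_EXTRACT(@player, '$.games_played.Puzzler.time')       AS puzzler_time;
```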
Creating a generated column
To create generated columns, you use the following column definition syntax inside a CREATE TABLE statement: col_name data_type [GENERATED ALWAYS] AS (expression) [VIRTUAL | STORED] [NOT NULL | NULL] [UNIQUE [KEY]] [[PRIMARY] KEY] [COMMENT 'string'].
The key parts here are GENERATED ALWAYS and AS. The phrase GENERATED ALWAYS is optional. You need it only if you want to state explicitly that the column is a generated column. What is essential is the word AS followed by an expression that returns the value you want in the generated column. Let's start there:
We're creating a column called names_virtual that is up to 20 characters long and contains the value of the "name" property from the JSON document. We access "name" through a JSON path with MySQL's ->> operator. This is equivalent to writing JSON_UNQUOTE(JSON_EXTRACT(...)) and returns "name" as an unquoted result from the JSON document. We've talked about some of these JSON functions before.
In other words, we take the JSON column player_and_games and extract the key name, which is a child of the root ($.name).
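The equivalence can be checked with a query like the one below. It assumes the players table described in this post already holds rows. The -> operator returns the value as JSON, so a string comes back quoted. The ->> operator and the longer JSON_UNQUOTE(JSON_EXTRACT(...)) form both return the plain text.

```sql
-- Assumes the players table from this post, with rows loaded.
SELECT
    player_and_games ->  '$.name'                          AS quoted_name,   -- JSON string, e.g. "Sally"
    player_and_games ->> '$.name'                          AS unquoted_name, -- plain text, e.g. Sally
    JSON_UNQUOTE(JSON_EXTRACT(player_and_games, '$.name')) AS long_form      -- same result as ->>
FROM players;
```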
As with most column definitions, there are a number of constraints and options you can apply to a column.
The keywords VIRTUAL and STORED are specific to generated columns and indicate whether the values are stored in the table. VIRTUAL is the default. It means the column's values are not stored, so they take up no storage space; they are computed every time the row is read. If you create an index on a virtual column, the values do get stored, but in the index itself. The STORED keyword, on the other hand, means the values are computed when data is written to the table, that is, when documents are inserted or updated. In this case the column behaves like an ordinary column, and an index on it works like a conventional index.
The remaining options are constraints that determine whether the values can be NULL or NOT NULL, and that add index constraints such as UNIQUE or PRIMARY KEY. If you rely on a field being present, use NOT NULL when creating the generated column to ensure that the values exist. Whether to add index constraints depends on your use case. We'll use NOT NULL because we expect every player to have a name, just not a unique one.
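Here is a small sketch that puts VIRTUAL and STORED side by side, using a hypothetical table gen_demo (MySQL 5.7.13+ for ->>). Both columns use NOT NULL, so a document without a "name" key should be rejected. SHOW COLUMNS then reports VIRTUAL GENERATED and STORED GENERATED in the Extra field.

```sql
-- Hypothetical table comparing the two kinds of generated column.
CREATE TABLE gen_demo (
    id           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    doc          JSON NOT NULL,
    -- VIRTUAL (default): computed on read, takes no row storage
    name_virtual VARCHAR(20) AS (doc ->> '$.name') VIRTUAL NOT NULL,
    -- STORED: computed on INSERT/UPDATE and written to the row
    name_stored  VARCHAR(20) AS (doc ->> '$.name') STORED NOT NULL
);

INSERT INTO gen_demo (doc) VALUES ('{"name": "Sally"}');

-- Expected to fail: the document has no "name" key, violating NOT NULL.
-- INSERT INTO gen_demo (doc) VALUES ('{"nickname": "Sal"}');

SHOW COLUMNS FROM gen_demo;
```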
Now let's look at our CREATE TABLE statement:
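The statement below is a sketch consistent with the description above: an id, the JSON column player_and_games, and names_virtual as a 20-character NOT NULL generated column. The id type and the primary key are assumptions. It needs MySQL 5.7.13 or later for ->>.

```sql
CREATE TABLE players (
    id               INT UNSIGNED NOT NULL,
    player_and_games JSON NOT NULL,
    -- Unquoted value of the top-level "name" key.
    -- No VIRTUAL/STORED keyword, so MySQL creates a VIRTUAL column.
    names_virtual    VARCHAR(20)
        GENERATED ALWAYS AS (player_and_games ->> '$.name') NOT NULL,
    PRIMARY KEY (id)
);
```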
Once the table is created with this statement, we can insert some JSON documents. You can find the SQL for this in the Compose examples repository. In this dataset, each row consists of the player's id followed by the player's JSON document.
Once we've run the code and the data has been inserted into the players table, we can run a SELECT query to see the results.
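A single insert might look like the one below. The values are made up, and the keys under games_played are assumed names that match the structure described earlier. The column list is required because MySQL does not accept explicit values for the generated names_virtual column.

```sql
-- One hypothetical player; values and nested key names are illustrative.
INSERT INTO players (id, player_and_games) VALUES
(1, '{
  "id": 1,
  "name": "Sally",
  "games_played": {
    "Battlefield":  {"weapon": "sniper rifle", "rank": "Sergeant V", "level": 20},
    "Crazy Tennis": {"won": 4, "lost": 1},
    "Puzzler":      {"time": 7}
  }
}');
```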
As we can see, the names_virtual column in the table is populated with each player's name. Let's do a quick check of how MySQL has set up the columns, using SHOW COLUMNS FROM players.
Since we didn't specify whether the generated column is VIRTUAL or STORED, MySQL created a VIRTUAL column by default. If you're not sure whether your columns are VIRTUAL or STORED, run the same SHOW COLUMNS query. The Extra field shows either VIRTUAL GENERATED or STORED GENERATED.
Now that we've set up the table and our initial VIRTUAL column, let's add four more columns using ALTER TABLE with ADD COLUMN. These will hold the Battlefield levels, tennis games won, tennis games lost, and the Puzzler times.
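Both checks are shown below, along with an information_schema query that also returns each generated column's expression. This assumes the players table from this post exists in the current database.

```sql
-- The generated column comes back alongside the stored data.
SELECT id, names_virtual FROM players;

-- Extra shows VIRTUAL GENERATED or STORED GENERATED for generated columns.
SHOW COLUMNS FROM players;

-- Same information, plus the expression each generated column uses.
SELECT COLUMN_NAME, EXTRA, GENERATION_EXPRESSION
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'players';
```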
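Here is a sketch of that ALTER TABLE. The column names (battlefield_level_virtual, tennis_won_virtual, tennis_lost_virtual, times_virtual) and the JSON paths are hypothetical; they assume the nested key names used in the sample document above. The ->> operator returns text, which MySQL converts to INT for these columns.

```sql
-- Hypothetical column names and JSON paths; adjust to the real keys.
ALTER TABLE players
    ADD COLUMN battlefield_level_virtual INT
        GENERATED ALWAYS AS (player_and_games ->> '$.games_played.Battlefield.level'),
    ADD COLUMN tennis_won_virtual INT
        GENERATED ALWAYS AS (player_and_games ->> '$.games_played."Crazy Tennis".won'),
    ADD COLUMN tennis_lost_virtual INT
        GENERATED ALWAYS AS (player_and_games ->> '$.games_played."Crazy Tennis".lost'),
    ADD COLUMN times_virtual INT
        GENERATED ALWAYS AS (player_and_games ->> '$.games_played.Puzzler.time');
```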
Running SHOW COLUMNS FROM players; again, we see VIRTUAL GENERATED next to all the new columns, which means we've successfully set up the new VIRTUAL generated columns.
Running a SELECT query now returns the values from all the VIRTUAL columns as well.
Now that the data has been inserted and the generated columns are set up, we can create indexes on each column to speed up our searches.
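That query could be written as below. It assumes the hypothetical column names from the ALTER TABLE sketch. The values are computed from player_and_games at read time because the columns are VIRTUAL.

```sql
-- VIRTUAL columns are evaluated as each row is read.
SELECT id,
       names_virtual,
       battlefield_level_virtual,
       tennis_won_virtual,
       tennis_lost_virtual,
       times_virtual
FROM players;
```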
Indexing generated columns
When you create secondary indexes on VIRTUAL generated columns, the values are materialized and stored in the index. This lets us take advantage of MySQL indexing without increasing the size of the table.
Let's run a simple query on a generated column to see what its query plan looks like before we index it, using EXPLAIN on a query that looks up the name "Sally" in names_virtual.
For this query, MySQL has to examine every row to find "Sally". However, we get a completely different result once we add an index on the column.
After creating the index, we run the same query again.
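The query plan can be inspected like this, assuming the players table from this post. Without an index on names_virtual, the type column of the EXPLAIN output should show ALL, which means a full table scan.

```sql
-- Before indexing: expect type = ALL (every row is examined).
EXPLAIN SELECT * FROM players WHERE names_virtual = 'Sally';
```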
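Creating the index and repeating the EXPLAIN looks like this, again assuming the players table. After indexing, the key column should list names_idx, and the row estimate should drop to the matching rows.

```sql
-- Secondary index on the VIRTUAL column; values are materialized in the index.
CREATE INDEX names_idx ON players (names_virtual);

-- After indexing: expect key = names_idx instead of a full scan.
EXPLAIN SELECT * FROM players WHERE names_virtual = 'Sally';
```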
As we can see, the index sped up our query: by using names_idx, MySQL looked at only one row instead of all six. Next, let's create indexes on the rest of our virtual columns, using the same syntax as for names_idx.
After indexing, we can check that all our columns have been indexed by listing the indexes on the players table.
Now that we've created indexes on our generated columns, let's run a more complex search to see how generated columns and indexes work together. For this example, we'll get the ids, names, tennis games won, Battlefield level, and Puzzler time for players who have a level above 50 and who have also won more than 50 tennis games. All results are sorted in ascending order by Puzzler time.
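The remaining indexes could be created as below. The names level_idx and won_idx are the ones this post refers to later. The names lost_idx and times_idx, and all the column names, are hypothetical and follow the earlier ALTER TABLE sketch.

```sql
-- One secondary index per remaining VIRTUAL column.
CREATE INDEX level_idx ON players (battlefield_level_virtual);
CREATE INDEX won_idx   ON players (tennis_won_virtual);
CREATE INDEX lost_idx  ON players (tennis_lost_virtual);
CREATE INDEX times_idx ON players (times_virtual);
```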
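A standard way to list them is shown below. The output has one row per index column, with the index name in Key_name and the column in Column_name.

```sql
-- Lists every index on the table, including the PRIMARY key.
SHOW INDEX FROM players;
```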
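One way to write this search is shown below. It assumes the hypothetical column names from the earlier sketches and uses > 50 for both thresholds. Prefixing the query with EXPLAIN shows which of the indexes MySQL chooses.

```sql
-- Inspect the plan first; remove EXPLAIN to get the actual rows.
EXPLAIN
SELECT id,
       names_virtual,
       tennis_won_virtual,
       battlefield_level_virtual,
       times_virtual
FROM players
WHERE battlefield_level_virtual > 50
  AND tennis_won_virtual > 50
ORDER BY times_virtual ASC;
```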
Running the query with EXPLAIN shows how MySQL planned it.
By using the won_idx and level_idx indexes, MySQL only had to examine two rows to return the result we needed. If the query had to do a full table scan over millions of documents instead, it could take a very long time. With generated columns and indexes on those columns, however, MySQL provides a fast and convenient way to search for elements within JSON documents.
This post showed in more depth how to use JSON documents in your MySQL database, and how to use generated columns and indexes to search your data quickly and efficiently.