This manual describes GridDB SQL tuning.
This document is intended for developers who develop systems using SQL with GridDB.
The contents of this manual are as follows.
For optimization rules and tuning, the distinctive features of GridDB are mainly descried.
SQL tuning is performed in the following procedure, similar to the case of a general database system.
STEP 1 Check a slow query
STEP 2 Obtain the plan
STEP 3 Tune the query
STEP 4 Re-execute the query
Details about STEP 1 to STEP 3 are explained below.
Among the queries running on the system, identify slow queries that are taking a long time.
A slow query that takes a long execution time can be set to output information about the query and the execution time to the event log. Thereby, the bottleneck can be identified among the queries executed from an application.
The procedure to check the slow query is as follows.
GridDB node has parameters for the threshold of slow query execution time and the upper limit of the query character string size output to the event log.
Parameter | Description | Default value |
---|---|---|
/sql/traceLimitExecutionTime | Lower limit of execution time (unit: second) for the event logging as a slow query. | 300s |
/sql/traceLimitQuerySize | Size upper limit of the query character string for logging as a slow query. (unit: byte) | 1000 |
To change from the default value, use either of the following two methods.
[Memo]
Check the log of a slow query. There are following two ways for checking. Identify the slow query using any one of these ways.
The ways for checking | Item |
---|---|
Check the latest information online | Execute the option –slowlogs of operation tool gs_logs to display the information of a slow log. - Only the information of the latest event log file is displayed. The event log file is rotated when the file size exceeds the threshold or when the date changes. Once rotated, gs_logs will not show the contents of the old event log file. |
Check an event log file directly. | Check the event log file of a node directly. Get the slow query log from the log file using “SQL_LONG_QUERY” as a keyword. |
[Memo]
Execute the slow query identified in STEP 1 and get the query plan (global plan).
Use the operation tool gs_sh to get a plan. Execute the query with EXPLAIN ANALYZE syntax and get the plan with the sub-command getplantxt.
(1) Execute the query in the “EXPLAIN ANALYZE” syntax.
gs[public]> sql EXPLAIN ANALYZE select * from table1, table2 where table1.value=1 and table1.id=table2.id ;
A search was executed. (19 ms)
(2) Get the plan.
gs[public]> getplantxt
Id Type Input Rows Lead time Actual time Node And more..
--------------------------------------------------------------------------------------------------------------------
0 SCAN - - 30 30 192.168.15.161:10001 table: {table1}
1 SCAN 0 3000 21 21 192.168.15.161:10001 table: {table1, table2} INDEX SCAN JOIN_EQ_HASH
2 RESULT 1 14 0 0 192.168.15.161:20001
GridDB processes SQL as follows: A node parses queries and generate a plan. The plan consists of plans for each task, a unit of execution, which are executed by each node to which the task was assigned.
The sub-command getplantxt for showing a plan displays a plan for each task line by line. An output of a task becomes the input of the next task.
Using the execution example of “(2) Get the plan” the way to display the plan is explained specifically.
In the scan processing under a search condition in a WHERE clause and in the scan processing for joining a table, the performance may greatly change depending on whether the index is used or not. As for joining a table, the performance depends greatly on the join order. Note these points when tuning a query.
(1) Analyze a plan
Analyze a plan to check whether the index is used and the queries are executed as intended. See the SQL plan for the details of a plan.
How to check whether the index is used or not
(2) Tuning
Tune using the following methods depending on the problem identified from the analysis of a plan.
Example) A query for joining tables is explained as a tuning example.
Get a plan for a query which joins table1 and table2 with the value of the column value.
gs[public]> EXPLAIN ANALYZE select * from table1, table2 where table1.value=0 and table1.value=table2.value;
A search was executed. (13 ms)
gs[public]> getplantxt
Id Type Input Rows Lead time Actual time Node And more..
-------------------------------------------------------------------------------------
0 SCAN - - 20 20 192.168.15.161:10001 table: {table1}
1 SCAN - - 9 9 192.168.15.161:10001 table: {table2}
2 JOIN 0,1 10000,3000 891 891 192.168.15.161:20001 JOIN_EQ_HASH
3 RESULT 2 32 2 2 192.168.15.161:20001
In this plan, table1 and table2 are scanned respectively, and join is processed without using an index.
Checking the index information of the table showed that the column value was not indexed, so an index is created.
Execute the same query to get a plan.
gs[public]> EXPLAIN ANALYZE select * from table1, table2 where table1.value=0 and table1.value=table2.value;
A search was executed. (7 ms)
gs[public]> getplantxt
Id Type Input Rows Lead time Actual time Node And more..
--------------------------------------------------------------------------------------------------------------------
0 SCAN - - 20 20 192.168.15.161:10001 table: {table1}
1 SCAN 0 10000 80 80 192.168.15.161:10001 table: {table1, table2} INDEX SCAN JOIN_EQ_HASH
2 RESULT 1 32 3 3 192.168.15.161:20001
“INDEX SCAN” is displayed for plan ID2, and the plan has changed to use an index in join processing.
In a scan processing to find data that matches the search condition of WHERE clause, the “index scan” using the table index is often faster than the “full scan” which accesses all the rows of the table.
The index scan is more effective when the search condition of a WHERE clause can narrow down the data by reducing the hit ratio to the number of rows of a table.
This section explains rules to select an index for scan processing when optimizing GridDB SQL.
Rules are different depending on the operator or expression specified in the search condition.
AND
Basically, when indexes are set on the columns used as search conditions, the first index is used.
Example:
a>1 AND b=2 (both a and b are indexed)
Provided that, in exceptional cases, the first index is not used. An example of such cases is shown below.
If a constant false is added as an OR condition, “OR false”, the index is not used.
Specify this syntax not to use an index in a scan.
Example) In order not to use the index of a in “a>1 AND b=2”
(a>1 OR false) AND b=2 (both a and b are indexed)
Example) In order not to use the indexes of both a and b in “a>1 AND b=2”
(a>1 AND b=2) OR false (both a and b are indexed)
a>1 AND b=2 OR false (both a and b are indexed)
If an AND condition contains OR or IN, the index is used only in the first expression (the leftmost expression) for which an index is available.
Example:
(a=1 OR a=3) AND b=2 (both a and b are indexed)
a IN (1,3) AND b=2 (both a and b are indexed)
Example:
(a=1 OR a=3 OR false) AND b=2 (both a and b are indexed)
(a IN (1,3) OR false) AND b=2 (both a and b are indexed)
If conditions for the same column overlap, merge the conditions before using the index
Example:
a>1 AND a<=4 AND a<=3 ("a" is indexed)
OR
In the case of OR, the index is used only when all the columns specified as search conditions are indexed.
Example:
a>1 OR b=2 (both a and b are indexed)
Example:
a>1 OR b=2 (b is indexed)
Expressions for a comparison operator
The index is used only when the expressions for a comparison operator consist of a single column value and a constant expression.
Example:
a> 10*1000-1 ("a" has an index)
Example:
a+1>10*1000-1 ("a" has an index)
Example:
a>b (both a and b are indexed)
IN, BETWEEN
IN and BETWEEN, expressions combining AND, OR, and a comparison operators, the rules mentioned above are applied to.
Example:
a IN (1,2) → a=1 OR a=2 ("a" is indexed)
a BETWEEN 1 AND 2 → a>=1 AND a<=2 ("a" is indexed)
[Memo]
Indexes used for index scan are only TREE indexes created with NoSQL or NewSQL interface, or those automatically set for a primary key. HASH indexes or spatial indexes created with the NoSQL interface are not used
Using the SQL syntax hint “NoIndexScan”, a scan without using an index can be specified. Refer to “Hint phrases” for the details of a hint.
This section explains rules to select a composite index for scan processing when optimizing GridDB SQL.
The range that uses an index differs depending on the column or the operator specified in the search condition. The part from the beginning of a column that comprises a composite index to the end of consecutive column conditions joined by the AND condition is used as the composite index.
[Memo]
Example:
where col1 = 1 and col2 = 1 and col3 = 2 (with a composite index of col1, col2, col3)
where col1 = 1 and col2> 1 and col3 <2 (with a composite index of col1, col2, col3)
where col1 = 1 and col2 = 1 (with a composite index of col1, col2, col3)
where col1 = 1 and col3 = 2 (with a composite index of col1, col2, col3)
where col2 = 1 (with a composite index consisting of col1, col2, and col3)
where col1 = 1 and (col3 >= 0 and col3 < 10) and col2 = 1 (with a composite index consisting of col1, col2, and col3)
where col1 = 1 and (col3 = 0 or col3 = 1) and col2 = 1 (with a composite index consisting of col1, col2, and col3)
where (col1 = 0 or col1 = 1) and col2 = 1 (with a composite index consisting of col1, col2, and col3)
where col1 = 1 and (col3 >= 0 and col3 < 10) (with a composite index consisting of col1, col2, and col3)
where col1 = 1 and (col3 >= 0 and col3 < 10) and ABS(col2) = 1 (with a composite index consisting of col1, col2, and col3)
This section explains the rules for joining tables to optimize GridDB SQL.
For join processing without using indexes, the join order and the join operation method greatly influence the performance.
In join processing, the table accessed first is called the driving table, and the table accessed next and joined is called the internal table.
The order of joins (driving table and inner table) is determined by the following rules.
When joining two tables, a table which has a conditional expression which is equal to a constant becomes a drive table.
Example:
t1.a=t2.x AND t2.y=1
For joining three or more tables, the join order is determined by the strength of coupling or the strength of the search condition.
Tables with a strong degree of coupling are joined consecutively.
Example:
FROM A, B, C
Example:
FROM A, B, C WHERE A.x=C.z AND C.z=B.y
Example:
FROM A, B, C WHERE A.x>=C.z AND C.z>=B.y AND B.y=A.x
The tables with a stronger condition to narrow down the data, which is presumed to filter out fewer data, are joined earlier.
Example:
FROM A, B, C WHERE A.x=C.z AND C.z=B.y AND B.x=1
Example:
FROM A, B, C WHERE A.x=C.z AND C.z=B.y AND A.x IN (1, 2) AND B.x IN (1, 2, 3)
[Memo]
This section explains the rules for using indexes in join processing.
An index is used, when all five of the following rules are true:
The driving table has an equivalent condition.
Example:
t1.a=t2.x AND t2.y=1 ("a" is indexed, the driving table is t2 and the internal table is t1)
Example:
t1.a=t2.x AND t2.y>1 ("a" is indexed, the driving table is t2 and the internal table is t1)
The first join condition is an equivalent condition.
Example:
t1.a>t2.x AND t1.b=t2.y AND t2.z=1 ("a" is indexed, the driving table is t2 and the internal table is t1)
The column of the internal table, which has an equivalent condition, of the first join condition is indexed.
Example:
t1.a=t2.x AND t2.y=1 ("a" is not indexed, the driving table is t2 and the internal table is t1)
INNER JOIN
Not disabled by NoIndexJoin hint.
[Memo]
When the join syntax applies to the rules for using an index, an index is used to join. This section explains the rules to choose the index used to join.
Basically, the rules for using an index are the same as the rules for choosing an index to scan. All indexes set for columns are used in the order described.
Provided that, not all the indexes are used. An example of such cases is shown below.
When indexes are set on the columns used as search conditions, the first index is used.
Example:
t1.a=t2.x AND t1.b>t2.y AND t2.z=1 (a and b are indexed, the driving table is t2 and the internal table is t1)
In the OR condition (A OR B), when B is a false constant, the index of A is not used.
Example:
t1.a=t2.x AND (t1.b=t2.y OR false) AND t2.z=1 (a and b are indexed, the driving table is t2 and the internal table is t1)
There are following three types of join operation methods:
Join operation method | Description |
---|---|
Hash join | Create a temporary table in the memory by multiplying the join key of the driving table to the hash function, and then compare the table to the hash value of the internal table. |
Sort merge join | Sort the two tables to be joined by the join key, compare them in order and join them. |
Nested loop join | For the values of the join key of the driving table, find and join the data of the inner table that matched the join conditions. |
Hash join, sort merge join, and nested loop join are quick in this order.
Depending on the type of the first join condition, chose one of these operation methods.
The first condition | The operation method to choose |
---|---|
Equivalent condition | Hash join or sort merge join (Even if hash join is chosen, some of the operations may switch to sort merge join due to memory resource restriction.) |
Comarison condition | Sort merge join |
- | Nested loop join |
The first join condition of the join is selected in the following priority.
A simple column expression is prioritized to be the first join condition.
Example:
t1.a=abs(t2.x)+10 AND t1.b=t2.y
When there are two or more simple column expressions, the expression described earlier is selected as the first join condition.
Example:
t1.a>t2.x AND t1.b=t2.y
An execution plan can be controlled by specifying a hint in a query to modify the execution plan, without changing the SQL statement.
[Points to note]
The following table explains the hint function related terms.
Term | Description |
---|---|
Hint phrase | Information for controlling the execution plan |
Hints | A list of hint phrases Specified in the query to control an execution plan. Specified in the query to control an execution plan. |
Write the hint in the block comment of the query to control the execution plan. The block comment for the hint, can only be written immediately before or after the first SELECT (INSERT/UPDATE/DELETE) statement in SQL. To distinguish a hint comment from regular comments, the block comment for the hint begin with “/*+”.
The target to give a hint is specified by the object name or alias in parentheses. The targets are separated by either space, tab, or newline.
In the following example, the Leading hint clause specifies the table join order.
/*+
Leading(t3 t2 t1)
*/
SELECT *
FROM t1, t2, t3
ON t1.x = t2.y and t2.y = t3.z
ORDER BY t1.x
LIMIT 10;
[Memo]
The following table shows the available hint phrases.
Class | Operation | Description |
---|---|---|
Parallelism | MaxDegreeOfTaskInput(upper_limit) | Maximum number of inputs for one task. |
MaxDegreeOfExpansion(upper_limit) | Maximum number of expansion nodes of planning. | |
Scanning method | IndexScan(table) | Index scan is used if possible. |
NoIndexScan(table) | No index scan is used. | |
Joining method | IndexJoin(table table) | Using index join if possible. |
NoIndexJoin(table table) | No index join is used. | |
Table joining order | Leading(table table [table …]) | Join specified tables in the specified order. |
Leading(( table set table set )) | Join the first specified table set as the outer table and the second table set as the inner table indicates single table or table set) |
Table set = { table or ( table set table set )
This chapter shows details for each category of hint phrases.
Control parallelization processing.
Specify scanning method.
Specify which joining method to select for a table combination.
Specify in what order the tables are joined.
(1) Specify only joining order: Leading(table table [table …])
Specify the table names or aliases in order from the first table to be joined. In this method, using only Left-deep join orders.
[Example 1]
/*+ Leading(S R Q P) */
SELECT * FROM P,Q,R,S WHERE P.x = Q.x AND ...
(2) Specify joining order and direction: Leading(( table set table set ))
table set = { table or ( table set table set ) }
In case of specifying only joining order like (1), the joining direction (different for outer table or inner table) may be different from expectation. To fix the joining direction, use the following expression.
/*+ Leading((t1 (t2 t3))) */
SELECT ...
In this expression, parentheses can be nested. It joins the first specified table set as the outer table and the second set as the inner table.
[Example 2-1]
/*+ Leading(((P Q) R)) */
SELECT * FROM P,Q,R WHERE P.x = Q.x AND ...
[Example 2-2]
/*+ Leading((R (Q P))) */
SELECT * FROM P,Q,R WHERE P.x = Q.x AND ...
[Example 2-3]
/*+ Leading(((P Q) (R S))) */
SELECT * FROM P,Q,R,S WHERE P.x = Q.x AND ...
[Memo]
In the following cases, a syntax error occurs.
In the following case, a table specification error occurs:
[Memo]
The EXPLAIN ANALYZE statement of SQL indicates what operation and index were chosen by SQL optimization.
In SQL processing, SQL syntax is analyzed, optimized, and divided into processing units called “tasks”, such as join and sort scan, and then a plan (execution plan) is generated.
Tasks are executed on any one of the nodes which constitute a cluster, in parallel execution, exchanging data between these tasks.
[Memo]
Execution of the EXPLAIN ANALYZE sentence will output information, including the task plans, execution time, in JSON format for each task in a line.
The main items to be output are as follows.
Item | Description |
---|---|
id | Plan ID |
type | Type of processing |
inputList | List of plan IDs of the plans to be input |
profile/leadtime | Processing time |
profile/rows | Number of inputs |
profile/address | The address and port number of the node which executed the processing |
[Memo]
The types of processing are as follows.
Value of type | Description |
---|---|
GROUP | Grouping operation |
JOIN | Join operation |
LIMIT | Filter operation of the number of lines |
SCAN | Table scan operation |
SELECT | Selection operation (condition filter projection) |
SORT | Sort operation |
UNION | Combine and set operation |
INSERT, UPDATE, DELETE | Various operations to change the container |
DDL | DDL/DCL sentnce |
RESULT | Cointain the result |
A plan can be get using the operation tool gs_sh.
Example) To get the plan of a query “select * from table1, table2 where table1.value = 0 and table1.id = table2.id”
gs[public]> EXPLAIN ANALYZE select * from table1, table2 where table1.value=0 and table1.id=table2.id;
A search was executed. (11 ms)
gs[public]> getplantxt
Id Type Input Rows Lead time Actual time Node And more..
--------------------------------------------------------------------------------------------------------------------
0 SCAN - - 0 0 192.168.15.161:10001 table: {table1} INDEX SCAN
1 SCAN 0 0 2 2 192.168.15.161:10001 table: {table1, table2} INDEX SCAN JOIN_EQ_HASH
2 RESULT 1 0 0 0 192.168.15.161:20001
Execute the getplanjson subcommand with gs_sh to output the plan in JSON format.
[Notes]
Example of JSON format of a plan
{
"nodeList" : [ {
"cmdOptionFlag" : 65,
"id" : 0,
"indexInfoList" : [ 3, 3 ],
"inputList" : [ ],
"outputList" : [ {
"columnId" : 0,
"columnType" : "INTEGER",
:
:
}