griddb.github.io

— Introduction —

Aim & configuration of this manual

This manual describes GridDB SQL tuning.

This document is intended for developers who develop systems using SQL with GridDB.

The contents of this manual are as follows.

For optimization rules and tuning, the distinctive features of GridDB are mainly descried.

— Tuning steps —

SQL tuning is performed in the following procedure, similar to the case of a general database system.

Details about STEP 1 to STEP 3 are explained below.

Check a slow query

Among the queries running on the system, identify slow queries that are taking a long time.

A slow query that takes a long execution time can be set to output information about the query and the execution time to the event log. Thereby, the bottleneck can be identified among the queries executed from an application.

The procedure to check the slow query is as follows.

  1. Output setting of a slow query
  1. Execute a query.
  1. Check for slow queries

Obtain the plan

Execute the slow query identified in STEP 1 and get the query plan (global plan).

Use the operation tool gs_sh to get a plan. Execute the query with EXPLAIN ANALYZE syntax and get the plan with the sub-command getplantxt.

(1) Execute the query in the “EXPLAIN ANALYZE” syntax.

gs[public]> sql EXPLAIN ANALYZE select * from table1, table2 where table1.value=1 and table1.id=table2.id ;
A search was executed.  (19 ms)

(2) Get the plan.

gs[public]> getplantxt
Id Type       Input Rows Lead time Actual time Node                 And more..
--------------------------------------------------------------------------------------------------------------------
 0 SCAN       -     -           30          30 192.168.15.161:10001 table: {table1}
 1   SCAN     0     3000        21          21 192.168.15.161:10001 table: {table1, table2} INDEX SCAN JOIN_EQ_HASH
 2     RESULT 1     14           0           0 192.168.15.161:20001

GridDB processes SQL as follows: A node parses queries and generate a plan. The plan consists of plans for each task, a unit of execution, which are executed by each node to which the task was assigned.

SQL plan and the flow of processing
SQL plan and the flow of processing

The sub-command getplantxt for showing a plan displays a plan for each task line by line. An output of a task becomes the input of the next task.

Using the execution example of “(2) Get the plan” the way to display the plan is explained specifically.

Tune the query

In the scan processing under a search condition in a WHERE clause and in the scan processing for joining a table, the performance may greatly change depending on whether the index is used or not. As for joining a table, the performance depends greatly on the join order. Note these points when tuning a query.

(1) Analyze a plan

Analyze a plan to check whether the index is used and the queries are executed as intended. See the SQL plan for the details of a plan.

(2) Tuning

Tune using the following methods depending on the problem identified from the analysis of a plan.

Example) A query for joining tables is explained as a tuning example.

Get a plan for a query which joins table1 and table2 with the value of the column value.

gs[public]> EXPLAIN ANALYZE select * from table1, table2 where table1.value=0 and table1.value=table2.value;
A search was executed.  (13 ms)
gs[public]> getplantxt
Id Type       Input Rows  Lead time Actual time Node                 And more..
-------------------------------------------------------------------------------------
 0 SCAN       -     -            20          20 192.168.15.161:10001 table: {table1}
 1 SCAN       -     -             9           9 192.168.15.161:10001 table: {table2}
 2   JOIN     0,1   10000,3000  891         891 192.168.15.161:20001 JOIN_EQ_HASH
 3     RESULT 2     32            2           2 192.168.15.161:20001

In this plan, table1 and table2 are scanned respectively, and join is processed without using an index.

Checking the index information of the table showed that the column value was not indexed, so an index is created.

Execute the same query to get a plan.

gs[public]> EXPLAIN ANALYZE select * from table1, table2 where table1.value=0 and table1.value=table2.value;
A search was executed.  (7 ms)
gs[public]> getplantxt
Id Type       Input Rows Lead time Actual time Node                 And more..
--------------------------------------------------------------------------------------------------------------------
 0 SCAN       -     -           20          20 192.168.15.161:10001 table: {table1}
 1   SCAN     0     10000       80          80 192.168.15.161:10001 table: {table1, table2} INDEX SCAN JOIN_EQ_HASH
 2     RESULT 1     32           3           3 192.168.15.161:20001

“INDEX SCAN” is displayed for plan ID2, and the plan has changed to use an index in join processing.

— Optimize SQL —

Scan using an index

In a scan processing to find data that matches the search condition of WHERE clause, the “index scan” using the table index is often faster than the “full scan” which accesses all the rows of the table.

The index scan is more effective when the search condition of a WHERE clause can narrow down the data by reducing the hit ratio to the number of rows of a table.

Index selection rules

This section explains rules to select an index for scan processing when optimizing GridDB SQL.

Rules are different depending on the operator or expression specified in the search condition.

AND

Basically, when indexes are set on the columns used as search conditions, the first index is used.

Example:

a>1 AND b=2 (both a and b are indexed)

Provided that, in exceptional cases, the first index is not used. An example of such cases is shown below.

If conditions for the same column overlap, merge the conditions before using the index

Example:

a>1 AND a<=4 AND a<=3 ("a" is indexed)

OR

In the case of OR, the index is used only when all the columns specified as search conditions are indexed.

Example:

a>1 OR b=2 (both a and b are indexed)

Example:

a>1 OR b=2 (b is indexed)

Expressions for a comparison operator

The index is used only when the expressions for a comparison operator consist of a single column value and a constant expression.

Example:

a> 10*1000-1 ("a" has an index)

Example:

a+1>10*1000-1 ("a" has an index)

Example:

a>b (both a and b are indexed)

IN, BETWEEN

IN and BETWEEN, expressions combining AND, OR, and a comparison operators, the rules mentioned above are applied to.

Example:

a IN (1,2)        → a=1 OR a=2     ("a" is indexed)
a BETWEEN 1 AND 2 → a>=1 AND a<=2   ("a" is indexed)

[Memo]

Composite index selection rules

This section explains rules to select a composite index for scan processing when optimizing GridDB SQL.

The range that uses an index differs depending on the column or the operator specified in the search condition. The part from the beginning of a column that comprises a composite index to the end of consecutive column conditions joined by the AND condition is used as the composite index.

[Memo]

Example:

where col1 = 1 and col2 = 1 and col3 = 2 (with a composite index of col1, col2, col3)

Join

This section explains the rules for joining tables to optimize GridDB SQL.

For join processing without using indexes, the join order and the join operation method greatly influence the performance.

Join order selection rules

In join processing, the table accessed first is called the driving table, and the table accessed next and joined is called the internal table.

The order of joins (driving table and inner table) is determined by the following rules.

When joining two tables, a table which has a conditional expression which is equal to a constant becomes a drive table.

Example:

t1.a=t2.x AND t2.y=1

For joining three or more tables, the join order is determined by the strength of coupling or the strength of the search condition.

[Memo]

Rules for using an index

This section explains the rules for using indexes in join processing.

An index is used, when all five of the following rules are true:

[Memo]

Index selection rules

When the join syntax applies to the rules for using an index, an index is used to join. This section explains the rules to choose the index used to join.

Basically, the rules for using an index are the same as the rules for choosing an index to scan. All indexes set for columns are used in the order described.

Provided that, not all the indexes are used. An example of such cases is shown below.

When indexes are set on the columns used as search conditions, the first index is used.

Example:

  t1.a=t2.x AND t1.b>t2.y AND t2.z=1 (a and b are indexed, the driving table is t2 and the internal table is t1)

In the OR condition (A OR B), when B is a false constant, the index of A is not used.

Example:

  t1.a=t2.x AND (t1.b=t2.y OR false) AND t2.z=1 (a and b are indexed, the driving table is t2 and the internal table is t1)

Selection rules for join operation methods

There are following three types of join operation methods:

Join operation method Description
Hash join Create a temporary table in the memory by multiplying the join key of the driving table to the hash function, and then compare the table to the hash value of the internal table.
Sort merge join Sort the two tables to be joined by the join key, compare them in order and join them.
Nested loop join For the values of the join key of the driving table, find and join the data of the inner table that matched the join conditions.

Hash join, sort merge join, and nested loop join are quick in this order.

Depending on the type of the first join condition, chose one of these operation methods.

The first condition The operation method to choose
Equivalent condition Hash join or sort merge join
(Even if hash join is chosen, some of the operations may switch to sort merge join due to memory resource restriction.)
Comarison condition Sort merge join
- Nested loop join

The first join condition of the join is selected in the following priority.

Hint phrase

An execution plan can be controlled by specifying a hint in a query to modify the execution plan, without changing the SQL statement.

[Points to note]

Terms

The following table explains the hint function related terms.

Term Description
Hint phrase Information for controlling the execution plan
Hints A list of hint phrases Specified in the query to control an execution plan. Specified in the query to control an execution plan.

Specifying hints

Write the hint in the block comment of the query to control the execution plan. The block comment for the hint, can only be written immediately before or after the first SELECT (INSERT/UPDATE/DELETE) statement in SQL. To distinguish a hint comment from regular comments, the block comment for the hint begin with “/*+”.

The target to give a hint is specified by the object name or alias in parentheses. The targets are separated by either space, tab, or newline.

In the following example, the Leading hint clause specifies the table join order.

/*+
Leading(t3 t2 t1)
 */
SELECT *
  FROM t1, t2, t3
    ON t1.x = t2.y and t2.y = t3.z
  ORDER BY t1.x
  LIMIT 10;

[Memo]

List of hint phrases

The following table shows the available hint phrases.

Class Operation Description
Parallelism MaxDegreeOfTaskInput(upper_limit) Maximum number of inputs for one task.
  MaxDegreeOfExpansion(upper_limit) Maximum number of expansion nodes of planning.
Scanning method IndexScan(table) Index scan is used if possible.
  NoIndexScan(table) No index scan is used.
Joining method IndexJoin(table table) Using index join if possible.
  NoIndexJoin(table table) No index join is used.
Table joining order Leading(table table [table …]) Join specified tables in the specified order.
  Leading(( table set table set )) Join the first specified table set as the outer table and
the second table set as the inner table indicates single table or table set)

Table set = { table or ( table set table set )

Details of hint phrases

This chapter shows details for each category of hint phrases.

Parallelism

Control parallelization processing.

Scanning method

Specify scanning method.

Joining method

Specify which joining method to select for a table combination.

Table joining order

Specify in what order the tables are joined.

(1) Specify only joining order: Leading(table table [table …])

Specify the table names or aliases in order from the first table to be joined. In this method, using only Left-deep join orders.

[Example 1]

/*+ Leading(S R Q P) */
SELECT * FROM P,Q,R,S WHERE P.x = Q.x AND ...
Table joining order (Example 1)
Table joining order (Example 1)

(2) Specify joining order and direction: Leading(( table set table set ))

table set = { table or ( table set table set ) }

In case of specifying only joining order like (1), the joining direction (different for outer table or inner table) may be different from expectation. To fix the joining direction, use the following expression.

/*+ Leading((t1 (t2 t3))) */
SELECT ...

In this expression, parentheses can be nested. It joins the first specified table set as the outer table and the second set as the inner table.

[Example 2-1]

/*+ Leading(((P Q) R)) */
SELECT * FROM P,Q,R WHERE P.x = Q.x AND ...
Table joining order (Example 2-1)
Table joining order (Example 2-1)

[Example 2-2]

/*+ Leading((R (Q P))) */
SELECT * FROM P,Q,R WHERE P.x = Q.x AND ...
Table joining order (Example 2-2)
Table joining order (Example 2-2)

[Example 2-3]

/*+ Leading(((P Q) (R S))) */
SELECT * FROM P,Q,R,S WHERE P.x = Q.x AND ...
Table joining order (Example 2-3)
Table joining order (Example 2-3)

[Memo]

Error handling

In the following cases, a syntax error occurs.

In the following case, a table specification error occurs:

[Memo]

— SQL plan (execution plan) —

The EXPLAIN ANALYZE statement of SQL indicates what operation and index were chosen by SQL optimization.

In SQL processing, SQL syntax is analyzed, optimized, and divided into processing units called “tasks”, such as join and sort scan, and then a plan (execution plan) is generated.

Tasks are executed on any one of the nodes which constitute a cluster, in parallel execution, exchanging data between these tasks.

[Memo]

Execution of the EXPLAIN ANALYZE sentence will output information, including the task plans, execution time, in JSON format for each task in a line.

The main items to be output are as follows.

Item Description
id Plan ID
type Type of processing
inputList List of plan IDs of the plans to be input
profile/leadtime Processing time
profile/rows Number of inputs
profile/address The address and port number of the node which executed the processing

[Memo]

The types of processing are as follows.

Value of type Description
GROUP Grouping operation
JOIN Join operation
LIMIT Filter operation of the number of lines
SCAN Table scan operation
SELECT Selection operation (condition filter projection)
SORT Sort operation
UNION Combine and set operation
INSERT, UPDATE, DELETE Various operations to change the container
DDL DDL/DCL sentnce
RESULT Cointain the result

A plan can be get using the operation tool gs_sh.

Example) To get the plan of a query “select * from table1, table2 where table1.value = 0 and table1.id = table2.id”

gs[public]> EXPLAIN ANALYZE select * from table1, table2 where table1.value=0 and table1.id=table2.id;
A search was executed.  (11 ms)
gs[public]> getplantxt
Id Type       Input Rows Lead time Actual time Node                 And more..
--------------------------------------------------------------------------------------------------------------------
 0 SCAN       -     -            0           0 192.168.15.161:10001 table: {table1} INDEX SCAN
 1   SCAN     0     0            2           2 192.168.15.161:10001 table: {table1, table2} INDEX SCAN JOIN_EQ_HASH
 2     RESULT 1     0            0           0 192.168.15.161:20001

Execute the getplanjson subcommand with gs_sh to output the plan in JSON format.

[Notes]

Example of JSON format of a plan

{
  "nodeList" : [ {
    "cmdOptionFlag" : 65,
    "id" : 0,
    "indexInfoList" : [ 3, 3 ],
    "inputList" : [ ],
    "outputList" : [ {
      "columnId" : 0,
      "columnType" : "INTEGER",
          :
          :
}