According to the Pareto principle's 80/20 rule, in article 1, SQL Performance (1), Optimized SELECT Query (A), we put in 20% of the effort to cover the minimum actions needed for SQL performance tuning of SELECT statements, hoping to get 80% of the performance improvement. In this article, we discuss the remaining 80% of the effort needed to get the last 20% of improvement.
This type of join creates a Cartesian Join, also called a Cartesian Product or CROSS JOIN.
In a Cartesian Join, all possible combinations of the rows are created. In this example, if we had 1,000 customers with 1,000 total sales, the query would first generate 1,000,000 results, then filter for the 1,000 records where CustomerID is correctly joined. This is an inefficient use of database resources, as the database has done 1,000x more work than required. Cartesian Joins are especially problematic in large-scale databases, because a Cartesian Join of two large tables could create billions or trillions of results.
To prevent creating a Cartesian Join, use INNER JOIN instead:
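As a sketch, with assumed table and column names (Customers, Sales, CustomerID), the implicit comma join and its INNER JOIN replacement might look like:

```sql
-- Implicit comma join: conceptually generates every Customers x Sales
-- combination before the WHERE clause filters it down.
SELECT c.CustomerID, c.Name, s.SaleDate
FROM Customers c, Sales s
WHERE c.CustomerID = s.CustomerID;

-- Preferred: an explicit INNER JOIN that only matches related rows.
SELECT c.CustomerID, c.Name, s.SaleDate
FROM Customers c
INNER JOIN Sales s
    ON c.CustomerID = s.CustomerID;
```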
Try to avoid writing a query using multiple joins that include outer joins, CROSS APPLY, OUTER APPLY, and other complex subqueries.
Reason: When the joins are complex, the number of possible execution plans increases exponentially. This, in turn, reduces the choices available to the query optimizer when deciding the join order and join type. Sometimes the query optimizer is forced to use nested loop joins, irrespective of the performance consequences, for queries with excessively complex CROSS APPLY clauses or subqueries. Also, due to time and performance constraints, the SQL engine will settle for a "good enough" execution plan rather than trying to generate the best possible one.
You can decompose a join by running multiple single-table queries instead of a multitable join, and then performing the join in the application. For example, instead of this single query:
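As a hypothetical illustration (a tag/tag_post/post schema is assumed here), such a single query might look like:

```sql
-- One multitable join: fetch all posts carrying a given tag.
SELECT post.*
FROM tag
    JOIN tag_post ON tag_post.tag_id = tag.id
    JOIN post ON tag_post.post_id = post.id
WHERE tag.tag = 'mysql';
```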
You might run these queries:
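As a sketch using the same assumed tag/tag_post/post schema, the decomposed version might be:

```sql
-- Step 1: look up the tag row.
SELECT * FROM tag WHERE tag = 'mysql';

-- Step 2: fetch the post IDs for that tag (1234 is a placeholder
-- for the tag ID returned by step 1).
SELECT * FROM tag_post WHERE tag_id = 1234;

-- Step 3: fetch the posts by ID, using the IDs from step 2.
SELECT * FROM post WHERE id IN (123, 567, 908);
```

The application glues the three result sets together, which is exactly the "join in the application" being described.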
Why do this? It looks wasteful at first glance, because you've increased the number of queries without getting anything in return. However, such restructuring can actually give significant performance advantages:
- Caching can be more efficient. If the object for the tag mysql is already cached, the application will skip the first query. If you find posts with an ID of 123, 567, or 908 in the cache, you can remove them from the IN() list. The query cache might also benefit from this strategy: if only one of the tables changes frequently, decomposing a join can reduce the number of cache invalidations.
- Using an IN() list instead of a join lets MySQL sort row IDs and retrieve rows more optimally than might be possible with a join.

As a result, doing joins in the application can be more efficient when you cache and reuse a lot of data from earlier queries, you distribute data across multiple servers, you replace joins with IN() lists, or a join refers to the same table multiple times.
Reason: Even though table variables are very fast and efficient in a lot of situations, the SQL Server engine estimates a table variable as containing a single row. Because of this, SQL Server can produce an execution plan that performs horribly when the table variable is used in joins.
When you have to join against a large table and there are conditions on that table, you can increase database performance by transferring the relevant data into a temp table and then joining on that. Your temp table will have fewer rows than the original (large) table, so the join will finish faster!
The decision isn't always straightforward, but this example will give you a sense of the situations in which you might want to use temp tables: imagine a Customer table with millions of records, where you have to make a join on a specific region. You can achieve this by using a SELECT INTO statement and then joining with the temp table:
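A sketch, assuming a Customer table with a Region column and an Orders table to join against:

```sql
-- Copy only the rows we need into a temp table.
SELECT CustomerID, Name, Region
INTO #TempCustomer
FROM Customer
WHERE Region = 'North';

-- Join against the much smaller temp table.
SELECT o.OrderID, t.Name
FROM Orders o
INNER JOIN #TempCustomer t
    ON o.CustomerID = t.CustomerID;
```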
(Note: some SQL developers also avoid using SELECT INTO to create temp tables, saying that this command locks the tempdb database, disallowing other users from creating temp tables. Fortunately, this was fixed in SQL Server 7.0 and later.)
As an alternative to temp tables, you might consider using a subquery as a table:
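The same filtering can be done inline with a derived table (again with assumed table and column names):

```sql
-- The subquery restricts Customer to one region before the join runs.
SELECT o.OrderID, r.Name
FROM Orders o
INNER JOIN (
    SELECT CustomerID, Name
    FROM Customer
    WHERE Region = 'North'
) r ON o.CustomerID = r.CustomerID;
```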
All of these SQL snippets will return the same data. But with temp tables we could, for example, create an index on the temp table to improve performance. There is some good discussion on the differences between temporary tables and subqueries.
c. Using a CTE. Here is an example using two CTEs. It is a simple example, but it shows how two CTEs are defined and then used in an INNER JOIN:
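A minimal sketch of two CTEs joined together (all table and column names here are assumed):

```sql
WITH RegionalCustomers AS (
    -- First CTE: customers in one region.
    SELECT CustomerID, Name
    FROM Customer
    WHERE Region = 'North'
),
RecentOrders AS (
    -- Second CTE: orders placed since a cutoff date.
    SELECT OrderID, CustomerID
    FROM Orders
    WHERE OrderDate >= '2016-01-01'
)
SELECT rc.Name, ro.OrderID
FROM RegionalCustomers rc
INNER JOIN RecentOrders ro
    ON rc.CustomerID = ro.CustomerID;
```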
A correlated subquery is one which uses values from the parent query. This kind of SQL query tends to run row by row, once for each row returned by the outer query, and thus decreases SQL query performance. New SQL developers are often caught structuring their queries in this way, because it is usually the easy route.
Here’s an example of a correlated subquery:
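Based on the fragments quoted in this section (Customer and Company tables, c.Name, CompanyName), the correlated form might have looked like this; the exact column names are assumed:

```sql
-- The inner SELECT runs once per Customer row.
SELECT c.Name,
       (SELECT co.CompanyName
        FROM Company co
        WHERE co.CompanyID = c.CompanyID) AS CompanyName
FROM Customer c;
```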
In particular, the problem is that the inner query (SELECT CompanyName…) is run for each row returned by the outer query (SELECT c.Name…). But why go over the Company table again and again for every row processed by the outer query?
A more efficient SQL performance tuning technique would be to refactor the correlated subquery as a join:
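Using the Customer and Company tables referenced in this section (column names assumed), the refactored join might be:

```sql
-- Company is scanned once and joined, instead of being
-- re-queried for every Customer row.
SELECT c.Name, co.CompanyName
FROM Customer c
INNER JOIN Company co
    ON c.CompanyID = co.CompanyID;
```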
In this case, we go over the Company table just once, at the start, and JOIN it with the Customer table. From then on, we can select the values we need (co.CompanyName) more efficiently.
Multi-statement TVFs are more costly than inline TVFs.
Reason: SQL Server expands inline TVFs into the main query the same way it expands views, but evaluates multi-statement TVFs in a separate context from the main query and materializes their results into temporary work tables. The separate context and the work table are what make multi-statement TVFs costly.
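To illustrate the difference (the Orders schema is assumed), an inline TVF is a single RETURN of a SELECT, while a multi-statement TVF declares a return table and fills it:

```sql
-- Inline TVF: expanded into the calling query like a view.
CREATE FUNCTION dbo.GetOrdersInline (@CustomerID INT)
RETURNS TABLE
AS
RETURN
    SELECT OrderID, OrderDate
    FROM Orders
    WHERE CustomerID = @CustomerID;
GO

-- Multi-statement TVF: evaluated separately and materialized
-- into a work table before the outer query can use it.
CREATE FUNCTION dbo.GetOrdersMulti (@CustomerID INT)
RETURNS @Result TABLE (OrderID INT, OrderDate DATETIME)
AS
BEGIN
    INSERT INTO @Result (OrderID, OrderDate)
    SELECT OrderID, OrderDate
    FROM Orders
    WHERE CustomerID = @CustomerID;
    RETURN;
END;
GO
```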
In order to fine-tune your SQL queries, you must avoid using the LIKE pattern in the following manner:
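For example (table and column names assumed):

```sql
-- Leading wildcard: an index on CustomerName cannot be used.
SELECT *
FROM Customer
WHERE CustomerName LIKE '%abc%';
```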
Here, the database will not be able to use a suitable index, if one exists, because of the leading % wildcard. The system has to perform a full table scan instead, and this takes a toll on speed. Thus, the better way to write this query is:
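A sketch of the index-friendly form (same assumed names):

```sql
-- No leading wildcard: an index on CustomerName can
-- satisfy the prefix search.
SELECT *
FROM Customer
WHERE CustomerName LIKE 'abc%';
```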
This SQL optimization technique concerns the use of EXISTS(). If you want to check whether a record exists, use EXISTS() instead of COUNT(). While COUNT() scans the entire table, counting up all entries matching your condition, EXISTS() will exit as soon as it sees the result it needs.

Avoid using GROUP BY, ORDER BY, and DISTINCT whenever possible.
Reason: When using GROUP BY, ORDER BY, or DISTINCT, the SQL Server engine creates a work table and puts the data into it. It then organizes this data in the work table as requested by the query, and returns the final result.
This is easier said than done depending on your permissions to make changes to the schema.
Reason: When joining or comparing two fields with different datatypes, SQL must do an on-the-fly conversion of one field before it can do the comparison, even if the fields are indexed. If mismatched datatypes are unavoidable, try to cast the larger datatype to the smaller datatype whenever possible.
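As an illustrative sketch (these table and column names are assumed), suppose Orders.CustomerID is stored as VARCHAR while Customer.CustomerID is an INT. An explicit CAST at least makes the conversion visible and puts it where you choose:

```sql
-- Explicit conversion on one side of the join; without it,
-- SQL performs an implicit conversion on every comparison.
SELECT o.OrderID, c.Name
FROM Orders o
INNER JOIN Customer c
    ON CAST(o.CustomerID AS INT) = c.CustomerID;
```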
Consider this query, which selects the number of red fruits in a forest.
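A sketch of such a query, with an assumed forest table:

```sql
-- Count rows that are both fruits and red.
SELECT COUNT(*)
FROM forest
WHERE item_type = 'fruit'
  AND color = 'red';
```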
The goal of an efficient query is to pull only the required records from the database. Per the SQL order of operations, HAVING statements are calculated after WHERE statements. If the intent is to filter a query based on conditions, a WHERE statement is more efficient.
For example, let’s assume 200 sales have been made in the year 2016, and we want to query for the number of sales per customer in 2016.
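A sketch of the HAVING-based query (the Customers and Sales table and column names are assumed):

```sql
-- All sales are aggregated first; the year filter is applied
-- only afterwards, in the HAVING clause.
SELECT c.CustomerID, COUNT(s.SaleID) AS SalesCount
FROM Customers c
INNER JOIN Sales s
    ON c.CustomerID = s.CustomerID
GROUP BY c.CustomerID, YEAR(s.SaleDate)
HAVING YEAR(s.SaleDate) = 2016;
```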
This query would pull 1,000 sales records from the Sales table, then filter for the 200 records generated in the year 2016, and finally count the records in the dataset.
In comparison, WHERE clauses limit the number of records pulled:
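A sketch with the same assumed Customers and Sales names:

```sql
-- The year filter runs in WHERE, so only 2016 sales
-- ever reach the aggregation step.
SELECT c.CustomerID, COUNT(s.SaleID) AS SalesCount
FROM Customers c
INNER JOIN Sales s
    ON c.CustomerID = s.CustomerID
WHERE YEAR(s.SaleDate) = 2016
GROUP BY c.CustomerID;
```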
This query would pull the 200 records from the year 2016, and then count the records in the dataset. The first step from the HAVING approach has been completely eliminated.
HAVING should only be used when filtering on an aggregated field. In the query above, we could additionally filter for customers with greater than 5 sales using a HAVING statement.
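A sketch of that combined filter (schema names assumed): WHERE restricts the rows, HAVING restricts the aggregate.

```sql
SELECT c.CustomerID, COUNT(s.SaleID) AS SalesCount
FROM Customers c
INNER JOIN Sales s
    ON c.CustomerID = s.CustomerID
WHERE YEAR(s.SaleDate) = 2016       -- row-level filter
GROUP BY c.CustomerID
HAVING COUNT(s.SaleID) > 5;         -- aggregate-level filter
```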
Consider this query, which counts the number of posts on Facebook posted after New Year's Eve, or posted by a user named Mark.
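A sketch of the OR-based query (a posts table with username and post_date columns is assumed, and the cutoff date is a placeholder):

```sql
SELECT COUNT(*)
FROM posts
WHERE post_date > '2018-01-01'
   OR username = 'Mark';
```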
An alternative way to handle this query is to 'split' the OR condition and 'combine' the results using a UNION clause. This allows the database to index each of the conditions separately, use those indexes to search for the results, and then combine the results with the UNION clause.
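With the same assumed posts schema, the UNION rewrite might look like this; UNION (unlike UNION ALL) removes duplicates, matching the OR semantics when a post satisfies both conditions:

```sql
SELECT COUNT(*)
FROM (
    SELECT id FROM posts WHERE post_date > '2018-01-01'
    UNION
    SELECT id FROM posts WHERE username = 'Mark'
) AS combined;
```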
Consider this query, which selects all posts from Facebook and sorts them by username in ascending order, and then by post date in descending order.
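Such a query might look like this (schema assumed):

```sql
SELECT *
FROM posts
ORDER BY username ASC, post_date DESC;
```

Note that mixing ASC and DESC directions in one ORDER BY can prevent the database from satisfying the sort with a single index, unless the index itself is defined with matching per-column directions.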