Q:Optimizing an Oracle group by/aggregate query

I have a query I would like to optimize. This is the query:

SELECT CONN.connNum, MIN(INTER.walkingDistanceMinutes) AS minimalWalkingDistance
FROM INTER
INNER JOIN CONN ON (INTER.IDConn_FK = CONN.IDConn)
GROUP BY INTER.IDConn_FK, CONN.connNum;

These are the explain plan results:

------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                |     3 |   171 |     7  (15)| 00:00:01 |
|   1 |  HASH GROUP BY                |                |     3 |   171 |     7  (15)| 00:00:01 |
|   2 |   NESTED LOOPS                |                |     3 |   171 |     6   (0)| 00:00:01 |
|   3 |    NESTED LOOPS               |                |     3 |   171 |     6   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL         | INTER          |     3 |    78 |     3   (0)| 00:00:01 |
|*  5 |     INDEX UNIQUE SCAN         | SYS_C002012172 |     1 |       |     0   (0)| 00:00:01 |
|   6 |    TABLE ACCESS BY INDEX ROWID| CONN           |     1 |    31 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

I've tried using more specific SELECTs, but the results are the same (something like FROM (SELECT IDConn_FK, walkingDistanceMinutes FROM INTER) I etc). Can you please show me a way to get the cost down?

Answer 1:

It would be very useful to know whether IDConn_FK and connNum are unique in their respective tables, because that changes a lot.

  1. If both are unique in their tables, you don't need to group at all: each connNum occurs once and has exactly one corresponding walkingDistanceMinutes value. Removing the unneeded GROUP BY is the right optimization here.

  2. If only connNum is unique in CONN, one way to optimize the query is to reduce the resources needed to sort the rows while MIN is evaluated. You can do this with a subquery that aggregates first, which also limits the number of rows involved in the join. Here you can use query #1.

  3. If only IDConn_FK is unique, the query is fine as it is. Query #2 may help a little, but not much.

  4. If neither column is unique, you can still limit the number of rows involved in the join with a subquery, as in case 2, but you then need to evaluate MIN a second time, because you need it per connNum (which comes from table CONN). Don't assume that grouping twice is more expensive than grouping once: this is a divide-and-conquer approach (split a complex problem into simpler ones, then recombine their results to solve the original problem). Here you could use query #2.
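One way to find out which of the four cases applies is to look for duplicate values directly. This is a minimal sketch that uses only the table and column names from the question; any row returned by a check means that column is not unique in its table.

```sql
-- Values of connNum that occur more than once in CONN
SELECT connNum, COUNT(*) AS occurrences
FROM CONN
GROUP BY connNum
HAVING COUNT(*) > 1;

-- Values of IDConn_FK that occur more than once in INTER
SELECT IDConn_FK, COUNT(*) AS occurrences
FROM INTER
GROUP BY IDConn_FK
HAVING COUNT(*) > 1;

-- Case 1 only (both checks above return no rows):
-- the same join without GROUP BY or MIN
SELECT CONN.connNum,
       INTER.walkingDistanceMinutes AS minimalWalkingDistance
FROM INTER
INNER JOIN CONN ON (INTER.IDConn_FK = CONN.IDConn);
```

Note that an absence of duplicates in the current data is weaker than a declared UNIQUE or PRIMARY KEY constraint: dropping the GROUP BY is only safe if the uniqueness is guaranteed to hold for future rows as well.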

Query #1:

SELECT CONN.connNum, minimalWalkingDistance
FROM (
        SELECT INTER.IDConn_FK AS IDConn, MIN(INTER.walkingDistanceMinutes) AS minimalWalkingDistance
        FROM INTER
        GROUP BY INTER.IDConn_FK
    ) inter
    JOIN CONN USING (IDConn);

Query #2:

SELECT CONN.connNum, MIN(INTER.minimalWalkingDistance) AS minimalWalkingDistance
FROM (
        SELECT INTER.IDConn_FK AS IDConn, MIN(INTER.walkingDistanceMinutes) AS minimalWalkingDistance
        FROM INTER
        GROUP BY INTER.IDConn_FK
    ) inter
    JOIN CONN USING (IDConn)
GROUP BY CONN.connNum;

One last thing: don't treat the execution plan cost as gospel. Queries with a higher estimated cost are often more efficient than queries with a lower one, especially when many joins and aggregations are involved.
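To compare alternatives by the work they actually do rather than by estimated cost, Oracle can report runtime row counts and buffer gets for each plan step. This is a sketch, assuming you run both statements in the same session and have read access to the underlying V$ views; in SQL*Plus, SET SERVEROUTPUT OFF first, otherwise the "last statement" may not be your query.

```sql
-- Run the query once with runtime statistics collection enabled
SELECT /*+ GATHER_PLAN_STATISTICS */
       CONN.connNum,
       MIN(INTER.walkingDistanceMinutes) AS minimalWalkingDistance
FROM INTER
INNER JOIN CONN ON (INTER.IDConn_FK = CONN.IDConn)
GROUP BY INTER.IDConn_FK, CONN.connNum;

-- Show the plan of the last statement executed in this session,
-- with estimated (E-Rows) and actual (A-Rows) row counts and buffer gets
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```

Repeating this for each candidate rewrite and comparing the Buffers and A-Time columns gives a more reliable picture than the Cost column alone.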

Answer 2:

For your size of data, there is no real optimization possible. For larger data, Oracle should choose other execution paths. You might try this:

select c.connNum,
       (select min(i.walkingDistanceMinutes)
        from inter i
        where i.IDConn_FK = c.idConn
       ) as minimalWalkingDistance
from conn c;

I'm not 100% sure this is exactly the same query. I'm assuming that idConn is the primary key on the conn table.
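One difference is that the scalar subquery version returns a row for every row of conn, with NULL where no inter row matches, whereas the original inner join drops those rows. A sketch of one way to close that gap, still under the assumption that idConn is the primary key of conn:

```sql
SELECT c.connNum,
       (SELECT MIN(i.walkingDistanceMinutes)
        FROM inter i
        WHERE i.IDConn_FK = c.idConn
       ) AS minimalWalkingDistance
FROM conn c
-- Keep only connections that have at least one inter row,
-- which is what the inner join in the original query does
WHERE EXISTS (SELECT 1
              FROM inter i
              WHERE i.IDConn_FK = c.idConn);
```

If the rows with a NULL minimum are acceptable or even wanted, the EXISTS filter can simply be left out.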

Answer 3:

Create a unique index on CONN (IDConn, connNum).

This should remove the last line of the query plan (the TABLE ACCESS BY INDEX ROWID on CONN), because the index alone covers all the columns the query needs from that table.
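As a sketch, the index could be created and the plan re-checked as follows. The index name conn_idconn_connnum_ux is made up for illustration; the table and column names come from the question.

```sql
-- conn_idconn_connnum_ux is a hypothetical index name
CREATE UNIQUE INDEX conn_idconn_connnum_ux
    ON CONN (IDConn, connNum);

-- Regenerate the plan to see whether the table access on CONN is gone
EXPLAIN PLAN FOR
SELECT CONN.connNum, MIN(INTER.walkingDistanceMinutes) AS minimalWalkingDistance
FROM INTER
INNER JOIN CONN ON (INTER.IDConn_FK = CONN.IDConn)
GROUP BY INTER.IDConn_FK, CONN.connNum;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Whether the optimizer actually picks the new index depends on the statistics and data volume, so the second plan is worth checking rather than assuming.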

sql  oracle  optimization