如何建立合适的索引？

Incessant 2009-10-11 22:31:21 累计浏览 6,779 次

本机暂存

内容概览

这篇文章从一个典型的运维场景出发：当你在 statspack 报表中发现逻辑读、物理读极高的 SQL，分析过表统计信息且执行计划已走索引时，该如何进一步判断其是否真正优化？作者没有停留在表面指标，而是深入探讨了在“看似合理”的表象下，如何验证索引选择的效能与 SQL 语句执行质量之间的深层关联。

文章的核心在于提供一种系统性的诊断思路，而不仅仅是工具使用教程。它引导读者超越“是否走了索引”这个二元判断，去追问索引类型（比如是否是覆盖索引）、访问路径（如回表次数）、以及索引效率与数据分布的实际匹配度。通过对这些细节的剖析，文章旨在帮助读者建立一套从性能现象反推至设计根源的完整优化逻辑。

读完能感受到，真正的优化工作往往从那些“看起来已经优化过了”的地方开始。它教会我们如何拆解一个“标准答案”，并在实际业务场景中做出更精准、更有效的判断，这对于任何需要与数据库性能打交道的工程师来说，都是一次扎实的思维训练。

当你接手一个系统时，通常是从检查系统负载，cpu、内存使用率开始，查看statspack top5等待事件，逻辑读、物理读排名靠前的sql语句等等，然后进行初步的优化。而随着业务的深入了解，你开始从系统的角度去考虑据库设计，考虑应用实现的合理性，是否有更好的改进方案等。假设通过statspack报表找到了很耗资源的sql，表分析过，执行计划也是走索引，这种情况下怎么去判断 sql是优化的呢？

1.提取逻辑读排名靠前的sql

6,813,699 336 20,278.9 10.1 66.72 80.45 3039661161

module: java@admin1 (tns v1-v3)

select b.biz_source, count(*) as counts from tb_hanzgs_de

tail a, tb_business_info b where a.id = b.hanzgs_id

and a.status = :1 and a.deal_id = :2 and a.create_date

>= to_date(:3,\'yyyy-mm-dd hh24:mi:ss\') and a.create_date < to

_date(:4,\'yyyy-mm-dd hh24:mi:ss\') group by b.biz_source

2.查看执行计划

sql> explain plan for

2 select b.biz_source, count(*) as counts

3 from tb_hanzgs_detail a, tb_business_info b

4 where a.id = b.hanzgs_id

5 and a.status = :1

6 and a.deal_id = :2

7 and a.create_date >= to_date(:3, \'yyyy-mm-dd hh24:mi:ss\')

8 and a.create_date < to_date(:4, \'yyyy-mm-dd hh24:mi:ss\')

9 group by b.biz_source;

Explained.

SQL> @?/rdbms/admin/utlxpls

Plan hash value: 1387434542

-----------------------------------------------------------------------

-----------------------------------------------------------------------

0 | select statement | | 1 | 31 |215

1 | sort group by | | 1 | 31 |215

2 | filter | | | |

3 | nested loops | | 1 | 31 |199

4 | table access by index rowid| tb_hanzgs_detail | 1| 21 |198

5 | index range scan | ind_tb_hanzgs_create | 231| |397

6 | index range scan | ind_tb_business_info_biz | 1| 10 |1

-----------------------------------------------------------------------

索引定义

create index ind_tb_hanzgs_create on tb_hanzgs_detail (create_date, deal_id,status, id)tablespace tbs_tb_ind online compute statistics;

3.查看语句执行时间

sql>select b.biz_source, count(*) as counts

3 from tb_hanzgs_detail a, tb_business_info b

4 where a.id = b.hanzgs_id

5 and a.status = 1

6 and a.deal_id = 0

7 and a.create_date >= to_date(sysdate-10, \'yyyy-mm-dd hh24:mi:ss\')

8 and a.create_date < to_date(sysdate-5, \'yyyy-mm-dd hh24:mi:ss\')

9 group by b.biz_source;

biz_source counts

---------- ----------

102 712

501 7881

701 1465

3 rows selected.

elapsed: 00:00:17.03

sql> /

biz_source counts

---------- ----------

102 713

501 7882

701 1465

3 rows selected.

elapsed: 00:00:05.32

这个语句查询时间在5.3秒左右，对于查询频繁的oltp系统中，毫无疑问全表扫描的代价是最高的，按时间索引扫描数据效率也是很低的，毕竟一个时间段的数据也是不少的。考虑到上面sql正好使用时间列索引，如果status，rule_id列稀疏读很高的话，这些列建立索引性能应该会有很大的提高。

4.查看表数据分布

sql> select status, count(*) as counts

2 from tb_hanzgs_detail

3 where create_date >= to_date(sysdate - 50, \'yyyy-mm-dd hh24:mi:ss\')

4 and create_date < to_date(sysdate - 49, \'yyyy-mm-dd hh24:mi:ss\') + 1

5 group by status;

status counts

--------- ----------

0 2

1 286

2 3567

3 123477

根据随机抽取几天数据分布结构，这个表中97%以上数据的status都等于3，如果status为3的查询应该会走全表扫描，消耗大量资源，查询频繁的话DBA是不允许开发部署到生产系统的，数据库压力太大。而status不等于3的数据量很少很少，以status列来建立索引，性能应该会有很大的提高。分析完就可以尝试的进行优化了。

5.重新建立索引

create index ind_tb_hanzgs_de_sta on tb_hanzgs_detail (status, deal_id, create_date,id)tablespace tbs_tb_ind online compute statistics;

新执行计划

sql> explain plan for

2 select b.biz_source, count(*) as counts

3 from tb_hanzgs_detail a, tb_business_info b

4 where a.id = b.hanzgs_id

5 and a.status = :1

6 and a.deal_id = :2

7 and a.create_date >= to_date(:3, \'yyyy-mm-dd hh24:mi:ss\')

8 and a.create_date < to_date(:4, \'yyyy-mm-dd hh24:mi:ss\') + 1

9 group by b.biz_source;

explained.

elapsed: 00:00:00.01

14:11:48 sql> @?/rdbms/admin/utlxpls

------------------------------------------------------------------

------------------------------------------------------------------

0 | select statement | |1 |31 |19 |

1 | sort group by | |1 |31 |19 |

2 | filter | | | | |

3 | nested loops | |1 |31 | 3 |

4 | index range scan | ind_tb_hanzgs_de_sta |1 |21 | 4 |

5 | index range scan | ind_tb_business_info_biz |1 |10 | 1 |

------------------------------------------------------------------

6.重新执行该语句

sql>select b.biz_source, count(*) as counts

3 from tb_hanzgs_detail a, tb_business_info b

4 where a.id = b.hanzgs_id

5 and a.status = 1

6 and a.deal_id = 0

7 and a.create_date >= to_date(sysdate-10, \'yyyy-mm-dd hh24:mi:ss\')

8 and a.create_date < to_date(sysdate-5, \'yyyy-mm-dd hh24:mi:ss\')

9 group by b.biz_source;

biz_source counts

---------- ----------

102 713

501 7881

701 1465

3 rows selected.

elapsed: 00:00:00.24

sql> /

biz_source counts

---------- ----------

102 713

501 7882

701 1465

3 rows selected.

elapsed: 00:00:00.23

调整后的查询时间在0.2秒左右，速度提高了100倍左右，我只是简单的把索引列位置调换一下，性能就有了很大提高，statspack看不到这条语句了。总的来说，索引不是说create就可以了，通常需要考虑以下几点。

1.结合实际的应用。

2.考虑索引列的数据分布，如果distinct值很少且数据分布均匀的话，可能就不适合放在联合索引的最前面。

3.考虑索引的大小，在字段长度32的列和长度为7的列上建立索引大小肯定是不一样的，索引越大扫描的代价就越高。

4.考虑索引列冗余，可能你在索引中多冗余一个小字段，select就只走索引而不需要去扫描原表的数据。

5.考虑索引对其他sql的影响，是否其他的sql也可以使用这个索引。

6.考虑对是否可以对原有索引进行合并。

同分类推荐文章

使用deepseek进行Oracle恢复,引起重大故障（2026-06-22 10:56:00）
接手一个只差临门一脚的数据库恢复（2026-06-18 00:13:09）
我做了一个 AI 版的 StarRocks 升级风险扫描工具，直接帮我定位到一个风险（2026-06-15 01:00:00）

查看更多数据库文章 →

建议继续学习

一次SQL优化记录（累计阅读 6,068）
Mysql中的排序优化（累计阅读 5,668）
阿里巴巴国际站P4P引擎系统简介（累计阅读 5,342）
MySQL介绍和性能优化 (PPT/PDF) （累计阅读 4,755）
改变了对Mysql子查询的看法（累计阅读 3,978）
Oracle cluster使用场景分析（累计阅读 3,707）
说oracle优化之一（累计阅读 3,338）
mysql索引的一个技巧（累计阅读 3,262）
Virtual Indexes （累计阅读 3,245）
视频站收录浅析（累计阅读 2,845）