跳跃表

三江小渡 » 三江小渡 2016-03-21 12:10:57 累计浏览 2,299 次

本机暂存

内容概览

这篇讲的是从Redis底层实现引出的跳跃表数据结构。作者从结构图入手，说明它本质是多层链表的组合，底层S0存储有序数据，上层作为索引加速查找。查找时从顶层向下逐层扫描，插入时则通过随机概率P决定新节点能“跳”到多高，从而在期望O(logn)时间内完成操作。

文章展示了一组实验数据，对比了不同P值（如1/2、1/e）对平均操作时间、列高和跳跃次数的影响，直观体现了参数选择的权衡。最后，作者引用了Redis作者antirez的观点，点出跳跃表相比平衡树的核心优势：内存占用可控、对范围查询（ZRANGE）友好、实现与调试更简单。文末也补充说明，这种简单高效的特性使其同样适合算法竞赛场景。

在看Redis底层实现的时候，看到一个数据结构“跳跃表”，随手学习了一下。大致结构如下图：
原图已失效
一个头指针数据控制着多个链表，分表用S0 - Sn 表示。有一堆数据节点用于链表节点的数据指针指向。最底层的S0链表内指向的数据节点内的值是有序的。

查找过程：从头指针列表控制的顶层链表开始一次查找，如果找到对应值就返回，否则找到一个不大于（或不小于，根据有序的方向决定）所查值的最大值的节点向下减一层继续查找，直到找到最底层S0也没找到既为空。时间复杂度期望为 O(logn)。

插入过程：先用查找算法找到相应的应该插入的有序链表的底层节点位置，新节点应该插入该节点后。其中关键是这里有一个随机化的跳跃算法，即插入的节点应该向上层链表跳跃几层。执行一个 i=0;while(p < random(1)) { i++;} 跳出循环得到的i即为应该跳跃的层次数。这样把相应层次和低于该层的所有链表内的相应位置加入一个指针指向该新插入节点。时间复杂度期望为 O(logn)。

删除操作根据数据结构特点相应的删除指针和值节点即可。时间复杂度期望为 O(logn)。

空间复杂度： O(n) （期望）
跳跃表高度： O(logn) （期望）

下边是一个网上的统计P值对各个操作的影响列表：

P	平均操作时间	平均列高	总结点数	每次查找跳跃次数（平均值）	每次插入跳跃次数（平均值）	每次删除跳跃次数（平均值）
2/3	0.0024690 ms	3.004	91233	39.878	41.604	41.566
1/2	0.0020180 ms	1.995	60683	27.807	29.947	29.072
1/e	0.0019870 ms	1.584	47570	27.332	28.238	28.452
1/4	0.0021720 ms	1.330	40478	28.726	29.472	29.664
1/8	0.0026880 ms	1.144	34420	35.147	35.821	36.007

关于跳跃表比二叉树、红黑树等其他有序数据结构的优势看一下redis作者的使用跳跃表理由就了解了：

There are a few reasons:
1) They are not very memory intensive. It's up to you basically. Changing parameters about the probability of a node to have a given number of levels will make then less memory intensive than btrees.
2) A sorted set is often target of many ZRANGE or ZREVRANGE operations, that is, traversing the skip list as a linked list. With this operation the cache locality of skip lists is at least as good as with other kind of balanced trees.
3) They are simpler to implement, debug, and so forth. For instance thanks to the skip list simplicity I received a patch (already in Redis master) with augmented skip lists implementing ZRANK in O(log(N)). It required little changes to the code.
About the Append Only durability & speed, I don't think it is a good idea to optimize Redis at cost of more code and more complexity for a use case that IMHO should be rare for the Redis target (fsync() at every command). Almost no one is using this feature even with ACID SQL databases, as the performance hint is big anyway.
About threads: our experience shows that Redis is mostly I/O bound. I'm using threads to serve things from Virtual Memory. The long term solution to exploit all the cores, assuming your link is so fast that you can saturate a single core, is running multiple instances of Redis (no locks, almost fully scalable linearly with number of cores), and using the "Redis Cluster" solution that I plan to develop in the future.

ps：其实也非常合竞赛使用，相比线段树红黑树平衡树等的学习成本、编码等简单方便非常多，调试等也会极大节省时间。推荐竞赛选手使用。

同分类推荐文章

对基本有序的序列排序算法（2026-06-11 17:46:49）
Four Levels Of Customer Understanding （2026-05-22 21:00:00）
除法的意义（2026-04-12 20:52:17）

查看更多算法文章 →

建议继续学习

redis源代码分析 - persistence （累计阅读 32,228）
python编程细节──遍历dict的两种方法比较（累计阅读 20,370）
无锁消息队列（累计阅读 14,273）
Redis消息队列的若干实现方式（累计阅读 12,085）
关于memcache分布式一致性hash （累计阅读 11,818）
基于Redis构建系统的经验和教训（累计阅读 10,521）
淘宝搜索：定向抓取网页技术漫谈（累计阅读 9,529）
浅谈redis数据库的键值设计（累计阅读 9,353）
PHP与递归Recursion （累计阅读 9,246）
【2014年版】异地购房提取北京公积金（累计阅读 9,147）