数据结构考试复习范围一览

一个简单的中英知识点预览表,祝数据结构考试大捷😎!

Chapter 1 Programming: A General Overview

理解concept of Data Structure and algorithm, ADT


1. Data Structure 概念

  • Definition:
    A data structure is a particular way of organizing, managing, and storing data for efficient access and modification.
    数据结构是一种特定的组织、管理和存储数据的方式,用于高效地访问和修改数据。
  • Examples:
    • Arrays (数组)
    • Linked Lists (链表)
    • Stacks (栈)
    • Queues (队列)
    • Trees (树)
    • Graphs (图)

2. Algorithm 概念

  • Definition:
    An algorithm is a step-by-step procedure or formula for solving a problem.
    算法是一种解决问题的逐步方法或公式。
  • Characteristics (算法的特点):
    • Finiteness (有限性): The algorithm must terminate after a finite number of steps.
      算法在有限步骤后必须终止。
    • Definiteness (确切性): Each step must be precisely defined.
      每一步都必须明确定义。
    • Input (输入): The algorithm accepts zero or more inputs.
      算法可以接受零个或多个输入。
    • Output (输出): Produces at least one output.
      至少产生一个输出结果。
    • Effectiveness (有效性): All steps must be basic enough to be executed.
      每个步骤都必须足够简单,以确保可执行。

3. Abstract Data Type (ADT) 抽象数据类型

  • Definition:
    An ADT is a model for data types where data and operations are defined without specifying how the operations will be implemented.
    抽象数据类型是一种数据模型,其中数据和操作的定义不依赖于具体实现方式。
  • Components (组成部分):
    • Data (数据): The information stored in the ADT.
      ADT 中存储的信息。
    • Operations (操作): Functions that can be performed on the data.
      在数据上可以执行的操作。

4. Relationship between Data Structures and Algorithms 数据结构与算法的关系

  • Data structures focus on the efficient storage and organization of data.
    数据结构关注数据的高效存储和组织。
  • Algorithms define the steps to solve computational problems using data structures.
    算法定义了解决计算问题的步骤,通常依赖于数据结构。
  • Choosing the right data structure directly affects algorithm performance.
    选择合适的数据结构会直接影响算法性能。

5. Common Data Structures and Their Applications 常见数据结构及其应用

  • Array (数组):
    • Static and random access data structure.
      静态、随机访问的数据结构。
    • Example: Storing a list of elements.
      示例:存储元素列表。
  • Linked List (链表):
    • Dynamic and sequential access.
      动态、顺序访问。
    • Example: Efficient insertion and deletion.
      示例:高效的插入与删除。
  • Stack (栈):
    • LIFO (Last In, First Out) structure.
      后进先出的数据结构。
    • Example: Function call management.
      示例:函数调用管理。
  • Queue (队列):
    • FIFO (First In, First Out) structure.
      先进先出的数据结构。
    • Example: Task scheduling.
      示例:任务调度。

6. Algorithm Complexity 算法复杂度

  • Time Complexity (时间复杂度):
    Measures the amount of time an algorithm takes to run as a function of the input size.
    衡量算法随输入规模增长的运行时间。
  • Space Complexity (空间复杂度):
    Measures the amount of memory an algorithm uses during execution.
    衡量算法在运行期间使用的内存量。
  • Common Notations (常见表示法):
    • O(1): Constant time. 常数时间。
    • O(log n): Logarithmic time. 对数时间。
    • O(n): Linear time. 线性时间。
    • O(n²): Quadratic time. 平方时间。

Chapter 2 Algorithm Analysis

理解:growth rate, typical growth rate equation(n!, 2n, nlogn, n2,n, etc.); upper & lower bounds and relative rules; Best, worst and Average cases, Amortized Analysis

掌握:Asymptotic analysis

1. Growth Rate (增长率)

  • Definition (定义):
    Growth rate describes how the running time or space requirements of an algorithm increase as the input size grows.
    增长率描述了算法的运行时间或空间需求随输入规模增长的变化情况。

  • Typical Growth Rate Equations (典型增长率方程):

    • Constant (常数): $ O(1) $
      示例:访问数组的单个元素。
    • Logarithmic (对数): $ O(n) $
      示例:二分查找(Binary Search)。
    • Linear (线性): $ O(n) $
      示例:简单遍历数组。
    • Linearithmic (线性对数): $ O(n n) $
      示例:高效排序算法(如快速排序、归并排序)。
    • Quadratic (平方): $ O(n^2) $
      示例:冒泡排序(Bubble Sort)。
    • Exponential (指数): $ O(2^n) $
      示例:求解递归组合问题。
    • Factorial (阶乘): $ O(n!) $
      示例:全排列(Permutation)问题。

2. Upper & Lower Bounds (上界与下界)

  • Upper Bound (上界):
    • An upper bound describes the worst-case scenario for the running time of an algorithm.
      上界描述了算法运行时间的最坏情况。
    • 表示法:$ O(f(n)) $。
  • Lower Bound (下界):
    • A lower bound describes the best-case scenario or the minimum resources required.
      下界描述了算法运行时间的最优情况或最低资源需求。
    • 表示法:$ (f(n)) $。
  • Tight Bound (紧确界):
    • When both the upper and lower bounds match, we describe the complexity as $ (f(n)) $.
      当上界和下界相等时,称算法的复杂度为 $ (f(n)) $。

3. Best, Worst, and Average Cases (最优、最坏和平均情况)

  • Best Case (最优情况):
    • The scenario where the algorithm performs the minimum number of operations.
      算法执行最少操作次数的情况。
    • 示例:查找元素在数组开头。
  • Worst Case (最坏情况):
    • The scenario where the algorithm performs the maximum number of operations.
      算法执行最多操作次数的情况。
    • 示例:查找元素在数组末尾或不存在。
  • Average Case (平均情况):
    • The expected running time over all possible inputs of a given size.
      针对所有可能输入的平均运行时间。
    • 平均情况通常通过概率分析计算。

4. Amortized Analysis (均摊分析)

  • Definition (定义):
    Analyzing the average time per operation over a sequence of operations, even if a single operation might be costly.
    通过分析一系列操作的平均时间,得出单个操作的均摊成本,即使某些操作可能较为耗时。
  • Example (示例):
    动态数组的扩展:
    • 扩展操作的单次代价很高,但均摊后每次插入的代价为 $ O(1) $。

5. Asymptotic Analysis (渐进分析)

  • Definition (定义):
    Asymptotic analysis evaluates an algorithm's performance by describing its running time or space usage as a function of input size $ n $, ignoring constant factors.
    渐进分析评估算法性能,通过描述其运行时间或空间随输入规模 $ n $ 的变化情况,忽略常数因子。
  • Key Notations (关键表示法):
    • $ O(f(n)) $: Big-O, upper bound (上界)。
    • $ (f(n)) $: Omega, lower bound (下界)。
    • $ (f(n)) $: Theta, tight bound (紧确界)。

6. Rules for Asymptotic Analysis (渐进分析规则)

  • Simplify Constants (忽略常数):
    $ O(2n) O(n) $。
  • Keep Dominant Terms (保留主项):
    $ O(n^2 + n) O(n^2) $。
  • Nested Loops (嵌套循环):
    • 外层与内层循环次数相乘得到复杂度。
    • 示例:双重循环 $ O(n^2) $。
  • Divide and Conquer (分治法):
    • 使用递归公式计算复杂度,例如 $ T(n) = 2T(n/2) + O(n) $。
    • 示例:归并排序 $ O(n n) $。

7. Practical Example (实际示例)

  • Binary Search (二分查找):
    • Best Case: $ O(1) $(目标在中间)。
    • Worst Case: $ O(n) $(递归或迭代到边界)。
    • Average Case: $ O(n) $。
  • Bubble Sort (插入排序):
    • Best Case: $ O(n) $(数组已排序)。
    • Worst Case: $ O(n^2) $。
    • Average Case: $ O(n^2) $。

Chapter 3 Lists, stacks and Queues

掌握: Lists, stacks and Queues (logical and storage structures, implementations, and algorithm complexity)

1. Lists (线性表)

1.1 Logical Structure (逻辑结构)

  • Definition (定义):
    A list is a sequence of elements where order matters, and elements may repeat.
    线性表是一组有序元素的集合,元素可以重复。
  • Types (类型):
    • Singly Linked List (单向链表): Each node points to the next node.
      每个节点指向下一个节点。
    • Doubly Linked List (双向链表): Each node points to both the next and previous nodes.
      每个节点同时指向下一个和上一个节点。
    • Array List (数组实现的线性表): Elements are stored in contiguous memory locations.
      元素存储在连续的内存位置中。

1.2 Storage Structures (存储结构)

  • Array-based List (基于数组的实现):
    • Sequential allocation of memory.
      内存顺序分配。
    • Suitable for random access but expensive for insertions and deletions.
      适合随机访问,但插入和删除操作代价较高。
  • Linked List (基于链表的实现):
    • Dynamic memory allocation with nodes containing data and pointers.
      动态分配内存,每个节点包含数据和指针。
    • Efficient for insertions and deletions but slower for random access.
      插入和删除操作高效,但随机访问较慢。

1.3 Algorithm Complexity (算法复杂度)

Operation Array List $ O $ Linked List $ O $
Access (访问) $ O(1) $ $ O(n) $
Search (查找) $ O(n) $ $ O(n) $
Insert (插入) $ O(n) $ $ O(1) $ (at head)
Delete (删除) $ O(n) $ $ O(1) $ (at head)

2. Stacks (栈)

2.1 Logical Structure (逻辑结构)

  • Definition (定义):
    A stack is a linear data structure that follows the LIFO (Last In, First Out) principle.
    栈是一种遵循后进先出原则的线性数据结构。
  • Common Operations (常见操作):
    • Push (入栈): Add an element to the top of the stack.
      将元素添加到栈顶。
    • Pop (出栈): Remove the top element from the stack.
      移除栈顶元素。
    • Peek (查看栈顶): View the top element without removing it.
      查看但不移除栈顶元素。

2.2 Storage Structures (存储结构)

  • Array-based Stack (基于数组的栈):
    • Fixed size, efficient memory usage, but limited capacity.
      固定大小,内存使用高效,但容量有限。
  • Linked List-based Stack (基于链表的栈):
    • Dynamic size, supports efficient operations, but uses extra memory for pointers.
      动态大小,支持高效操作,但需额外内存存储指针。

2.3 Algorithm Complexity (算法复杂度)

Operation Complexity $ O $
Push (入栈) $ O(1) $
Pop (出栈) $ O(1) $
Peek (栈顶查看) $ O(1) $

3. Queues (队列)

3.1 Logical Structure (逻辑结构)

  • Definition (定义):
    A queue is a linear data structure that follows the FIFO (First In, First Out) principle.
    队列是一种遵循先进先出原则的线性数据结构。
  • Variants (变体):
    • Simple Queue (普通队列): Standard FIFO behavior.
      标准先进先出队列。
    • Circular Queue (循环队列): The last position connects back to the first, making efficient use of memory.
      最后一个位置与第一个位置相连,提高内存利用率。
    • Priority Queue (优先队列): Elements are dequeued based on priority instead of arrival time.
      元素根据优先级出队,而非到达时间。

3.2 Storage Structures (存储结构)

  • Array-based Queue (基于数组的队列):
    • Fixed size, efficient for random access but limited by capacity.
      固定大小,随机访问高效,但容量有限。
  • Linked List-based Queue (基于链表的队列):
    • Dynamic size, allows efficient insertions and deletions.
      动态大小,支持高效插入和删除。

3.3 Algorithm Complexity (算法复杂度)

Operation Complexity $ O $
Enqueue (入队) $ O(1) $
Dequeue (出队) $ O(1) $
Peek (队头查看) $ O(1) $

4. Comparison Summary (对比总结)

Feature (特性) List (线性表) Stack (栈) Queue (队列)
Access Order (访问顺序) Random (随机) LIFO (后进先出) FIFO (先进先出)
Insert/Delete Position Anywhere (任意位置) Top (栈顶) Front/Back (队首/队尾)
Common Uses (常见用途) Data storage, traversal Undo operations (撤销操作) Task scheduling (任务调度)

Chapter 4 Trees

理解 definition and properties of tree, Implementation of Trees, Balance trees, complete binary trees, full binary trees, B-trees, and up-tree(parent-point based Implementation)

掌握Binary tree: Binary tree Traversals (preorder, postorder, inorder), implementations and their space requirements; BST(construction, search, insert, delete, and algorithm complexity); AVL Trees(AVL property, re-balancing operations); Splay Trees(Splay property, splaying operations)

1. Definition and Properties of Tree (树的定义与性质)

1.1 Definition (定义)

  • Tree (树):
    A tree is a hierarchical data structure consisting of nodes, with one node designated as the root and other nodes forming subtrees connected by edges.
    树是一种分层数据结构,由节点组成,其中一个节点为根节点,其余节点形成子树,通过边连接。

1.2 Properties (性质)

  • Hierarchy (层级性): A tree is an acyclic connected graph.
    树是无环连通图。
  • Root (根节点): The topmost node in the tree.
    树的最顶端节点。
  • Parent & Child (父节点与子节点): A node connected above another is the parent, while the one below is the child.
    上层连接的节点为父节点,下层的为子节点。
  • Leaf (叶节点): Nodes without children.
    没有子节点的节点称为叶节点。
  • Depth (深度): The number of edges from the root to a node.
    从根节点到某节点的边数。
  • Height (高度): The maximum depth of any node in the tree.
    树中任意节点的最大深度。
  • Degree (度): The number of children a node has.
    一个节点的子节点数量。

2. Tree Types and Implementations (树的类型与实现)

2.1 Balanced Tree (平衡树)

  • A tree where the height difference between left and right subtrees is minimized for all nodes.
    平衡树是指所有节点的左右子树高度差最小。
  • Example: AVL Tree, Red-Black Tree.
    示例:AVL 树,红黑树。

2.2 Complete Binary Tree (完全二叉树)

  • A binary tree where all levels are fully filled except possibly the last, which is filled from left to right.
    所有层次的节点都被填满,最后一层从左到右排列。

2.3 Full Binary Tree (满二叉树)

  • A binary tree where every node has 0 or 2 children.
    每个节点要么没有子节点,要么有两个子节点。

2.4 B-Trees (B 树)

  • A self-balancing search tree where nodes can have more than two children, widely used in databases and file systems.
    一种自平衡的搜索树,节点可以有多个子节点,广泛用于数据库和文件系统。

2.5 Up-Tree (基于父指针的实现)

  • Nodes store pointers to their parent nodes, typically used in union-find operations.
    节点存储指向父节点的指针,常用于并查集操作。

3. Binary Tree (二叉树)

3.1 Binary Tree Traversals (二叉树遍历)

  • Preorder (先序遍历): Visit root, then left subtree, and right subtree.
    访问顺序:根 -> 左子树 -> 右子树。
  • Inorder (中序遍历): Visit left subtree, root, then right subtree.
    访问顺序:左子树 -> 根 -> 右子树。
  • Postorder (后序遍历): Visit left subtree, right subtree, then root.
    访问顺序:左子树 -> 右子树 -> 根。
  • Space Requirements (空间需求): $ O(h) $, where $ h $ is the height of the tree.

3.2 Binary Search Tree (BST, 二叉搜索树)

Properties (性质)

  • Left subtree contains values smaller than the root.
    左子树包含的值小于根节点。
  • Right subtree contains values greater than the root.
    右子树包含的值大于根节点。

Operations and Algorithm Complexity (操作与算法复杂度)

Operation Complexity $ O $
Search (查找) $ O(h) $
Insert (插入) $ O(h) $
Delete (删除) $ O(h) $
  • Construction (构造): Insert nodes iteratively or recursively.
    通过迭代或递归插入节点。
  • Search (查找): Traverse left or right subtree based on the target value.
    根据目标值遍历左或右子树。
  • Insert (插入): Place the value in the correct position based on BST rules.
    根据二叉搜索树规则将值插入正确位置。
  • Delete (删除):
    • No child: Remove the node.
      无子节点:直接移除节点。
    • One child: Replace the node with its child.
      一个子节点:用其子节点替代。
    • Two children: Replace the node with its inorder successor or predecessor.
      两个子节点:用中序后继或前驱替代节点。

3.3 AVL Trees (AVL 树)

AVL Property (AVL 性质)

  • For every node, the height difference between left and right subtrees is at most 1.
    每个节点的左右子树高度差不超过 1。

Rebalancing Operations (重新平衡操作)

  • Rotations (旋转操作):
    • Single Rotation (单旋转): Left or right rotation to balance the tree.
      左旋或右旋以平衡树结构。
    • Double Rotation (双旋转): A combination of left and right rotations.
      左旋和右旋的组合。

Algorithm Complexity (算法复杂度)

Operation Complexity $ O $
Search (查找) $ O(n) $
Insert (插入) $ O(n) $
Delete (删除) $ O(n) $

3.4 Splay Trees (伸展树)

Splay Property (伸展性质)

  • Frequently accessed elements are moved closer to the root for faster future access.
    经常访问的元素被移动到根节点附近以提高后续访问效率。

Splaying Operations (伸展操作)

  • Zig (单旋): For a node with no grandparent, rotate once.
    对于没有祖父节点的节点,单次旋转。
  • Zig-Zig (双旋同方向): Rotate twice in the same direction.
    同一方向旋转两次。
  • Zig-Zag (双旋相反方向): Rotate twice in opposite directions.
    不同方向旋转两次。

Algorithm Complexity (算法复杂度)

Operation Amortized Complexity $ O $
Search (查找) $ O(n) $
Insert (插入) $ O(n) $
Delete (删除) $ O(n) $

Chapter 5 Hashing

理解: Concepts of searching, Binary search

掌握: Hash table, hash function, collision and collision resolution policy(opening hashing- separate chaining and closed hashing), probe function (linear probing, pseudo random probing, quadratic probing, double hashing)


1. Concepts of Searching (搜索的概念)

Definition (定义)

  • Searching is the process of locating a specific element in a collection of data.
    搜索是指在数据集合中查找特定元素的过程。

Types of Searching Methods (搜索方法类型)

  • Linear Search (线性搜索):
    Sequentially checks each element until the target is found.
    按顺序逐个检查每个元素,直到找到目标元素。
    • Time Complexity (时间复杂度): $ O(n) $
  • Binary Search (二分搜索):
    Efficiently searches a sorted array by repeatedly dividing the search interval in half.
    在排序数组中,通过反复将搜索区间减半进行高效查找。
    • Time Complexity: $ O(n) $
    • Requires the data to be sorted.
      需要数据已排序。

2. Hash Table (哈希表)

Definition (定义)

  • A hash table is a data structure that maps keys to values for efficient lookups, insertions, and deletions.
    哈希表是一种将键映射到值的数据结构,支持高效的查找、插入和删除操作。

Key Components (关键组成部分)

  • Hash Function (哈希函数):
    A function that converts a key into an index within the hash table.
    将键转换为哈希表中索引的函数。
  • Bucket (桶):
    A storage location in the hash table where elements are stored.
    哈希表中存储元素的位置。

3. Hash Function (哈希函数)

Definition (定义)

  • A function that takes input (key) and returns an integer (hash value) corresponding to an index in the hash table.
    哈希函数接收输入(键),返回一个整数(哈希值),作为哈希表的索引。

Desirable Properties (理想性质)

  • Uniformity (均匀性): Distribute keys uniformly across the hash table.
    将键均匀分布在哈希表中。
  • Determinism (确定性): The same key always maps to the same hash value.
    相同的键总是映射到相同的哈希值。
  • Efficiency (高效性): Compute hash values quickly.
    快速计算哈希值。

4. Collisions and Collision Resolution (冲突及其解决方法)

Definition (定义)

  • A collision occurs when two different keys produce the same hash value.
    冲突是指两个不同的键生成相同的哈希值的情况。

Collision Resolution Policies (冲突解决策略)

#### 4.1 Open Hashing (开放哈希法) - Separate Chaining (分离链接法)
- Use a linked list or similar structure at each bucket to store all keys with the same hash value.
每个桶使用链表或类似结构存储具有相同哈希值的所有键。
- Advantages (优点):
- Handles high load factors gracefully.
可较好处理较高负载因子。
- Disadvantages (缺点):
- Requires extra memory for linked structures.
需要额外的内存存储链表。

#### 4.2 Closed Hashing (封闭哈希法) - Probing (探测法)
- Store all elements directly in the hash table using probing techniques.
通过探测技术将所有元素直接存储在哈希表中。
- Probing Techniques (探测方法):
- Linear Probing (线性探测):
Increment index sequentially until an empty slot is found.
逐个递增索引直到找到空槽。
- Advantage: Simple to implement.
优点:实现简单。
- Disadvantage: May lead to clustering.
缺点:可能导致“堆积”。
- Quadratic Probing (二次探测):
Probe slots at quadratic intervals ($ i^2 $, $ 2^2 $, etc.).
以二次间隔(如 $ i^2, 2^2 $ 等)探测槽位。
- Reduces clustering compared to linear probing.
与线性探测相比减少堆积。
- Pseudo Random Probing (伪随机探测):
Use a random function to determine the next probe location.
使用随机函数确定下一个探测位置。
- Advantage: Reduces clustering further.
优点:进一步减少堆积。
- Double Hashing (双重哈希):
Use a second hash function to calculate the probe step size.
使用第二个哈希函数计算探测步长。
- Advantage: Minimizes clustering and improves performance.
优点:最小化堆积,提升性能。


5. Algorithm Complexity (算法复杂度)

Operation Best Case $ O $ Worst Case $ O $
Search (查找) $ O(1) $ $ O(n) $
Insert (插入) $ O(1) $ $ O(n) $
Delete (删除) $ O(1) $ $ O(n) $
  • Best Case: Minimal collisions and efficient probing.
    最优情况:冲突最少,探测高效。
  • Worst Case: High load factor and poor hash function lead to many collisions.
    最差情况:高负载因子与低效哈希函数导致大量冲突。

6. Comparison of Open and Closed Hashing (开放与封闭哈希法对比)

Feature (特性) Open Hashing (开放哈希) Closed Hashing (封闭哈希)
Memory Usage (内存使用) Requires extra memory Compact memory usage
Collision Handling (冲突处理) Linked structures Probing techniques
Load Factor (负载因子) Handles high load well Performance degrades under high load

Chapter 6 Priority Queues (Heaps)

理解: Concepts of priority queues, Complete binary tree

掌握 Binary heaps(insertion, delete and buildHeap operations and their algorithm complexity)

1. Concepts of Priority Queues (优先队列的概念)

Definition (定义)

  • A priority queue is an abstract data type where each element is associated with a priority, and elements with higher priority are served before those with lower priority.
    优先队列是一种抽象数据类型,每个元素都与一个优先级相关联,优先级高的元素会先被处理。

Key Properties (关键特性)

  • Access Rule (访问规则): The element with the highest (or lowest, depending on implementation) priority is dequeued first.
    按照优先级顺序,优先级最高(或最低)的元素最先出队。
  • Typical Implementations (常见实现方式):
    • Binary Heap (二叉堆)
    • Fibonacci Heap (斐波那契堆)
    • Pairing Heap (配对堆)

Applications (应用)

  • Scheduling tasks (任务调度)
  • Dijkstra's algorithm for shortest paths (迪杰斯特拉算法)
  • Event-driven simulation (事件驱动模拟)

2. Concepts of Complete Binary Tree (完全二叉树的概念)

Definition (定义)

  • A complete binary tree is a binary tree in which every level, except possibly the last, is fully filled, and all nodes are as far left as possible.
    完全二叉树是一种二叉树,除最后一层外的所有层次都被完全填充,最后一层的节点尽可能靠左排列。

Properties (性质)

  • Node Count (节点数量): A complete binary tree with height $ h $ has between $ 2^h $ and $ 2^{h+1} - 1 $ nodes.
    高度为 $ h $ 的完全二叉树的节点数量介于 $ 2^h $ 和 $ 2^{h+1} - 1 $ 之间。
  • Height (高度): For $ n $ nodes, the height is $ _2 n $.
    对于 $ n $ 个节点,树的高度为 $ _2 n $。
  • Efficient Storage (高效存储): Can be efficiently stored in arrays, with the following relationships for node indices:
    • Parent Index: $ (i) = (i-1)/2 $
    • Left Child Index: $ (i) = 2i + 1 $
    • Right Child Index: $ (i) = 2i + 2 $

3. Binary Heap (二叉堆)

Definition (定义)

  • A binary heap is a complete binary tree that satisfies the heap property:
    • Max-Heap (最大堆): The value of each parent node is greater than or equal to its children.
      每个父节点的值大于等于其子节点的值。
    • Min-Heap (最小堆): The value of each parent node is less than or equal to its children.
      每个父节点的值小于等于其子节点的值。

Key Operations (关键操作)

3.1 Insertion (插入)

  • Steps (步骤):
    1. Add the new element to the next available position in the heap (maintaining the complete binary tree property).
      将新元素插入到堆的下一个可用位置(保持完全二叉树性质)。
    2. "Bubble up" (or heapify-up) to restore the heap property by swapping with its parent until the property is satisfied.
      通过与父节点交换进行“上浮”,以恢复堆性质。
  • Time Complexity (时间复杂度): $ O(n) $, where $ n $ is the number of elements in the heap.

3.2 Deletion (删除)

  • Typically, deletion refers to removing the root element (highest priority).
    通常删除操作指移除根节点(优先级最高)。
  • Steps (步骤):
    1. Replace the root with the last element in the heap.
      用堆中的最后一个元素替换根节点。
    2. Remove the last element.
      移除最后一个元素。
    3. "Bubble down" (or heapify-down) the new root to restore the heap property by swapping with its larger (or smaller for a min-heap) child.
      通过与较大(或对于最小堆为较小)子节点交换进行“下沉”,以恢复堆性质。
  • Time Complexity (时间复杂度): $ O(n) $.

3.3 BuildHeap (构建堆)

  • Constructs a heap from an unsorted array.
    从无序数组构建堆。
  • Steps (步骤):
    1. Start from the last non-leaf node and perform heapify-down on each node moving toward the root.
      从最后一个非叶子节点开始,对每个节点执行下沉操作,逐步向根节点移动。
  • Time Complexity (时间复杂度): $ O(n) $ due to the efficient bottom-up approach.
    由于采用自底向上的方法,复杂度为 $ O(n) $。

4. Algorithm Complexity (算法复杂度)

Operation Time Complexity $ O $ Explanation (说明)
Insert (插入) $ O(n) $ Depends on the height of the heap.
Delete (删除) $ O(n) $ Bubble down operation determines cost.
BuildHeap (构建堆) $ O(n) $ Bottom-up approach reduces complexity.

5. Applications of Binary Heaps (二叉堆的应用)

  • Priority Queue Implementation (优先队列的实现)
  • Heap Sort (堆排序)
  • Graph Algorithms (图算法):
    • Dijkstra's Shortest Path Algorithm (迪杰斯特拉最短路径算法)
    • Prim's Minimum Spanning Tree Algorithm (普里姆最小生成树算法)

Chapter 7 Sorting

理解 internal sorting and external sorting, a general Lower Bound for Sorting

掌握 All sort algorithms learned (Ideas, properties, processes, implementations and their complexity)


1. Internal Sorting and External Sorting (内部排序与外部排序)

Internal Sorting (内部排序)

  • Definition (定义):
    Sorting performed entirely within the main memory.
    数据完全存储在内存中并进行排序的操作。
  • When to Use (适用场景):
    • The dataset is small enough to fit into the main memory.
      数据集足够小,可以放入主存储器。
  • Examples (例子):
    • Bubble Sort, Quick Sort, Merge Sort, Heap Sort, etc.

External Sorting (外部排序)

  • Definition (定义):
    Sorting performed on data stored in external storage devices (e.g., disk) because the dataset is too large for main memory.
    针对存储在外部存储设备(如磁盘)上的大规模数据的排序。
  • When to Use (适用场景):
    • The dataset exceeds the size of the available main memory.
      数据集超出内存容量。
  • Examples (例子):
    • External Merge Sort, Multi-way Merge Sort.

2. General Lower Bound for Sorting (排序的通用下界)

Key Concepts (核心概念)

  • Comparison-Based Sorting (基于比较的排序):
    Sorting algorithms that decide the order of elements by comparing pairs of elements.
    通过比较元素对来决定顺序的排序算法。
  • Decision Tree Model (决策树模型):
    • Sorting can be represented as a binary decision tree.
      排序可以表示为二叉决策树。
    • Each comparison corresponds to a binary decision.
      每次比较对应于一次二元决策。
    • Height of the tree gives the number of comparisons in the worst case.
      决策树的高度表示最坏情况下的比较次数。

Lower Bound (下界)

  • Theorem (定理): Any comparison-based sorting algorithm has a worst-case time complexity of at least $ O(n n) $.
    任意基于比较的排序算法的最坏情况时间复杂度至少为 $ O(n n) $。
  • Explanation (解释):
    • There are $ n! $ possible permutations of $ n $ elements.
      $ n $ 个元素共有 $ n! $ 种排列方式。
    • The height of a decision tree with $ n! $ leaves is $ _2 (n!) $, which is $ (n n) $.

3. Sorting Algorithms (排序算法)

以下总结了常见排序算法的思想性质过程实现复杂度


3.1 Bubble Sort (冒泡排序)

  • Idea (思想):
    Repeatedly swap adjacent elements if they are in the wrong order.
    通过不断交换相邻的逆序元素实现排序。
  • Properties (性质):
    • In-place (原地排序)
    • Stable (稳定排序)
  • Complexity (复杂度):
    • Best: $ O(n) $, Worst: $ O(n^2) $, Average: $ O(n^2) $

3.2 Selection Sort (选择排序)

  • Idea (思想):
    Repeatedly find the minimum element and place it in the correct position.
    每次找到最小的元素,并将其放置到正确位置。
  • Properties (性质):
    • In-place (原地排序)
    • Not stable (不稳定)
  • Complexity (复杂度):
    • Best, Worst, Average: $ O(n^2) $

3.3 Insertion Sort (插入排序)

  • Idea (思想):
    Build the sorted part one element at a time by inserting new elements into their correct positions.
    每次将一个新元素插入到已排序部分的正确位置。
  • Properties (性质):
    • In-place (原地排序)
    • Stable (稳定排序)
  • Complexity (复杂度):
    • Best: $ O(n) $, Worst: $ O(n^2) $, Average: $ O(n^2) $

3.4 Merge Sort (归并排序)

  • Idea (思想):
    Divide the array into halves, sort each half, and merge the sorted halves.
    将数组划分为两半,分别排序,然后合并。
  • Properties (性质):
    • Not in-place (非原地排序)
    • Stable (稳定排序)
  • Complexity (复杂度):
    • Best, Worst, Average: $ O(n n) $

3.5 Quick Sort (快速排序)

  • Idea (思想):
    Select a pivot, partition the array around the pivot, and recursively sort the partitions.
    选择一个轴点,将数组按轴点划分,然后递归排序子部分。
  • Properties (性质):
    • In-place (原地排序)
    • Not stable (不稳定)
  • Complexity (复杂度):
    • Best, Average: $ O(n n) $, Worst: $ O(n^2) $

3.6 Heap Sort (堆排序)

  • Idea (思想):
    Build a binary heap, and repeatedly extract the maximum element to sort.
    构建二叉堆,每次提取堆顶元素以完成排序。
  • Properties (性质):
    • In-place (原地排序)
    • Not stable (不稳定)
  • Complexity (复杂度):
    • Best, Worst, Average: $ O(n n) $

3.7 Counting Sort (计数排序)

  • Idea (思想):
    Count the occurrences of each element and use this information to place elements in the sorted order.
    统计每个元素出现的次数,并利用这些信息排序。
  • Properties (性质):
    • Not comparison-based (非比较排序)
    • Stable (稳定排序)
  • Complexity (复杂度):
    • Best, Worst, Average: $ O(n + k) $, where $ k $ is the range of input values.

3.8 Radix Sort (基数排序)

  • Idea (思想):
    Sort elements by processing individual digits from least significant to most significant.
    按照从低位到高位处理各位数字进行排序。
  • Properties (性质):
    • Not comparison-based (非比较排序)
    • Stable (稳定排序)
  • Complexity (复杂度):
    • Best, Worst, Average: $ O(n k) $, where $ k $ is the number of digits.

3.9 Bucket Sort (桶排序)

  • Idea (思想):
    Divide the array into buckets, sort each bucket, and concatenate the results.
    将数组划分到多个桶中,分别排序每个桶,然后合并结果。
  • Properties (性质):
    • Not comparison-based (非比较排序)
    • Stable (稳定排序)
  • Complexity (复杂度):
    • Best: $ O(n + k) $, Worst: $ O(n^2) $, Average: $ O(n + k) $

Chapter 8 The Disjoint Sets Class

理解 Equivalence class and the dynamic equivalence problem

掌握 union/find algorithm(Smart union algorithms-union by-size & union by-height, path compression)

1. Equivalence Class and Dynamic Equivalence Problem (等价类与动态等价问题)

Equivalence Class (等价类)

Definition (定义)

  • An equivalence class is a subset of a set, where all elements are equivalent under a given equivalence relation.
    等价类是集合的一个子集,其中的所有元素在某种等价关系下是等价的。

Properties (性质)

  • Reflexive (自反性): $ a a $
  • Symmetric (对称性): If $ a b $, then $ b a $.
  • Transitive (传递性): If $ a b $ and $ b c $, then $ a c $.

Examples (例子)

  • Congruence modulo $ n $: $ a b $.
    模 $ n $ 的同余关系:$ a b $。
  • Connected components in a graph.
    图的连通分量。

Dynamic Equivalence Problem (动态等价问题)

Definition (定义)

  • A dynamic equivalence problem involves determining whether two elements are in the same equivalence class and allowing for the union of equivalence classes over time.
    动态等价问题是确定两个元素是否属于同一个等价类,并允许随着时间合并等价类的问题。

Operations (操作)

  • Find (查询): Determine which equivalence class an element belongs to.
    确定一个元素所属的等价类。
  • Union (合并): Merge two equivalence classes into one.
    将两个等价类合并为一个。

Applications (应用)

  • Network connectivity (网络连通性)
  • Kruskal’s algorithm for Minimum Spanning Tree (Kruskal算法求最小生成树)
  • Image processing for connected components (图像处理中连通分量的识别)

2. Union-Find Algorithm (并查集算法)

Definition (定义)

  • The union-find algorithm is a data structure that efficiently supports union and find operations.
    并查集算法是一种高效支持合并和查询操作的数据结构。

Key Concepts (核心概念)

2.1 Smart Union Algorithms (优化的合并算法)

  • Union by Size (按大小合并):
    Attach the smaller tree as a subtree of the larger tree to minimize height growth.
    将节点数较少的树作为子树连接到节点数较多的树上,以最小化树的高度增长。
    • Complexity (复杂度): $ O(n) $ per operation.
  • Union by Height (按高度合并):
    Attach the shorter tree as a subtree of the taller tree to avoid increasing the tree height unnecessarily.
    将高度较小的树作为子树连接到高度较大的树上,以避免不必要的高度增长。
    • Complexity (复杂度): $ O(n) $ per operation.

2.2 Path Compression (路径压缩)

  • Idea (思想):
    During a Find operation, flatten the structure by making every node point directly to the root.
    在查询操作中,将路径上的每个节点直接指向根节点,从而压平结构。
  • Effect (效果):
    Significantly reduces the depth of the tree over time, improving efficiency for subsequent operations.
    随着时间的推移显著减少树的深度,提高后续操作的效率。
  • Complexity (复杂度): Nearly $ O(1) $ amortized per operation when combined with smart union techniques.

Algorithm Implementation (算法实现)

Find Operation (查询操作)

1
2
3
4
def find(x, parent):
if parent[x] != x: # Not the root
parent[x] = find(parent[x], parent) # Path compression
return parent[x]

Union Operation (合并操作)

1
2
3
4
5
6
7
8
9
10
11
def union(x, y, parent, rank):
rootX = find(x, parent)
rootY = find(y, parent)
if rootX != rootY:
if rank[rootX] > rank[rootY]: # Union by rank
parent[rootY] = rootX
elif rank[rootX] < rank[rootY]:
parent[rootX] = rootY
else:
parent[rootY] = rootX
rank[rootX] += 1

Complexity Analysis (复杂度分析)

Operation Complexity (时间复杂度) Explanation (说明)
Find $ O((n)) $ $ (n) $ 是反Ackermann函数,极小。
Union $ O((n)) $ 路径压缩与按秩合并使操作接近常数时间。
Total for $ m $ operations $ O(m (n)) $ 对 $ m $ 次操作,几乎线性时间。

Applications of Union-Find Algorithm (并查集算法的应用)

  1. Connected Components in Graphs (图的连通分量): Determine the number of connected components in an undirected graph.
    判断无向图中连通分量的数量。
  2. Kruskal’s Algorithm (Kruskal算法): Used in finding the Minimum Spanning Tree (MST).
    用于求最小生成树。
  3. Dynamic Connectivity (动态连通性): Solve problems where connections between elements change over time.
    解决元素间连通性随时间变化的问题。
  4. Image Segmentation (图像分割): Identify connected regions in binary images.
    识别二值图像中的连通区域。

Chapter 9 Graph Algorithms

理解 concept of graph(vertex, edge, degree, path, cyclic vs. acyclic, directed vs. undirected, weighted vs. unweighted, connected component, spanning tree), two representations,

掌握 graph searching: BFS and DFS, Topological sort and their applications; shortest-path problem-Dijkstra‘s algorithm; Minimum-cost spanning trees:Prim and Kruskal. ( ideas, processes, implementations)


1. Graph Concepts (图的基本概念)

Key Terms (关键术语)

Term (术语) Definition (定义)
Vertex (顶点) A fundamental unit of a graph, representing a point or object.
Edge (边) A connection between two vertices, representing a relationship or link.
Degree (度) The number of edges connected to a vertex.
Path (路径) A sequence of vertices connected by edges.
Cycle (环) A path that starts and ends at the same vertex without reusing any edge.
Cyclic (有环) A graph containing at least one cycle.
Acyclic (无环) A graph with no cycles.
Directed Graph (有向图) A graph where edges have a direction (e.g., $ A B $).
Undirected Graph (无向图) A graph where edges have no direction (e.g., $ A B $).
Weighted Graph (加权图) A graph where edges have associated weights (e.g., cost, distance, time).
Unweighted Graph (无权图) A graph where all edges are considered equal.
Connected Component (连通分量) A subgraph where any two vertices are connected by a path, and no additional vertex can be added while preserving connectivity.
Spanning Tree (生成树) A subgraph that includes all the vertices of the original graph and forms a tree (connected and acyclic).

2. Graph Representations (图的表示方法)

1. Adjacency Matrix (邻接矩阵)

  • Definition (定义): A 2D array where element $ A[i][j] $ represents the presence (or weight) of an edge between vertex $ i $ and $ j $.
  • Advantages (优点):
    • Easy to implement and understand.
    • Fast edge lookup ($ O(1) $).
  • Disadvantages (缺点):
    • Requires $ O(V^2) $ space, even for sparse graphs.
    • Inefficient for iterating over neighbors.

2. Adjacency List (邻接表)

  • Definition (定义): An array (or map) of lists where each vertex points to a list of its adjacent vertices.
  • Advantages (优点):
    • Space-efficient for sparse graphs ($ O(V + E) $).
    • Fast neighbor iteration.
  • Disadvantages (缺点):
    • Edge lookup takes $ O(V) $ in the worst case.

3. Graph Searching Algorithms (图搜索算法)

1. Breadth-First Search (BFS, 广度优先搜索)

  • Idea (思想): Explore all neighbors of a vertex before moving to the next level of neighbors.
  • Process (过程):
    1. Start from a source vertex.
    2. Use a queue to track the vertices to be visited.
    3. Mark visited vertices to avoid revisiting.
  • Time Complexity (时间复杂度): $ O(V + E) $
  • Applications (应用):
    • Shortest path in an unweighted graph.
    • Connected component detection.

2. Depth-First Search (DFS, 深度优先搜索)

  • Idea (思想): Explore as far as possible along each branch before backtracking.
  • Process (过程):
    1. Start from a source vertex.
    2. Use recursion or a stack to track the vertices being visited.
    3. Mark visited vertices to avoid revisiting.
  • Time Complexity (时间复杂度): $ O(V + E) $
  • Applications (应用):
    • Detecting cycles.
    • Topological sorting.
    • Connected component detection.

3. Topological Sort (拓扑排序)

  • Idea (思想): A linear ordering of vertices such that for every directed edge $ u v $, $ u $ appears before $ v $ in the ordering.
  • Process (过程):
    1. Use DFS and record the finishing times of vertices.
    2. Reverse the order of finishing times.
  • Time Complexity (时间复杂度): $ O(V + E) $
  • Applications (应用):
    • Scheduling tasks with dependencies.
    • Resolving symbol dependencies in compilers.

4. Shortest-Path Problem (最短路径问题)

Dijkstra’s Algorithm (迪杰斯特拉算法)

  • Idea (思想): Use a priority queue to greedily select the vertex with the smallest tentative distance.
  • Process (过程):
    1. Initialize distances to all vertices as infinity, except the source vertex (set to 0).
    2. Use a priority queue to repeatedly extract the vertex with the smallest distance.
    3. Relax all outgoing edges of the extracted vertex.
  • Time Complexity (时间复杂度): $ O((V + E) V) $ with a priority queue.
  • Applications (应用):
    • Finding shortest paths in weighted graphs with non-negative weights.

5. Minimum-Cost Spanning Trees (最小生成树)

1. Prim’s Algorithm (普里姆算法)

  • Idea (思想): Grow the spanning tree from an arbitrary vertex by greedily selecting the smallest edge that connects the tree to a new vertex.
  • Process (过程):
    1. Start with an arbitrary vertex.
    2. Use a priority queue to track the minimum-cost edges.
    3. Add the smallest edge to the spanning tree, marking the newly connected vertex.
  • Time Complexity (时间复杂度): $ O((V + E) V) $ with a priority queue.
  • Applications (应用):
    • Constructing network infrastructure with minimum cost.

2. Kruskal’s Algorithm (克鲁斯卡尔算法)

  • Idea (思想): Sort all edges by weight and greedily add edges to the spanning tree, ensuring no cycles are formed.
  • Process (过程):
    1. Sort all edges by weight.
    2. Use the union-find algorithm to add edges without forming cycles.
  • Time Complexity (时间复杂度): $ O(E E + E (V)) $, where $ (V) $ is the inverse Ackermann function from the union-find algorithm.
  • Applications (应用):
    • Building cost-efficient networks.

题型

题型 题数 分值占比 分值 备注
选择题 10题 20% 20分 每题2分
编程题 10空 20% 20分 每空2分
解答题 3题 30% 30分 每题10分
综合题 2题 30% 30分 每题15分