编译技术2-7


📌 Conflict Resolutions 冲突解决

From Regular Expressions to Implementation 从正则表达式到实现


1. From Regular Expressions to NFAs

从正则表达式到 NFA

  • Inductive method 归纳方法

2. From NFA to DFA

从 NFA 到 DFA

  • Subset construction algorithm 子集构造算法

3. Minimizing DFA

最小化 DFA

  • State-Minimization Algorithm 状态最小化算法

🚧 Challenges in Scanning 扫描中的挑战

  1. How do we determine which lexemes are associated with each token? 如何确定哪些词素与每个标记相关联?
  2. When there are multiple ways we could scan the input, how do we know which one to pick? 当有多种扫描输入的方式时,我们如何知道选择哪一种?
    • Maximal munch 最大匹配原则
    • Priority system 优先级系统
  3. How do we address these concerns efficiently? 如何高效地解决这些问题?

⚠️ Lexing Ambiguities 词法分析歧义

Example 示例:

1
2
T_For: for  
T_Identifier: [A-Za-z_][A-Za-z0-9_]*

Input: "fort"

Possible tokenizations:

  • "for" + "t" → T_For + T_Identifier
  • "f" + "o" + "r" + "t" → 4 T_Identifier tokens
  • "fo" + "rt" → 2 T_Identifier tokens
  • etc.

🔄 Conflict Resolution Algorithm 冲突解决算法

  1. Left-to-right scan 从左到右扫描
  2. Maximal munch 最大匹配原则
    • Always match the longest possible prefix of the remaining text 总是匹配剩余文本中最长的可能前缀

⚙️ Implementing Maximal Munch 实现最大匹配原则

Approach 方法:

  1. Convert all regular expressions to NFAs 将所有正则表达式转换为 NFA
  2. Run all NFAs in parallel 并行运行所有 NFA
  3. Keep track of the last match 跟踪最后一个匹配
  4. When all automata get stuck, report the last match and restart 当所有自动机都卡住时,报告最后一个匹配并重新开始

Example with Tokens 示例

  • T_Do: "do"
  • T_Double: "double"
  • T_Mystery: [A-Za-z]

输入: "DOUBLE" → 所有 token 的 NFA 会并行运行来扫描输入。

img

🥇 Priority System 优先级系统

When two regular expressions match the same text:

  • Choose the one defined first (higher priority) 选择先定义的规则(更高优先级)

Example:

  1. T_Do
  2. T_Double
  3. T_Identifier

对于输入 "double",T_Double 会优先于 T_Identifier 被选择。


❗ Error Handling 错误处理

Catch-all rule 通用规则:

  • Matches any character and reports an error 匹配任何字符并报告错误
  • Ensures the scanner never gets stuck 确保扫描器永远不会卡住

✅ Summary of Conflict Resolution

冲突解决总结

  1. Construct an automaton for each regular expression 为每个正则表达式构建一个自动机
  2. Merge them into one automaton with a new start state 将它们合并为一个具有新起始状态的自动机
  3. Scan input while tracking the last known match 扫描输入同时跟踪最后一个已知匹配
  4. Break ties using rule precedence 使用规则优先级打破平局
  5. Include a catch-all error rule 包含一个通用的错误规则

🎬 From Regular Expressions to Implementation

从正则表达式到实现

讲座过渡内容:

  • 观看 SPOC 平台上的 MOOC 视频
  • 完成 SPOC 第三讲中的小测验

📺 Key Topics in Upcoming Videos 后续视频重点内容

  1. From regular expressions to finite automata (3-5) 从正则表达式到有限自动机
  2. Conversion from NFA to DFA (3-6) 从 NFA 转换为 DFA
  3. DFA for word recognition (3-7) 使用 DFA 进行单词识别