编译技术2-7
编译技术2-7
📌 Conflict Resolutions 冲突解决
From Regular Expressions to Implementation 从正则表达式到实现
1. From Regular Expressions to NFAs
从正则表达式到 NFA
- Inductive method 归纳方法
2. From NFA to DFA
从 NFA 到 DFA
- Subset construction algorithm 子集构造算法
3. Minimizing DFA
最小化 DFA
- State-Minimization Algorithm 状态最小化算法
🚧 Challenges in Scanning 扫描中的挑战
- How do we determine which lexemes are associated with each token? 如何确定哪些词素与每个标记相关联?
- When there are multiple ways we could scan the input, how do
we know which one to pick?
当有多种扫描输入的方式时,我们如何知道选择哪一种?
- Maximal munch 最大匹配原则
- Priority system 优先级系统
- How do we address these concerns efficiently? 如何高效地解决这些问题?
⚠️ Lexing Ambiguities 词法分析歧义
Example 示例:
1 | T_For: for |
Input: "fort"
Possible tokenizations:
"for"
+"t"
→ T_For + T_Identifier"f"
+"o"
+"r"
+"t"
→ 4 T_Identifier tokens"fo"
+"rt"
→ 2 T_Identifier tokens- etc.
🔄 Conflict Resolution Algorithm 冲突解决算法
- Left-to-right scan 从左到右扫描
- Maximal munch 最大匹配原则
- Always match the longest possible prefix of the remaining text 总是匹配剩余文本中最长的可能前缀
⚙️ Implementing Maximal Munch 实现最大匹配原则
Approach 方法:
- Convert all regular expressions to NFAs 将所有正则表达式转换为 NFA
- Run all NFAs in parallel 并行运行所有 NFA
- Keep track of the last match 跟踪最后一个匹配
- When all automata get stuck, report the last match and restart 当所有自动机都卡住时,报告最后一个匹配并重新开始
Example with Tokens 示例
- T_Do:
"do"
- T_Double:
"double"
- T_Mystery:
[A-Za-z]
输入: "DOUBLE"
→ 所有 token 的 NFA
会并行运行来扫描输入。
🥇 Priority System 优先级系统
When two regular expressions match the same text:
- Choose the one defined first (higher priority) 选择先定义的规则(更高优先级)
Example:
- T_Do
- T_Double
- T_Identifier
对于输入 "double"
,T_Double 会优先于 T_Identifier
被选择。
❗ Error Handling 错误处理
Catch-all rule 通用规则:
- Matches any character and reports an error 匹配任何字符并报告错误
- Ensures the scanner never gets stuck 确保扫描器永远不会卡住
✅ Summary of Conflict Resolution
冲突解决总结
- Construct an automaton for each regular expression 为每个正则表达式构建一个自动机
- Merge them into one automaton with a new start state 将它们合并为一个具有新起始状态的自动机
- Scan input while tracking the last known match 扫描输入同时跟踪最后一个已知匹配
- Break ties using rule precedence 使用规则优先级打破平局
- Include a catch-all error rule 包含一个通用的错误规则
🎬 From Regular Expressions to Implementation
从正则表达式到实现
讲座过渡内容:
- 观看 SPOC 平台上的 MOOC 视频
- 完成 SPOC 第三讲中的小测验
📺 Key Topics in Upcoming Videos 后续视频重点内容
- From regular expressions to finite automata (3-5) 从正则表达式到有限自动机
- Conversion from NFA to DFA (3-6) 从 NFA 转换为 DFA
- DFA for word recognition (3-7) 使用 DFA 进行单词识别
本博客所有文章除特别声明外,均采用 CC BY-NC-SA 4.0 许可协议。转载请注明来自 Totoroの旅!
评论