语法分析主要用于处理上下文无关(CFG:contex free grammar)文法。
1. 术语
1.1. Formal Language
Def: let $\sum$ be a set of characters (an alphabet). An language over $\sum$ is a set of strings of characters drawn from $\sum$.
意味着语言L是一个映射,将语法映射到语义。
\[L(reg) = a\;set\;of\;strings.\]例如,L(G)表示由G生成的语句集合,也即由G生成的语言。
1.2. context free grammar
Def: For a language L, its CFG defines the sets of strings of symbols that are valid sentences in L.
CFG由(T,NT,S,PR)共同构成。
1.3. sentence
Def: a string of symbols that can be derived from the rules of a grammar.
1.4. production
Def: Each rule in a CFG is called a production
1.5. Nonterminal symbol
Def: a syntactic variable used in a grammar’s productions
1.6. Terminal symbol
Def: a word that can occur in a sentence
1.7. Word
Def: A word consists of a lexeme and its syntactic category. Words are represented in a grammar by their syntactic category.
2. if-then-else的匹配
文法出现二义性通常由于两种原因导致:
- precedence:运算符优先级没有指明
- associativity:运算符关联性没有指明
例如else的匹配问题,通常我们希望else匹配到最近的前一个未匹配的if语句,可用的几种文法如下:
1 |
|
或者等价的(下方unmatched提到stmt中,matched和withelse相同):
1 |
|
通过以上文法可以观察到,parse的关键在于if-then-else结构的then部分,需要限制如果当前能够使用else匹配,则要保证then部分必须要么(I)不存在if相关(if-then或者if-then-else)结构;要么(II)then部分含有的if结构均为if-then-else类型。
但是下面的文法不行
1 |
|
问题在于matchedstmt的else部分不应当为stmt,导致matchedstmt可以匹配if-then-else-if-then
。反例如下
1 |
|