Module: EBNF::PEG::Rule
- Includes: Unescape
- Defined in: lib/ebnf/peg/rule.rb
Overview
Behavior for parsing a PEG rule
Constant Summary
Constants included from Unescape
Unescape::ECHAR, Unescape::ESCAPE_CHAR4, Unescape::ESCAPE_CHAR8, Unescape::ESCAPE_CHARS, Unescape::UCHAR
Instance Attribute Summary
- #parser ⇒ EBNF::PEG::Parser
  Initialized by parser when loading rules.
Instance Method Summary
- #eat_whitespace(input) ⇒ Object
  Eat whitespace between non-terminal rules.
- #parse(input, **options) ⇒ Hash{Symbol => Object}, :unmatched
  If there are start_production and/or production handlers, they are invoked with a prod_data stack, the input stream and offset.
- #rept(input, min, max, prod, string_regexp_opts, **options) ⇒ :unmatched, Array
  Repetition, 0-1, 0-n, 1-n, …
- #terminal_also_matches(input, prod, string_regexp_opts) ⇒ Object
  See if a terminal could have a longer match than a string.
Methods included from Unescape
unescape, unescape_codepoints, unescape_string
Instance Attribute Details
#parser ⇒ EBNF::PEG::Parser
Initialized by parser when loading rules. Used for finding rules and invoking elements of the parse process.
# File 'lib/ebnf/peg/rule.rb', line 11

def parser
  @parser
end
Instance Method Details
#eat_whitespace(input) ⇒ Object
Eat whitespace between non-terminal rules
# File 'lib/ebnf/peg/rule.rb', line 301

def eat_whitespace(input)
  if parser.whitespace.is_a?(Regexp)
    # Eat whitespace before a non-terminal
    input.skip(parser.whitespace)
  elsif parser.whitespace.is_a?(Rule)
    parser.whitespace.parse(input) # throw away result
  end
end
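The Regexp branch above depends on the input's `skip` method. As a minimal sketch, assuming the input behaves like Ruby's standard `StringScanner` (the `ws` pattern is a hypothetical stand-in for whatever `parser.whitespace` is configured to be), skipping advances the scan pointer past a match at the current position and returns nil when there is nothing to skip:

```ruby
require 'strscan'

# Hypothetical whitespace pattern; a real parser supplies its own.
ws = /\s+/

input = StringScanner.new("  \n foo bar")
skipped = input.skip(ws)  # length of the whitespace run that was skipped
rest = input.rest         # what remains to be parsed
no_ws = input.skip(ws)    # nil, because the pointer now sits on "foo"
```

Because a failed `skip` leaves the pointer untouched, calling it speculatively before every non-terminal is harmless.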
#parse(input, **options) ⇒ Hash{Symbol => Object}, :unmatched
If there are start_production and/or production handlers, they are invoked with a prod_data stack, the input stream and offset. Otherwise, the results are added as an array value to a hash indexed by the rule name.
If matched, the input position is updated and the results returned in a Hash.
- alt: returns the value of the matched production or :unmatched.
- diff: returns the value matched, or :unmatched.
- hex: returns a string composed of the matched hex character, or :unmatched.
- opt: returns the value matched, or nil if unmatched.
- plus: returns an array of the values matched for the specified production, or :unmatched, if none are matched. For Terminals, these are concatenated into a single string.
- range: returns a string composed of the values matched, or :unmatched, if fewer than min are matched.
- rept: returns an array of the values matched for the specified production, or :unmatched, if none are matched. For Terminals, these are concatenated into a single string.
- seq: returns an array composed of single-entry hashes for each matched production indexed by the production name, or :unmatched if any production fails to match. For Terminals, returns a string created by concatenating these values. Via option in a production or definition, the result can be a single hash with values for each matched production; note that this is not always possible due to the possibility of repeated productions within the sequence.
- star: returns an array of the values matched for the specified production. For Terminals, these are concatenated into a single string.
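The three result shapes described for seq can be illustrated with plain Ruby. This is a sketch only: the `seq` array is hypothetical data standing in for a sequence that matched productions :a, :b and :a again, and the merge and join expressions follow the ones used in the method body.

```ruby
# Hypothetical result of a :seq rule that matched :a, :b, then :a again.
seq = [{a: "1"}, {b: "2"}, {a: "3"}]

# Default shape: the array of single-entry hashes keeps every match.
default_shape = seq

# Hash shape: entries are merged in order, so the repeated :a
# production keeps only its last value.
as_hash = seq.inject { |memo, h| memo.merge(h) }

# Terminal shape: all matched values concatenated into one string.
as_terminal = seq.map(&:values).compact.join("")
```

The lost first value of :a in `as_hash` is the reason the single-hash form is not always usable when a production repeats within the sequence.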
# File 'lib/ebnf/peg/rule.rb', line 36

def parse(input, **options)
  # Save position and linenumber for backtracking
  pos, lineno = input.pos, input.lineno

  parser.packrat[sym] ||= {}
  if parser.packrat[sym][pos]
    parser.debug("#{sym}(:memo)", lineno: lineno) { "#{parser.packrat[sym][pos].inspect}(@#{pos})"}
    input.pos, input.lineno = parser.packrat[sym][pos][:pos], parser.packrat[sym][pos][:lineno]
    return parser.packrat[sym][pos][:result]
  end

  if terminal?
    # If the terminal is defined with a regular expression,
    # use that to match the input,
    # otherwise,
    if regexp = parser.terminal_regexp(sym)
      regexp = regexp.call() if regexp.is_a?(Proc)
      term_opts = parser.terminal_options(sym)
      if matched = input.scan(regexp)
        # Optionally map matched
        matched = term_opts.fetch(:map, {}).fetch(matched.downcase, matched)

        # Optionally unescape matched
        matched = unescape(matched) if term_opts[:unescape]
      end

      result = parser.onTerminal(sym, (matched ? matched : :unmatched))
      # Update furthest failure for strings and terminals
      parser.update_furthest_failure(input.pos, input.lineno, sym) if result == :unmatched
      parser.packrat[sym][pos] = {
        pos: input.pos,
        lineno: input.lineno,
        result: result
      }
      return parser.packrat[sym][pos][:result]
    end
  else
    eat_whitespace(input)
  end
  options = options.merge(parser.onStart(sym, **options))
  string_regexp_opts = options[:insensitive_strings] ? Regexp::IGNORECASE : 0

  result = case expr.first
  when :alt
    # Return the first expression to match. Look at strings before terminals before non-terminals, with strings ordered by longest first
    # Result is either :unmatched, or the value of the matching rule
    alt = :unmatched
    expr[1..-1].each do |prod|
      alt = case prod
      when Symbol
        rule = parser.find_rule(prod)
        raise "No rule found for #{prod}" unless rule
        rule.parse(input, **options)
      when String
        # If the input matches a terminal for which the string is a prefix, don't match the string
        if terminal_also_matches(input, prod, string_regexp_opts)
          :unmatched
        else
          s = input.scan(Regexp.new(Regexp.quote(prod), string_regexp_opts))
          case options[:insensitive_strings]
          when :lower then s && s.downcase
          when :upper then s && s.upcase
          else s
          end || :unmatched
        end
      end
      if alt == :unmatched
        # Update furthest failure for strings and terminals
        parser.update_furthest_failure(input.pos, input.lineno, prod) if prod.is_a?(String) || rule.terminal?
      else
        break
      end
    end
    alt
  when :diff
    # matches any string that matches A but does not match B.
    # (Note, this is only used for Terminal rules, non-terminals will use :not)
    raise "Diff used on non-terminal #{prod}" unless terminal?
    re1, re2 = Regexp.new(translate_codepoints(expr[1])), Regexp.new(translate_codepoints(expr[2]))
    matched = input.scan(re1)
    if !matched || re2.match?(matched)
      # Update furthest failure for terminals
      parser.update_furthest_failure(input.pos, input.lineno, sym)
      :unmatched
    else
      matched
    end
  when :hex
    # Matches the given hex character if expression matches the character whose number (code point) in ISO/IEC 10646 is N. The number of leading zeros in the #xN form is insignificant.
    input.scan(to_regexp) || begin
      # Update furthest failure for terminals
      parser.update_furthest_failure(input.pos, input.lineno, expr.last)
      :unmatched
    end
  when :not
    # matches any string that does not match B.
    res = case prod = expr[1]
    when Symbol
      rule = parser.find_rule(prod)
      raise "No rule found for #{prod}" unless rule
      rule.parse(input, **options)
    when String
      if terminal_also_matches(input, prod, string_regexp_opts)
        :unmatched
      else
        s = input.scan(Regexp.new(Regexp.quote(prod), string_regexp_opts))
        case options[:insensitive_strings]
        when :lower then s && s.downcase
        when :upper then s && s.upcase
        else s
        end || :unmatched
      end
    end
    if res != :unmatched
      # Update furthest failure for terminals
      parser.update_furthest_failure(input.pos, input.lineno, sym) if terminal?
      :unmatched
    else
      nil
    end
  when :opt
    # Result is the matched value or nil
    opt = rept(input, 0, 1, expr[1], string_regexp_opts, **options)

    # Update furthest failure for strings and terminals
    parser.update_furthest_failure(input.pos, input.lineno, expr[1]) if terminal?
    opt.first
  when :plus
    # Result is an array of all expressions while they match,
    # at least one must match
    plus = rept(input, 1, '*', expr[1], string_regexp_opts, **options)

    # Update furthest failure for strings and terminals
    parser.update_furthest_failure(input.pos, input.lineno, expr[1]) if terminal?
    plus.is_a?(Array) && terminal? ? plus.join("") : plus
  when :range, :istr
    # Matches the specified character range
    input.scan(to_regexp) || begin
      # Update furthest failure for strings and terminals
      parser.update_furthest_failure(input.pos, input.lineno, expr[1])
      :unmatched
    end
  when :rept
    # Result is an array of all expressions while they match,
    # an empty array if none match
    rept = rept(input, expr[1], expr[2], expr[3], string_regexp_opts, **options)

    # Update furthest failure for strings and terminals
    parser.update_furthest_failure(input.pos, input.lineno, expr[3]) if terminal?
    rept.is_a?(Array) && terminal? ? rept.join("") : rept
  when :seq
    # Evaluate each expression into an array of hashes where each hash contains a key from the associated production and the value is the parsed value of that production. Returns :unmatched if the input does not match the production. Value ordering is ensured by native Hash ordering.
    seq = expr[1..-1].each_with_object([]) do |prod, accumulator|
      eat_whitespace(input) unless accumulator.empty? || terminal?
      res = case prod
      when Symbol
        rule = parser.find_rule(prod)
        raise "No rule found for #{prod}" unless rule
        rule.parse(input, **options.merge(_rept_data: accumulator))
      when String
        if terminal_also_matches(input, prod, string_regexp_opts)
          :unmatched
        else
          s = input.scan(Regexp.new(Regexp.quote(prod), string_regexp_opts))
          case options[:insensitive_strings]
          when :lower then s && s.downcase
          when :upper then s && s.upcase
          else s
          end || :unmatched
        end
      end
      if res == :unmatched
        # Update furthest failure for strings and terminals
        parser.update_furthest_failure(input.pos, input.lineno, prod)
        break :unmatched
      end
      accumulator << {prod.to_sym => res}
    end
    if seq == :unmatched
      :unmatched
    elsif terminal?
      seq.map(&:values).compact.join("") # Concat values for terminal production
    elsif options[:as_hash]
      seq.inject {|memo, h| memo.merge(h)}
    else
      seq
    end
  when :star
    # Result is an array of all expressions while they match,
    # an empty array if none match
    star = rept(input, 0, '*', expr[1], string_regexp_opts, **options)

    # Update furthest failure for strings and terminals
    parser.update_furthest_failure(input.pos, input.lineno, expr[1]) if terminal?
    star.is_a?(Array) && terminal? ? star.join("") : star
  else
    raise "attempt to parse unknown rule type: #{expr.first}"
  end

  if result == :unmatched
    # Rewind input to entry point if unmatched.
    input.pos, input.lineno = pos, lineno
  end

  result = parser.onFinish(result, **options)
  (parser.packrat[sym] ||= {})[pos] = {
    pos: input.pos,
    lineno: input.lineno,
    result: result
  }
  return parser.packrat[sym][pos][:result]
end
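The method opens and closes with a packrat memo lookup keyed by rule name and start position. The following is a simplified sketch of that idea, not the library's code: `packrat`, `match_digits` and `parse_memo` are hypothetical names, line-number tracking is omitted, and a plain `StringScanner` stands in for the parser's input.

```ruby
require 'strscan'

# Hypothetical memo table shaped like {rule_name => {start_pos => entry}}.
packrat = Hash.new { |h, k| h[k] = {} }
calls = 0

# Hypothetical rule body: matches one or more digits.
match_digits = lambda do |input|
  calls += 1
  input.scan(/\d+/) || :unmatched
end

parse_memo = lambda do |sym, input|
  pos = input.pos
  if (memo = packrat[sym][pos])
    input.pos = memo[:pos]  # jump to where the earlier attempt ended
    return memo[:result]
  end
  result = match_digits.call(input)
  input.pos = pos if result == :unmatched  # rewind on failure
  packrat[sym][pos] = { pos: input.pos, result: result }
  result
end

input = StringScanner.new("123abc")
first = parse_memo.call(:digits, input)
input.pos = 0                            # simulate backtracking
second = parse_memo.call(:digits, input) # served from the memo table
```

Storing the end position alongside the result is what lets a memo hit reposition the input without re-running the rule.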
#rept(input, min, max, prod, string_regexp_opts, **options) ⇒ :unmatched, Array
Repetition, 0-1, 0-n, 1-n, …
Note, nil results are removed from the result, but count towards min/max calculations. Saves temporary production data to prod_data stack.
# File 'lib/ebnf/peg/rule.rb', line 263

def rept(input, min, max, prod, string_regexp_opts, **options)
  result = []

  case prod
  when Symbol
    rule = parser.find_rule(prod)
    raise "No rule found for #{prod}" unless rule

    while (max == '*' || result.length < max) && (res = rule.parse(input, **options.merge(_rept_data: result))) != :unmatched
      eat_whitespace(input) unless terminal?
      result << res
    end
  when String
    # FIXME: don't match a string, if input matches a terminal
    while (res = input.scan(Regexp.new(Regexp.quote(prod), string_regexp_opts))) && (max == '*' || result.length < max)
      eat_whitespace(input) unless terminal?
      result << case options[:insensitive_strings]
      when :lower then res.downcase
      when :upper then res.upcase
      else res
      end
    end
  end

  result.length < min ? :unmatched : result.compact
end
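The note about nil results follows from the last line of the method: the length check runs before `compact`. A small sketch that isolates just that line (`finish_rept` is a hypothetical helper, not part of the library):

```ruby
# Hypothetical helper mirroring only the final expression of rept.
finish_rept = lambda do |result, min|
  result.length < min ? :unmatched : result.compact
end

with_nil  = finish_rept.call(["a", nil], 2)  # nil counts toward min, then is dropped
too_few   = finish_rept.call(["a"], 2)       # only one match, min is two
empty_ok  = finish_rept.call([], 0)          # zero matches satisfy a 0-n repetition
```

A consequence is that the returned array can be shorter than min when some of the counted matches were nil.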
#terminal_also_matches(input, prod, string_regexp_opts) ⇒ Object
See if a terminal could have a longer match than a string
# File 'lib/ebnf/peg/rule.rb', line 291

def terminal_also_matches(input, prod, string_regexp_opts)
  str_regex = Regexp.new(Regexp.quote(prod), string_regexp_opts)
  input.match?(str_regex) && parser.class.terminal_regexps.any? do |sym, re|
    re = re.call() if re.is_a?(Proc)
    (match_len = input.match?(re)) && match_len > prod.length
  end
end
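The check relies on `match?` returning the length of a match at the current position without consuming input. As a rough sketch, assuming a `StringScanner`-like input (the `terminals` table and `longer_terminal` lambda are hypothetical; a real parser derives its terminal patterns from the grammar):

```ruby
require 'strscan'

# Hypothetical terminal table standing in for parser.class.terminal_regexps.
terminals = { IDENT: /[a-z]+/, NUMBER: /\d+/ }

longer_terminal = lambda do |input, str|
  # Both match? calls test at the current position and do not advance it.
  !!(input.match?(Regexp.new(Regexp.quote(str))) &&
    terminals.any? { |_sym, re| (len = input.match?(re)) && len > str.length })
end

input = StringScanner.new("format x")
shadowed = longer_terminal.call(input, "for")  # IDENT covers "format", longer than "for"

input2 = StringScanner.new("for 1")
plain = longer_terminal.call(input2, "for")    # IDENT covers only "for", not longer
```

This is why a literal string such as "for" is rejected when the input actually begins with a longer terminal like "format".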