Module: EBNF::Unescape

Included in:
LL1::Lexer, PEG::Rule
Defined in:
lib/ebnf/unescape.rb

Overview

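Unescape strings: replaces string escape sequences (e.g. \n and \t) and \uXXXX / \UXXXXXXXX codepoint escapes with the characters they denote.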

Constant Summary

ESCAPE_CHARS =
{
  '\\t'   => "\t",  # \u0009 (tab)
  '\\n'   => "\n",  # \u000A (line feed)
  '\\r'   => "\r",  # \u000D (carriage return)
  '\\b'   => "\b",  # \u0008 (backspace)
  '\\f'   => "\f",  # \u000C (form feed)
  '\\"'  => '"',    # \u0022 (quotation mark, double quote mark)
  "\\'"  => '\'',   # \u0027 (apostrophe-quote, single quote mark)
  '\\\\' => '\\'    # \u005C (backslash)
}.freeze
ESCAPE_CHAR4 =

\uXXXX

/\\u(?:[0-9A-Fa-f]{4,4})/u.freeze
ESCAPE_CHAR8 =

\UXXXXXXXX

/\\U(?:[0-9A-Fa-f]{8,8})/u.freeze
ECHAR =

More liberal unescaping: matches a backslash followed by any single character

/\\./u.freeze
UCHAR =
/#{ESCAPE_CHAR4}|#{ESCAPE_CHAR8}/n.freeze
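To make the patterns above concrete, the following sketch shows what each one picks out of a sample string. It uses local copies of the patterns inside a hypothetical `PatternDemo` module so that it runs without the ebnf gem; the `/u` and `/n` encoding flags are left out for simplicity, which is an assumption of this sketch rather than how the library defines them.

```ruby
# Hypothetical demo module; names here are not part of EBNF::Unescape.
module PatternDemo
  CHAR4 = /\\u[0-9A-Fa-f]{4}/      # same shape as ESCAPE_CHAR4
  CHAR8 = /\\U[0-9A-Fa-f]{8}/      # same shape as ESCAPE_CHAR8
  UCHAR = /#{CHAR4}|#{CHAR8}/      # either codepoint escape
  ECHAR = /\\./                    # backslash plus any one character

  # Single quotes keep the backslashes literal, so this string
  # contains the escape sequences themselves, not the characters.
  SAMPLE = 'caf\u00E9 \U0001F600 tab\t'
end

# UCHAR finds only the two complete codepoint escapes.
p PatternDemo::SAMPLE.scan(PatternDemo::UCHAR)

# ECHAR is more liberal: it finds every backslash and the character after it.
p PatternDemo::SAMPLE.scan(PatternDemo::ECHAR)
```

Note that ECHAR also matches the first two characters of a codepoint escape, which is one reason the codepoint pass is useful as a separate step.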

Class Method Summary

Class Method Details

.unescape(string) ⇒ String

Performs codepoint unescaping and then string unescaping on the given terminal string.

Parameters:

  • string (String)

Returns:

  • (String)


# File 'lib/ebnf/unescape.rb', line 58

def unescape(string)
  unescape_string(unescape_codepoints(string))
end
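The composition above applies the codepoint pass first and the string-escape pass second. The sketch below reproduces that ordering in a standalone, hypothetical `UnescapeSketch` module (a reduced escape table, no encoding handling) so the effect of the order can be observed without loading the gem. It is an illustration under those assumptions, not the library's implementation.

```ruby
# Hypothetical stand-in for the two-pass unescape; not the ebnf gem itself.
module UnescapeSketch
  # Reduced escape table, enough for the demo.
  ESCAPES = { '\\t' => "\t", '\\n' => "\n", '\\"' => '"', '\\\\' => '\\' }.freeze
  UCHAR   = /\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}/

  # Pass 1: replace \uXXXX and \UXXXXXXXX with the character they name.
  def self.unescape_codepoints(string)
    string.gsub(UCHAR) { |m| [m[2..-1].hex].pack('U') }
  end

  # Pass 2: replace known backslash escapes, leave unknown ones untouched.
  def self.unescape_string(string)
    string.gsub(/\\./) { |m| ESCAPES.fetch(m, m) }
  end

  # Same ordering as the documented method: codepoints first, then strings.
  def self.unescape(string)
    unescape_string(unescape_codepoints(string))
  end
end

# A codepoint escape and a string escape in one input.
p UnescapeSketch.unescape('A\u0042\tC')

# Because the codepoint pass runs first, \u005C becomes a backslash that
# the second pass then combines with the following "n".
p UnescapeSketch.unescape('\u005Cn')
```

The second call is worth noticing: with this ordering, a backslash produced by the codepoint pass is still visible to the string pass, so `\u005Cn` ends up as a line feed in this sketch.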

.unescape_codepoints(string) ⇒ String

Returns a copy of the given input string with all \uXXXX and \UXXXXXXXX Unicode codepoint escape sequences replaced with their unescaped UTF-8 character counterparts.

Parameters:

  • string (String)

Returns:

  • (String)


# File 'lib/ebnf/unescape.rb', line 27

def unescape_codepoints(string)
  string = string.dup
  string.force_encoding(Encoding::ASCII_8BIT) if string.respond_to?(:force_encoding)

  # Decode \uXXXX and \UXXXXXXXX code points:
  string = string.gsub(UCHAR) do |c|
    s = [(c[2..-1]).hex].pack('U*')
    s.respond_to?(:force_encoding) ? s.force_encoding(Encoding::ASCII_8BIT) : s
  end

  string.force_encoding(Encoding::UTF_8) if string.respond_to?(:force_encoding) 
  string
end
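The core of the method above is the expression `[(c[2..-1]).hex].pack('U*')`: it drops the leading `\u` or `\U`, parses the remaining hex digits as an integer, and packs that integer as a UTF-8 character. The sketch below isolates that step and the binary round trip in a hypothetical `CodepointDemo` module. The helper names are assumptions for illustration, and the remark that the ASCII-8BIT detour exists for compatibility with mixed encodings is a guess, not something the source states.

```ruby
# Hypothetical helpers illustrating the decoding step; not part of the gem.
module CodepointDemo
  # Decode a single \uXXXX or \UXXXXXXXX escape into a UTF-8 string.
  def self.decode(escape)
    codepoint = escape[2..-1].hex   # strip "\u" / "\U", parse hex digits
    [codepoint].pack('U*')          # pack the codepoint as UTF-8
  end

  # Mirror the documented flow: work on raw bytes, then relabel as UTF-8.
  def self.unescape_binary(input)
    bytes = input.dup.force_encoding(Encoding::ASCII_8BIT)
    bytes = bytes.gsub(/\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}/) do |m|
      decode(m).force_encoding(Encoding::ASCII_8BIT)
    end
    bytes.force_encoding(Encoding::UTF_8)
  end
end

# U+00E9 encodes to two bytes, U+1F600 to four.
p CodepointDemo.decode('\u00E9').bytes
p CodepointDemo.decode('\U0001F600').bytes

# The round trip yields a valid UTF-8 string.
result = CodepointDemo.unescape_binary('caf\u00E9')
p result.encoding
p result.valid_encoding?
```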

.unescape_string(input) ⇒ String

Returns a copy of the given input string with all string escape sequences (e.g. \n and \t) replaced with their unescaped UTF-8 character counterparts.

Parameters:

  • input (String)

Returns:

  • (String)


# File 'lib/ebnf/unescape.rb', line 50

def unescape_string(input)
  input.gsub(ECHAR) {|escaped| ESCAPE_CHARS[escaped] || escaped}
end
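The one-line body above relies on a table lookup with a fallback: any backslash pair found in the table is replaced, and any other pair is returned unchanged. The sketch below shows that behavior with a reduced table in a hypothetical `StringEscapeDemo` module; the module and its table are assumptions for illustration and cover only a subset of the documented ESCAPE_CHARS.

```ruby
# Hypothetical demo of lookup-with-fallback; not the gem's own module.
module StringEscapeDemo
  # Subset of the documented escape table.
  TABLE = { '\\t' => "\t", '\\n' => "\n", '\\\\' => '\\' }.freeze

  def self.unescape_string(input)
    # Match a backslash plus one character; replace it if known, else keep it.
    input.gsub(/\\./) { |escaped| TABLE[escaped] || escaped }
  end
end

# Known escape: backslash + t becomes a real tab.
p StringEscapeDemo.unescape_string('a\tb')

# Unknown escape: backslash + q is left exactly as written.
p StringEscapeDemo.unescape_string('a\qb')

# An escaped backslash is consumed as a pair, so the "n" after it
# is not treated as part of a line-feed escape.
p StringEscapeDemo.unescape_string('a\\\\nb')
```

Because `gsub` scans left to right and consumes two characters per match, an escaped backslash cannot accidentally pair with the character that follows it.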