Module: EBNF::Unescape

Included in:: LL1::Lexer, PEG::Rule

Defined in:: lib/ebnf/unescape.rb

Overview

Unescape strings: decodes \uXXXX and \UXXXXXXXX codepoint escapes as well as string escapes such as \n and \t.

Constant Summary collapse

ESCAPE_CHARS =

{
  '\\t'   => "\t",  # \u0009 (tab)
  '\\n'   => "\n",  # \u000A (line feed)
  '\\r'   => "\r",  # \u000D (carriage return)
  '\\b'   => "\b",  # \u0008 (backspace)
  '\\f'   => "\f",  # \u000C (form feed)
  '\\"'  => '"',    # \u0022 (quotation mark, double quote mark)
  "\\'"  => '\'',   # \u0027 (apostrophe-quote, single quote mark)
  '\\\\' => '\\'    # \u005C (backslash)
}.freeze

ESCAPE_CHAR4 = \uXXXX

/\\u(?:[0-9A-Fa-f]{4,4})/u.freeze

ESCAPE_CHAR8 = \UXXXXXXXX

/\\U(?:[0-9A-Fa-f]{8,8})/u.freeze

ECHAR = More liberal unescaping

/\\./u.freeze

UCHAR =

/#{ESCAPE_CHAR4}|#{ESCAPE_CHAR8}/n.freeze
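
To illustrate what the two codepoint patterns accept, here is a minimal sketch. It copies ESCAPE_CHAR4 and ESCAPE_CHAR8 into a throwaway module; the module name `PatternSketch` and the helper `codepoint_escapes` are hypothetical and not part of EBNF::Unescape. It combines the patterns with `Regexp.union` rather than reproducing UCHAR exactly.

```ruby
module PatternSketch
  # Local copies of the two codepoint patterns from the constant summary.
  ESCAPE_CHAR4 = /\\u(?:[0-9A-Fa-f]{4,4})/u.freeze
  ESCAPE_CHAR8 = /\\U(?:[0-9A-Fa-f]{8,8})/u.freeze

  # Hypothetical helper (not part of EBNF::Unescape): returns every
  # codepoint escape found in +text+, in order of appearance.
  def self.codepoint_escapes(text)
    text.scan(Regexp.union(ESCAPE_CHAR4, ESCAPE_CHAR8))
  end
end

# Single-quoted literals keep the backslashes as ordinary characters.
sample = 'caf\u00E9 \U0001F600 \n \u12'
p PatternSketch.codepoint_escapes(sample)
# prints ["\\u00E9", "\\U0001F600"]
```

The string escape `\n` and the too-short `\u12` are not reported, since the patterns require exactly four or eight hexadecimal digits.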

Class Method Summary collapse

.unescape(string) ⇒ String

Perform string and codepoint unescaping if defined for this terminal.

.unescape_codepoints(string) ⇒ String

Returns a copy of the given input string with all \uXXXX and \UXXXXXXXX Unicode codepoint escape sequences replaced with their unescaped UTF-8 character counterparts.

.unescape_string(input) ⇒ String

Returns a copy of the given input string with all string escape sequences (e.g. \n and \t) replaced with their unescaped UTF-8 character counterparts.

Class Method Details

.unescape(string) ⇒ `String`

Perform string and codepoint unescaping if defined for this terminal. Codepoint escapes are decoded first, then string escapes.

Parameters:

string (String)

Returns:

(String)



# File 'lib/ebnf/unescape.rb', line 58

def unescape(string)
  unescape_string(unescape_codepoints(string))
end
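
Because the codepoint pass runs before the string pass, a decoded backslash can take part in a string escape afterwards. The following is a rough sketch of that ordering only. `UnescapeOrderSketch`, its reduced `NAMED` table, and both helper methods are simplified stand-ins written for this example, not the library's code.

```ruby
module UnescapeOrderSketch
  # Reduced escape table, just enough for the demonstration.
  NAMED = { '\\n' => "\n", '\\t' => "\t", '\\\\' => '\\' }.freeze

  # Pass 1: decode \uXXXX and \UXXXXXXXX into UTF-8 characters.
  def self.codepoints(text)
    text.gsub(/\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}/) { |m| [m[2..-1].hex].pack('U') }
  end

  # Pass 2: replace named escapes; unknown ones are left untouched
  # here (an assumption made for this sketch).
  def self.named(text)
    text.gsub(/\\./) { |esc| NAMED.fetch(esc, esc) }
  end

  # Same composition order as the method shown above.
  def self.unescape(text)
    named(codepoints(text))
  end
end

# '\u005Cn' decodes to a backslash followed by "n", which the second
# pass then reads as the \n escape.
p UnescapeOrderSketch.unescape('\u005Cn')
# prints "\n"
```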

.unescape_codepoints(string) ⇒ `String`

Returns a copy of the given input string with all \uXXXX and \UXXXXXXXX Unicode codepoint escape sequences replaced with their unescaped UTF-8 character counterparts.

Parameters:

string (String)

Returns:

(String)

See Also:

https://www.w3.org/TR/rdf-sparql-query/#codepointEscape

# File 'lib/ebnf/unescape.rb', line 27

def unescape_codepoints(string)
  string = string.dup
  string.force_encoding(Encoding::ASCII_8BIT) if string.respond_to?(:force_encoding)

  # Decode \uXXXX and \UXXXXXXXX code points:
  string = string.gsub(UCHAR) do |c|
    s = [(c[2..-1]).hex].pack('U*')
    s.respond_to?(:force_encoding) ? s.force_encoding(Encoding::ASCII_8BIT) : s
  end

  string.force_encoding(Encoding::UTF_8) if string.respond_to?(:force_encoding) 
  string
end
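
The decoding step inside the `gsub` block can be followed by hand for a single escape. This sketch isolates that step; `DecodeSketch.decode` is a hypothetical helper introduced for illustration and assumes its argument is an already-matched escape.

```ruby
module DecodeSketch
  # Hypothetical helper: decodes one already-matched escape such as
  # '\u00E9' or '\U0001F600' using the same slice/hex/pack steps.
  def self.decode(escape)
    hex_digits = escape[2..-1]   # drop the leading backslash and u/U
    codepoint  = hex_digits.hex  # "00E9" => 233
    [codepoint].pack('U*')       # 233 => "é", encoded as UTF-8
  end
end

p DecodeSketch.decode('\u00E9').bytes
# prints [195, 169]
p DecodeSketch.decode('\U0001F600').bytes
# prints [240, 159, 152, 128]
```

The byte values are the UTF-8 encodings of U+00E9 and U+1F600, so one escape sequence can expand to between one and four bytes in the result.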

.unescape_string(input) ⇒ `String`

Returns a copy of the given input string with all string escape sequences (e.g. \n and \t) replaced with their unescaped UTF-8 character counterparts.

Parameters:

input (String)

Returns:

(String)
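
This page does not show the source of this method. One plausible way to build it from ESCAPE_CHARS and ECHAR is sketched below, using local copies of both constants. Everything in `StringUnescapeSketch` is an assumption for illustration. In particular, the fallback that drops the backslash from unlisted escapes is a guess based on the "more liberal unescaping" note on ECHAR and is not confirmed here.

```ruby
module StringUnescapeSketch
  # Local copy of the escape table from the constant summary.
  ESCAPE_CHARS = {
    '\\t'  => "\t",
    '\\n'  => "\n",
    '\\r'  => "\r",
    '\\b'  => "\b",
    '\\f'  => "\f",
    '\\"'  => '"',
    "\\'"  => '\'',
    '\\\\' => '\\'
  }.freeze

  # Backslash followed by any single character.
  ECHAR = /\\./u.freeze

  # Looks each escape up in the table. For escapes that are not listed,
  # the backslash is dropped (assumed behaviour, see the note above).
  def self.unescape_string(input)
    input.gsub(ECHAR) { |escaped| ESCAPE_CHARS.fetch(escaped) { escaped[1..-1] } }
  end
end

p StringUnescapeSketch.unescape_string('say \"hi\"\n')
# prints "say \"hi\"\n"
```

`gsub` returns a new string, so the input is left unmodified, consistent with the "returns a copy" wording above.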
