.TEXT FROM UNICODE

This subroutine transcodes a Unicode alpha field into a RAW alpha field. Added in 6.0.0.


Usage

      PASS         <raw>                      FIELD            SHARE? Y
      PASS         <unicode_source>           FIELD            SHARE? N
      PASS         <encoding>                 FIELD            SHARE? N
      PASS         <action>                   FIELD            SHARE? N
      PASS         <option>                   FIELD            SHARE? N
      PASS         <error_text>               FIELD            SHARE? Y
      GOSUB    --- .TEXT FROM UNICODE
      *        Check for errors
      IF       --- .TEXT FROM UNICODE         NE

Description

This subroutine transcodes a Unicode alpha field to a RAW alpha field, with error handling for characters that cannot be transcoded. The first two parameters are required. If either of them is missing, the subroutine will CANCEL.

<raw> is the RAW alpha field to contain the transcoded string (Required). This must be PASSed with Share "Y" to return the value. This must be a RAW field type, otherwise the transcoding will fail.

<unicode_source> is the field containing the Unicode string to be transcoded (Required).

<encoding> is the encoding to use in the RAW target field (Optional). If it is not PASSed or a blank is PASSed, the subroutine checks the environment variable APPX_RAW_ENCODING and uses that encoding. If that variable is not present, the encoding defaults to ISO-8859-15.
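
The fallback order described above can be sketched outside APPX. The following Python is only an illustration, not APPX code: the function name resolve_raw_encoding and its environ parameter are hypothetical, and treating a blank APPX_RAW_ENCODING value the same as a missing one is an assumption.

```python
import os

DEFAULT_RAW_ENCODING = "ISO-8859-15"  # documented default


def resolve_raw_encoding(passed, environ=None):
    """Pick the RAW encoding: PASSed value, then APPX_RAW_ENCODING, then default.

    `passed` stands in for the <encoding> parameter; None or blank means
    "not specified". `environ` is a hypothetical hook so the lookup can be
    tested without touching the real environment.
    """
    env = os.environ if environ is None else environ
    if passed and passed.strip():
        return passed.strip()
    # Assumption: a blank environment value is treated like a missing one.
    from_env = env.get("APPX_RAW_ENCODING", "").strip()
    if from_env:
        return from_env
    return DEFAULT_RAW_ENCODING
```

For example, resolve_raw_encoding(" ", {"APPX_RAW_ENCODING": "ISO-8859-1"}) falls through the blank parameter and returns the value from the environment.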

<action> is the action to be taken if a character cannot be transcoded to RAW (Optional). If not PASSed or blank is PASSed, defaults to SKIP.

  • SKIP - Skip the offending character. It is not transcoded to the target field, but the remaining characters are.
  • STOP - Stop transcoding and return an error.
  • SUBS - Substitute the character in the <option> field for the offending character.
  • ESCAPE - Escape the character according to the convention specified in the <option> field.

<option> contains either the substitute character to use if <action> is SUBS or the convention to use if <action> is ESCAPE. This is required if <action> is SUBS or ESCAPE. If <action> is SUBS, you may PASS any single character that exists in the target encoding. If more than one character is PASSed, only the first character is used. If <action> is ESCAPE, then PASS one of the following:

  • C - specifies C style escaping (\uXXXX or \UXXXXXXXX)
  • STYLE - specifies CSS2 escaping (\XXXXXX)
  • JAVA - specifies Java escaping (\uXXXX)
  • UNICODE - specifies Unicode escaping {U+XXXXX}
  • DECIMAL - specifies XML decimal escaping (&#XXXX;)
  • X - specifies XML hex escaping (&#xXXXX;)
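
To make the six conventions concrete, here is a rough Python sketch that formats a single character in each style. It is not the APPX implementation. The function name escape_code_point is hypothetical, and several details are assumptions: the trailing space after a CSS2 escape (taken from the STYLE example on this page), the closing semicolon on the XML forms, and the surrogate-pair output for JAVA above U+FFFF.

```python
def escape_code_point(ch, convention):
    """Format one character using one of the documented <option> conventions."""
    cp = ord(ch)
    if convention == "C":
        # \uXXXX inside the BMP, \UXXXXXXXX above it
        return "\\u%04X" % cp if cp <= 0xFFFF else "\\U%08X" % cp
    if convention == "STYLE":
        # CSS2: hex digits followed by a terminating space (assumed)
        return "\\%X " % cp
    if convention == "JAVA":
        if cp <= 0xFFFF:
            return "\\u%04X" % cp
        # Assumption: characters above U+FFFF become a surrogate pair
        v = cp - 0x10000
        return "\\u%04X\\u%04X" % (0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF))
    if convention == "UNICODE":
        return "{U+%04X}" % cp
    if convention == "DECIMAL":
        return "&#%d;" % cp
    if convention == "X":
        return "&#x%X;" % cp
    raise ValueError("unknown escape convention: %r" % (convention,))
```

Calling escape_code_point("\u024F", "UNICODE") gives {U+024F}, and the DECIMAL form of the same character is &#591; because hex 24F is decimal 591.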

<error_text> contains additional error text from the Unicode library if an error occurred (Optional). This must be PASSed with Share "Y" to return the value.

Examples

Given a Unicode string containing 'aɏb', here are the results with various options (the ɏ character does not exist in the ISO-8859-15 encoding).

Action    Result
SKIP      'ab'
STOP      'a'. The field .TEXT FROM UNICODE contains ".UC_FROM_UCODE Fail" and <error_text> contains "U_INVALID_CHAR_FOUND"
SUBS      'a*b', assuming '*' was PASSed in the <option> field as the substitute character
ESCAPE    'a\u024Fb' if C is PASSed as <option>
ESCAPE    'a\24F b' if STYLE is PASSed as <option>
ESCAPE    'a\u024Fb' if JAVA is PASSed as <option>
ESCAPE    'a{U+024F}b' if UNICODE is PASSed as <option>
ESCAPE    'a&#591;b' if DECIMAL is PASSed as <option>
ESCAPE    'a&#x24F;b' if X is PASSed as <option>
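
The SKIP, STOP and SUBS results can be reproduced with a small Python emulation, which may help when checking expected output outside APPX. This is a sketch only: transcode_to_raw is a hypothetical name, Python's iso8859_15 codec stands in for the APPX transcoder, and the error string simply mirrors the STOP example rather than coming from a real library call. ESCAPE is not covered here.

```python
def transcode_to_raw(text, encoding="iso8859_15", action="SKIP", option=""):
    """Emulate SKIP / STOP / SUBS. Returns (raw_bytes, error_text)."""
    if action not in ("SKIP", "STOP", "SUBS"):
        raise ValueError("this sketch only handles SKIP, STOP and SUBS")
    out = []
    for ch in text:
        try:
            out.append(ch.encode(encoding))
        except UnicodeEncodeError:
            if action == "SKIP":
                continue  # drop the character, keep going
            if action == "STOP":
                # keep what was transcoded so far and report the error
                return b"".join(out), "U_INVALID_CHAR_FOUND"
            # SUBS: only the first character of <option> is used
            out.append(option[:1].encode(encoding))
    return b"".join(out), ""


print(transcode_to_raw("a\u024Fb"))                               # SKIP
print(transcode_to_raw("a\u024Fb", action="STOP"))
print(transcode_to_raw("a\u024Fb", action="SUBS", option="*"))
```

The three calls correspond to the first three rows of the table: the input is the same 'aɏb' string, written with a \u escape so the snippet stays ASCII-only.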

Comments


-- Jean Neron - 2017-11-02

Topic revision: r4 - 2018-09-05 - JeanNeron