.TEXT FROM UNICODE
This subroutine transcodes a Unicode alpha field into a RAW alpha field. Added in 6.0.0.
Usage
PASS <raw> FIELD SHARE? Y
PASS <unicode_source> FIELD SHARE? N
PASS <encoding> FIELD SHARE? N
PASS <action> FIELD SHARE? N
PASS <option> FIELD SHARE? N
PASS <error_text> FIELD SHARE? Y
GOSUB --- .TEXT FROM UNICODE
* Check for errors
IF --- .TEXT FROM UNICODE NE
Description
This subroutine transcodes a Unicode alpha field to a RAW alpha field, with error handling for characters that cannot be transcoded. The first two parameters are required. If either required parameter is missing, the subroutine will CANCEL.
<raw> is the RAW alpha field to contain the transcoded string (Required). This must be PASSed with Share "Y" to return the value.
This must be a RAW field type, otherwise the transcoding will fail.
<unicode_source> is the field containing the Unicode string to be transcoded (Required).
<encoding> is the encoding to use in the RAW target field (Optional). If it is not PASSed or a blank is PASSed, the subroutine checks the environment variable APPX_RAW_ENCODING and uses that encoding. If that variable is not present, the encoding defaults to ISO-8859-15.
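The fallback order for <encoding> can be sketched outside APPX as follows. This is a minimal Python illustration, not APPX code: the function name resolve_raw_encoding is hypothetical, and treating an empty APPX_RAW_ENCODING value the same as a missing one is an assumption.

```python
import os

DEFAULT_RAW_ENCODING = "ISO-8859-15"


def resolve_raw_encoding(passed_encoding, environ=None):
    """Return the encoding name chosen by the documented fallback order.

    Hypothetical helper for illustration; it is not part of APPX.
    """
    if environ is None:
        environ = os.environ
    # 1. A non-blank <encoding> parameter wins.
    if passed_encoding and passed_encoding.strip():
        return passed_encoding.strip()
    # 2. Otherwise use APPX_RAW_ENCODING when it is set.
    #    Assumption: an empty value counts as "not present".
    env_value = environ.get("APPX_RAW_ENCODING", "")
    if env_value.strip():
        return env_value.strip()
    # 3. Otherwise fall back to the documented default.
    return DEFAULT_RAW_ENCODING
```

Passing the environment as a plain dictionary keeps the lookup easy to exercise without changing the real process environment.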
<action> is the action to be taken if a character cannot be transcoded to RAW (Optional). If not PASSed or blank is PASSed, defaults to SKIP.
- SKIP - Skip the offending character. It is not transcoded to the target field, but the remaining characters are.
- STOP - Stop transcoding and return an error.
- SUBS - Substitute the character in the <option> field for the offending character.
- ESCAPE - Escape the character according to the convention specified in the <option> field.
<option> contains either the substitute character to use if <action> is SUBS or the convention to use if <action> is ESCAPE. This is required if <action> is SUBS or ESCAPE. If <action> is SUBS you may pass any single character that exists in the target encoding. If more than one character is PASSed, only the first character is used. If <action> is ESCAPE, then PASS one of the following:
- C - specifies C style escaping (\uXXXX or \UXXXXXXXX)
- STYLE - specifies CSS2 escaping (\XXXXXX)
- JAVA - specifies Java escaping (\uXXXX)
- UNICODE - specifies Unicode escaping ({U+XXXX})
- DECIMAL - specifies XML decimal escaping (&#DDDD;)
- X - specifies XML hex escaping (&#xXXXX;)
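To make the escape conventions concrete, the following Python sketch formats a single code point under each <option> value. It is an illustration only, not the subroutine's implementation: the helper name escape_code_point is hypothetical, the STYLE form (unpadded hex followed by a space) is inferred from the 'a\24F b' example on this page, the XML forms follow the standard character-reference syntax (&#NNN; and &#xHHH;), and the handling of code points above U+FFFF for JAVA assumes UTF-16 surrogate pairs.

```python
def escape_code_point(cp, convention):
    """Format one Unicode code point using the named escape convention.

    Hypothetical helper; the exact output of the APPX subroutine may differ.
    """
    if convention == "C":
        # \uXXXX inside the BMP, \UXXXXXXXX above it
        return f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\U{cp:08X}"
    if convention == "STYLE":
        # Assumption: unpadded hex plus a terminating space
        return f"\\{cp:X} "
    if convention == "JAVA":
        if cp <= 0xFFFF:
            return f"\\u{cp:04X}"
        # Assumption: supplementary characters become a surrogate pair
        offset = cp - 0x10000
        high = 0xD800 + (offset >> 10)
        low = 0xDC00 + (offset & 0x3FF)
        return f"\\u{high:04X}\\u{low:04X}"
    if convention == "UNICODE":
        return f"{{U+{cp:04X}}}"
    if convention == "DECIMAL":
        return f"&#{cp};"
    if convention == "X":
        return f"&#x{cp:X};"
    raise ValueError(f"unknown escape convention: {convention!r}")


if __name__ == "__main__":
    for name in ("C", "STYLE", "JAVA", "UNICODE", "DECIMAL", "X"):
        print(name, "a" + escape_code_point(0x024F, name) + "b")
```

Running the script prints one escaped form of 'aɏb' per convention, which can be compared against the results the subroutine actually returns.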
<error_text> contains additional error text from the Unicode library if an error occurred (Optional). This must be PASSed with Share "Y" to return the value.
Note: This subroutine is designed to operate on Alpha, Text, and Token fields only. The returned results are undefined if you specify any other type of field.
Examples
Given a Unicode string containing 'aɏb', here are the results with various options (the ɏ character does not exist in the ISO-8859-15 encoding).
Action | Result
SKIP | 'ab'
STOP | 'a'. The field .TEXT FROM UNICODE contains ".UC_FROM_UCODE Fail" and <error_text> contains "U_INVALID_CHAR_FOUND"
SUBS | 'a*b', assuming '*' was PASSed in <option> as the substitute character
ESCAPE | 'a\u024Fb' if C is PASSed as <option>
ESCAPE | 'a\24F b' if STYLE is PASSed as <option>
ESCAPE | 'a\u024Fb' if JAVA is PASSed as <option>
ESCAPE | 'a{U+024F}b' if UNICODE is PASSed as <option>
ESCAPE | 'a&#591;b' if DECIMAL is PASSed as <option>
ESCAPE | 'a&#x24F;b' if X is PASSed as <option>
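The SKIP, STOP, and SUBS results above can be reproduced outside APPX with a short Python sketch. It models only the documented behavior and is not how the subroutine is implemented: the function name transcode and its return shape are hypothetical, the error text is Python-specific rather than the "U_INVALID_CHAR_FOUND" text returned by the Unicode library, and encoding one character at a time is only suitable for simple encodings such as ISO-8859-15. ESCAPE is left out.

```python
def transcode(text, encoding="iso8859-15", action="SKIP", option=""):
    """Return (raw_bytes, error_text) for the SKIP, STOP, and SUBS actions.

    Hypothetical model of the documented behavior, for illustration only.
    """
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(encoding)
        except UnicodeEncodeError:
            if action == "SKIP":
                # Drop the offending character and keep going.
                continue
            if action == "STOP":
                # Return what was transcoded so far, plus an error text.
                return bytes(out), f"cannot encode U+{ord(ch):04X}"
            if action == "SUBS":
                # Only the first character of <option> is used.
                out += option[:1].encode(encoding)
                continue
            raise ValueError(f"unsupported action: {action!r}")
    return bytes(out), ""


if __name__ == "__main__":
    source = "a\u024fb"  # 'aɏb'; U+024F has no ISO-8859-15 equivalent
    print(transcode(source, action="SKIP"))
    print(transcode(source, action="STOP"))
    print(transcode(source, action="SUBS", option="*"))
```

Returning the partial result together with the error text mirrors the STOP row, where the target field holds 'a' and the error is reported separately.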
Comments
--
Jean Neron - 2017-11-02