Difference: UnicodeSETStmt (3 vs. 4)

Revision 4 2011-03-15 - PeteBrower

Line: 1 to 1
 
META TOPICPARENT name="UnicodeTestPlan"

SET Statement

Line: 14 to 13
 
  • To RAW Alpha or Group
    • The current behavior has not been changed.
  • To UNICODE or NATIONAL Alpha
Changed:
<
<
    • Each byte of data in the RAW alpha source field is transcoded from an 8859-15 character to the equivalent UTF-32 character in the range U+0000 thru U+00FF and is then set into the corresponding character position in the UTF-32 encoded destination field.
>
>
    • Each byte of data in the RAW alpha source field is transcoded from an 8859-15 character (for RAW) or from the defined NATIONAL encoding to the equivalent UTF-32 character in the range U+0000 through U+00FF and is then set into the corresponding character position in the UTF-32 encoded destination field.
    • The destination is truncated if the number of source bytes is greater than the number of destination characters.
    • UTF-32 padding takes place if the number of source bytes is less than the number of destination characters.
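The transcode, truncate, and pad rules above can be sketched in Python. This is only an illustration under stated assumptions, not the product's implementation: the function name `set_raw_to_unicode` is hypothetical, Python's `iso8859_15` codec stands in for the real transcoding table, the pad character is assumed to be U+0020, and the destination is assumed to be big-endian UTF-32 without a byte order mark.

```python
PAD = " "  # assumption: UTF-32 padding uses U+0020 (space)


def set_raw_to_unicode(src: bytes, dest_chars: int,
                       encoding: str = "iso8859_15") -> bytes:
    """Hypothetical model of SET from a RAW alpha field to a UNICODE field.

    src        -- bytes held in the RAW alpha source field
    dest_chars -- size of the destination field, in characters
    encoding   -- 8859-15 for RAW, or a stand-in for the NATIONAL encoding
    """
    text = src.decode(encoding)         # one source byte -> one character
    text = text[:dest_chars]            # truncate when the source is longer
    text = text.ljust(dest_chars, PAD)  # pad when the source is shorter
    return text.encode("utf-32-be")     # 4 bytes per character, no BOM (assumed)
```

For example, `set_raw_to_unicode(b"AB", 4)` yields 16 bytes that decode to `"AB"` followed by two pad characters.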

Set from Group:

  • To RAW Alpha or Group
    • The current behavior has not been changed.
  • To UNICODE or NATIONAL Alpha
    • This combination will not be allowed.
    • The process compiler should report this as an error.
    • Since many PDF fields and TEMP fields are redefined as UNICODE, this will likely break some applications.

Set from UNICODE or NATIONAL Alpha:

  • To RAW or NATIONAL Alpha
    • Each UTF-32 character in the source field is transcoded into an 8859-15 character (for RAW) or into the defined NATIONAL encoding, in the range 0x00 to 0xFF, and is then set into the corresponding byte position in the destination field.
    • The destination is truncated if the number of source characters is greater than the number of destination bytes.
    • 8-bit space padding takes place if the number of source characters is less than the number of destination bytes.
  • To Group:
    • This combination will not be allowed.
    • The process compiler should report this as an error.
    • Since many PDF fields and TEMP fields are redefined as UNICODE, this will likely break some applications.
  • To UNICODE Alpha:
    • Straight copy with no transcoding.
    • Expected truncation and UTF-32 padding will take place on length mismatch.
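A minimal Python sketch of the two permitted conversions from a UNICODE source follows. It is illustrative only: both function names are hypothetical, a Python `str` models the UTF-32 field (one code point per position), Python's `iso8859_15` codec stands in for the real transcoding table, and replacing unmappable characters with `?` is an assumption, since the rules above do not say what happens to them.

```python
def set_unicode_to_raw(src: str, dest_bytes: int,
                       encoding: str = "iso8859_15") -> bytes:
    """Hypothetical model of SET from a UNICODE field to a RAW alpha field."""
    # One character -> one byte. Assumption: a character with no 8-bit
    # equivalent becomes b"?"; the specification does not define this case.
    data = src.encode(encoding, errors="replace")
    data = data[:dest_bytes]              # truncate when the source is longer
    return data.ljust(dest_bytes, b" ")   # 8-bit space padding when shorter


def set_unicode_to_unicode(src: str, dest_chars: int) -> str:
    """Hypothetical model of SET between UNICODE fields: copy, no transcoding."""
    # Assumption: UTF-32 padding uses U+0020 (space).
    return src[:dest_chars].ljust(dest_chars, " ")
```

For example, `set_unicode_to_raw("AB", 4)` gives `b"AB  "`, and `set_unicode_to_unicode("ABCDE", 2)` gives `"AB"`.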

Set from Unicode Literal:

  • To RAW Alpha or Group
    • Normal literals are unchanged.
    • Unicode escape sequences in literals are not interpreted.
  • To UNICODE or NATIONAL Alpha
    • Character literals are limited to the defined 8859-15 characters.
    • Each character is transcoded from 8859-15 to UTF-32.
    • Unicode characters can be embedded in literals using Unicode escape sequences: \u#### or \U########.
    • Truncation and UTF-32 or Space padding will take place.
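How escape sequences in a literal might be expanded before the set can be sketched in Python. This is a sketch under assumptions, not the compiler's actual parser: the function names are hypothetical, each `#` is assumed to be a hexadecimal digit, and the pad character is assumed to be U+0020.

```python
import re

# \u followed by 4 hex digits, or \U followed by 8 hex digits
# (assumption: each '#' in the notation is one hexadecimal digit).
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def expand_unicode_escapes(literal: str) -> str:
    """Replace each \\u#### or \\U######## sequence with its character."""
    def repl(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        # chr() raises ValueError for values above U+10FFFF.
        return chr(int(hex_digits, 16))
    return _ESCAPE.sub(repl, literal)


def set_literal_to_unicode(literal: str, dest_chars: int) -> str:
    """Hypothetical model of SET from a literal to a UNICODE field."""
    text = expand_unicode_escapes(literal)
    return text[:dest_chars].ljust(dest_chars, " ")  # truncate or pad
```

For example, `set_literal_to_unicode(r"\u0041", 3)` produces `"A"` followed by two pad characters. For a RAW destination the expansion step would be skipped, since escape sequences are not interpreted there.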
 

Bugs

  1. Bug: Setting a Unicode alpha field equal to a Token field does not work. It appears to do a byte-by-byte set rather than a character-by-character set of the Token string. (Note: CNV TEXT does work.)
 