SET Statement
Overview
The SET statement has been extended to take the Encoding Type attribute of an alpha field into consideration. All alpha fields in existing applications developed before Unicode support was implemented have an Encoding Type of “Raw”. Group fields also have an implied Encoding Type of “Raw”. To keep existing applications compatible, the SET statement should behave exactly as it always has when the statement involves only Raw alpha fields, Group fields, and the other (non-alpha) field types.
The following outline specifies the behavior of the SET statement when one or both operands have an Encoding Type of “NATIONAL” or “UNICODE”.
SET from RAW Alpha:
- To RAW Alpha or Group
- The current behavior has not been changed.
- To UNICODE or NATIONAL Alpha
- Each byte of data in the RAW alpha source field is transcoded from 8859-15 (for RAW) or from the defined NATIONAL encoding to the equivalent UTF-32 character and is then set into the corresponding character position in the UTF-32 encoded destination field. The resulting code points are mostly in the range U+0000 through U+00FF, but not entirely: 8859-15 maps eight bytes above that range (for example, 0xA4 is U+20AC, the euro sign).
- The destination is truncated if the number of source bytes is greater than the number of destination characters.
- UTF-32 padding takes place if the number of source bytes is less than the number of destination characters.
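The RAW to UNICODE/NATIONAL rules above can be sketched in Python. This is a minimal model, not the engine's implementation: it assumes a RAW field is a `bytes` value, a UNICODE/NATIONAL field holds a fixed number of characters stored as UTF-32 (big-endian here, an arbitrary choice), and "UTF-32 padding" means padding with U+0020 SPACE. The function name `set_raw_to_unicode` and its `encoding` parameter are hypothetical.

```python
# Hypothetical model of SET from a RAW alpha into a UNICODE/NATIONAL alpha.
# Assumptions: UTF-32 is stored big-endian and padding uses U+0020 SPACE.


def set_raw_to_unicode(source: bytes, dest_chars: int,
                       encoding: str = "iso8859_15") -> bytes:
    """Return the destination field contents as UTF-32BE bytes."""
    # With a single-byte codec, each source byte becomes exactly one character.
    # A codec that leaves some bytes undefined raises UnicodeDecodeError; this
    # sketch makes no claim about how the engine reports that case.
    text = source.decode(encoding)
    # Truncate when there are more source bytes than destination characters...
    text = text[:dest_chars]
    # ...and pad with spaces when there are fewer.
    text = text.ljust(dest_chars)
    return text.encode("utf-32-be")


field = set_raw_to_unicode(b"AB\xa4", 5)
print(len(field))                                  # 20 (5 characters x 4 bytes)
print(field.decode("utf-32-be") == "AB\u20ac  ")   # True: 0xA4 is the euro sign
```

Keeping the destination as raw UTF-32 bytes makes the "one byte in, one 4-byte character out" relationship visible; a real field would also need a fixed endianness agreed with the rest of the runtime.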
SET from Group:
- To RAW Alpha or Group
- The current behavior has not been changed.
- To UNICODE or NATIONAL Alpha
- This combination will not be allowed.
- The process compiler should report this as an error.
- Since many PDF fields and TEMP fields are redefined as UNICODE, this will likely break some applications.
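The Group restriction above, together with the mirror-image rule for UNICODE/NATIONAL to Group, amounts to a compile-time check on the pair of Encoding Types. The sketch below is hypothetical: `EncodingType`, `SetCompileError`, and `check_set` are illustrative names, not part of the process compiler.

```python
from enum import Enum


class EncodingType(Enum):
    RAW = "RAW"
    GROUP = "GROUP"          # implied RAW encoding
    UNICODE = "UNICODE"
    NATIONAL = "NATIONAL"


WIDE = {EncodingType.UNICODE, EncodingType.NATIONAL}


class SetCompileError(Exception):
    """Hypothetical error the process compiler would report."""


def check_set(source: EncodingType, dest: EncodingType) -> None:
    """Reject SET between a Group and a UNICODE/NATIONAL alpha, in either direction."""
    group_to_wide = source is EncodingType.GROUP and dest in WIDE
    wide_to_group = source in WIDE and dest is EncodingType.GROUP
    if group_to_wide or wide_to_group:
        raise SetCompileError(f"SET from {source.value} to {dest.value} is not allowed")


check_set(EncodingType.RAW, EncodingType.UNICODE)   # allowed
check_set(EncodingType.GROUP, EncodingType.RAW)     # allowed, behavior unchanged
try:
    check_set(EncodingType.GROUP, EncodingType.NATIONAL)
except SetCompileError as err:
    print(err)   # SET from GROUP to NATIONAL is not allowed
```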
SET from UNICODE or NATIONAL Alpha:
- To RAW or NATIONAL Alpha
- Each UTF-32 character in the source field is transcoded to an 8859-15 character (for RAW) or a character in the defined NATIONAL encoding, in the range 0x00 to 0xFF, and is then set into the corresponding byte position in the destination field.
- The destination is truncated if the number of source characters is greater than the number of destination bytes.
- 8-bit space padding takes place if the number of source characters is less than the number of destination bytes.
- To Group
- This combination will not be allowed.
- The process compiler should report this as an error.
- Since many PDF fields and TEMP fields are redefined as UNICODE, this will likely break some applications.
- To UNICODE Alpha
- Straight copy with no transcoding.
- Truncation and UTF-32 padding take place as expected when the lengths differ.
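A similar sketch covers the RAW destination and the straight UNICODE copy from the rules above. It assumes the source is a Python `str` (one element per UTF-32 character) and that 8-bit space padding means byte 0x20. `TranscodeError`, `set_unicode_to_raw`, and `set_unicode_to_unicode` are hypothetical names. The NATIONAL destination is deliberately left out: the bug notes record that the engine does not raise a transcode error there, so a strict codec model would not match it.

```python
class TranscodeError(Exception):
    """Hypothetical runtime error: a character has no 8859-15 equivalent."""


def set_unicode_to_raw(source: str, dest_bytes: int) -> bytes:
    """Transcode UTF-32 characters to 8859-15 bytes, truncating or space-padding."""
    out = bytearray()
    # Truncate: at most one source character per destination byte.
    for ch in source[:dest_bytes]:
        try:
            out += ch.encode("iso8859_15")   # single-byte codec: exactly one byte
        except UnicodeEncodeError as exc:
            raise TranscodeError(f"U+{ord(ch):04X} is not in ISO-8859-15") from exc
    # 8-bit space padding when the source is shorter than the destination.
    out += b" " * (dest_bytes - len(out))
    return bytes(out)


def set_unicode_to_unicode(source: str, dest_chars: int) -> str:
    """Straight copy with no transcoding, then truncate or pad with spaces."""
    return source[:dest_chars].ljust(dest_chars)


print(set_unicode_to_raw("AB\u20ac", 5))    # b'AB\xa4  '
print(set_unicode_to_unicode("Hello", 3))   # Hel
```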
SET from Unicode Literal:
- To RAW Alpha or Group
- Normal literals are unchanged.
- \u#### and \U######## sequences in literals are not interpreted as Unicode escape sequences.
- To UNICODE or NATIONAL Alpha
- Character literals are limited to the defined 8859-15 characters.
- Each character is transcoded from 8859-15 to UTF-32.
- Unicode characters can be embedded in literals using the Unicode escape sequences \u#### or \U########.
- Truncation and UTF-32 or space padding take place as needed.
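For a UNICODE or NATIONAL destination, the literal rules above can be modeled as: decode the literal's bytes as 8859-15, then expand `\u####` (4 hex digits) and `\U########` (8 hex digits) escapes. This sketch assumes the literal text arrives as bytes, that a backslash sequence without the right number of hex digits is left as written, and that the backslash itself cannot be escaped; none of these details is specified above. `literal_to_unicode` and `set_literal_to_unicode` are hypothetical names.

```python
import re

# \u followed by 4 hex digits, or \U followed by 8 hex digits.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def literal_to_unicode(literal: bytes) -> str:
    """Decode an 8859-15 literal and expand Unicode escape sequences."""
    # Python's iso8859_15 codec defines all 256 byte values, so this cannot fail.
    text = literal.decode("iso8859_15")
    # chr() raises ValueError for code points above U+10FFFF.
    return _ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)


def set_literal_to_unicode(literal: bytes, dest_chars: int) -> str:
    """Expand the literal, then truncate or pad to the destination length."""
    return literal_to_unicode(literal)[:dest_chars].ljust(dest_chars)


print(set_literal_to_unicode(b"Cost \\u20AC5", 8) == "Cost \u20ac5 ")   # True
```

Decoding before expanding escapes keeps the two rules separate: ordinary characters are limited to 8859-15 by construction, and only the escape sequences can introduce characters outside it.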
Test Plan
- Test invalid combinations (Group -> Unicode/National, Unicode/National -> Group); is the error message understandable?
- For each of the types (Date, Format, Logic, Numeric, Text, Token), perform 4 tests: move from type into Unicode, move from type into National, move from Unicode into type, move from National into type.
- Test Raw to National and back (with/without transcode error)
- Test Raw to Unicode and back (with/without transcode error)
- Test Unicode to Unicode
- Test National to National
- Test Unicode to National and back (with/without transcode error)
- Test Truncation & padding
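The per-type item in the plan expands to 6 types x 4 moves = 24 cases. A small generator, sketched here with hypothetical names, lists them explicitly (along with the four invalid combinations from the first item) so that none is skipped.

```python
from __future__ import annotations

from itertools import product

TYPES = ["Date", "Format", "Logic", "Numeric", "Text", "Token"]
WIDE = ["Unicode", "National"]

# Combinations the process compiler should reject.
INVALID_CASES = [("Group", "Unicode"), ("Group", "National"),
                 ("Unicode", "Group"), ("National", "Group")]


def type_cases() -> list[tuple[str, str]]:
    """(source, destination) pairs: each type into and out of Unicode/National."""
    cases = []
    for field_type, wide in product(TYPES, WIDE):
        cases.append((field_type, wide))   # move from type into Unicode/National
        cases.append((wide, field_type))   # move from Unicode/National into type
    return cases


print(len(type_cases()))   # 24
```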
Bugs
- Bug: Setting a Unicode alpha field equal to a Token field does not work. It appears to do a byte-by-byte set rather than a character set of the Token string. (Note: CNV TEXT does work.) * FIXED * March 28 engine
- SETting a Logic field into a Unicode or National field does not clear the old data in the receiving field. For example, if the receiving field contains 20110412, after the SET it contains Y0110412. * FIXED * April 17 engine
- SETting a Numeric field into a Unicode or National field does not clear the old data in the receiving field. For example, if the receiving field contains 2011041200000000, after SETting a Numeric field with value 12345 it contains 1234541200000000. * FIXED * April 17 engine
- SETting a Token field into a National field does not work; the result is reverse-image diamonds. * FIXED * April 17 engine
- SETting a National field back into a Token does not work; the result is ???????. * Not a bug, FAD (functions as designed) *
- SETting a Unicode field back into a Token does not work; the result is ???????. * Not a bug, FAD *
- No transcode error is thrown when moving Raw to National and the Raw field contains non-National data. * Not a bug, FAD *
- A Raw alpha containing \u#### or \U######## does not get transcoded to Unicode. * Not a bug, functions as designed *
- SETting a Unicode character into a Raw alpha generates a 'Value Does Not Fit In Mask' runtime error. It should be a transcode error. * FIXED * April 19 engine
- SETting a National character into a Raw alpha generates a 'Value Does Not Fit In Mask' runtime error. Should this be an error at all? Shouldn't Raw accept any 8-bit value? * Not a bug, FAD *. A National character may not exist in the 8859-15 (Raw) character set.
- SETting a Unicode character that is not part of the National character set into a National field does not create a transcode error. * Not a bug, FAD *
- The SET statement gives runtime error 'Value Does Not Fit in Mask' when moving a Unicode field containing \u00A4 into a Raw field. I thought any 8-bit character was valid for Raw fields? * Not a bug * Raw fields use the 8859-15 character set, and \u00A4 (¤, the "spiky ball" currency sign) is not part of it.
- SET allows \u00A4 into a National field when SYSPARM specifies US-ASCII as the National character set. This character (spiky ball) exists in 8859-1 but not in US-ASCII, so this should be a transcode error. * Not a bug, FAD *
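Several of the rulings above turn on character-set membership. Python's standard codecs can confirm the facts they rely on: U+00A4 (¤) is in ISO-8859-1 but not in ISO-8859-15 or US-ASCII, while ISO-8859-15 uses byte 0xA4 for the euro sign. This checks only the character sets; it says nothing about how the engine itself validates SET operands. `in_charset` is a hypothetical helper.

```python
def in_charset(ch: str, codec: str) -> bool:
    """True if the character can be encoded with the given Python codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


CURRENCY_SIGN = "\u00a4"   # the "spiky ball"
EURO_SIGN = "\u20ac"       # 8859-15 puts this at byte 0xA4

for codec in ("iso8859_15", "iso8859_1", "ascii"):
    print(codec, in_charset(CURRENCY_SIGN, codec), in_charset(EURO_SIGN, codec))
# iso8859_15 False True
# iso8859_1 True False
# ascii False False
```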
--
SteveFrizzell - 2011-03-07