IF / AND / OR Statements
Overview
The GE/GT/LE/LT operators should use the Unicode collation API when UTF-32 operands are involved, because the 32-bit values that represent UTF-32 characters do not necessarily follow the expected collating sequence.
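A minimal Python sketch of why raw 32-bit values are not a collating sequence. The `collation_key` helper is a hypothetical stand-in for a real Unicode collation API (such as ICU), not the engine's implementation. It only models one property of the default Unicode ordering: a lowercase letter sorts before its uppercase form.

```python
def codepoint_less(a: str, b: str) -> bool:
    """Compare by raw 32-bit code point values, as a naive UTF-32 compare would."""
    return [ord(c) for c in a] < [ord(c) for c in b]


def collation_key(s: str):
    """Hypothetical stand-in for a collation API: letters first, then case.

    Assumption: when two strings differ only by case, lowercase sorts first,
    which mirrors the default Unicode collation order.
    """
    return (s.casefold(), [c.isupper() for c in s])


# By code point, every ASCII uppercase letter precedes every lowercase letter.
print(codepoint_less("Z", "a"))                  # True
print(codepoint_less("A", "a"))                  # True
# With the collation key, "a" sorts before "A" and before "Z".
print(collation_key("a") < collation_key("A"))   # True
print(collation_key("a") < collation_key("Z"))   # True
```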
The RI/RS operators expect that the operands will be 8-bit RAW Alpha characters. If UTF-32 operands are specified in conjunction with these operators, they will need to be transcoded to the 8859-1 encoding used by a RAW Alpha field before the statement can be executed. If any character values are encountered that cannot be represented by an 8859-1 encoding, a data exception should occur.
NOTE: The RI and RS operators do not function as described in the previous paragraph. Instead, the pattern and the search string are both converted to UTF-32 (if necessary) and the regular expression is evaluated in Unicode form (KAD).
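The difference between the originally specified behavior and the behavior described in the note can be sketched in Python. This is only an illustration: Python's `re` module stands in for the engine's regular expression evaluator, `latin-1` (ISO 8859-1) is assumed as the RAW Alpha encoding, and both function names are hypothetical.

```python
import re


def transcode_down(unicode_text: str) -> bytes:
    """Originally specified behavior: Unicode operand -> 8859-1 RAW bytes.

    Raises UnicodeEncodeError (standing in for the data exception) when a
    character has no 8859-1 representation.
    """
    return unicode_text.encode("latin-1")


def regex_search_unicode(pattern, subject) -> bool:
    """Behavior per the note: promote both operands to Unicode, then match."""
    def to_unicode(operand):
        # RAW operands arrive as bytes; Unicode operands are already str.
        if isinstance(operand, bytes):
            return operand.decode("latin-1")
        return operand
    return re.search(to_unicode(pattern), to_unicode(subject)) is not None


try:
    transcode_down("price \u20ac5")   # EURO SIGN is not in 8859-1
except UnicodeEncodeError:
    print("data exception")

# RAW pattern against a Unicode subject: no down-conversion is needed.
print(regex_search_unicode(b"ing$", "testing"))
# A pattern containing a non-8859-1 character still works in Unicode form.
print(regex_search_unicode("\u20ac\\d", "price \u20ac5"))
```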
The following applies to all operators other than RS/RI:
- RAW alpha or Group compared with RAW Alpha, Group, or Literal:
- This is the existing condition and no changes have been made.
- UNICODE or NATIONAL alpha compared with UNICODE or NATIONAL alpha:
- If there is a mismatch in field lengths then the shorter is padded with UTF-32 space to match the length of the longer field.
- The operation is performed by comparing 4-byte UTF-32 characters.
- UNICODE or NATIONAL alpha compared with RAW alpha (or RAW compared to UNICODE or NATIONAL):
- The RAW operand is transcoded to UTF-32 characters.
- If there is a mismatch in field lengths then the shorter is padded with UTF-32 space to match the length of the longer field.
- The operation is performed by comparing 4-byte UTF-32 characters.
- UNICODE or NATIONAL alpha compared with a Literal:
- The literal operand is considered to be a RAW alpha, so it is transcoded from 8859-15 to UTF-32 before the comparison.
- If the literal contains Unicode escape sequences (\u#### or \U########), they will be honored.
- The literal "_" will still be honored as a way to search for a trailing space.
- If there is a mismatch in field lengths then the shorter is padded with UTF-32 space to match the length of the longer field.
- The operation is performed by comparing 4-byte UTF-32 characters.
- UNICODE or NATIONAL alpha compared with a Group field (or Group compared with UNICODE or NATIONAL):
- This combination is not supported.
- The process compiler should report this as an error.
- Since many --- Alpha fields are redefined to be UNICODE, this will likely break some applications.
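The transcode, pad, and compare steps listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the engine's code: RAW operands are modeled as `bytes`, Unicode and National operands as `str`, and all function names are hypothetical. The RAW encoding is a parameter because the text names both 8859-1 and 8859-15. The special handling of the "_" literal is not modeled.

```python
import re

UTF32_SPACE = " "  # U+0020; one 4-byte unit once encoded as UTF-32


def to_unicode(operand, raw_encoding="latin-1"):
    """Transcode a RAW operand (bytes) to Unicode; pass Unicode through."""
    if isinstance(operand, bytes):
        return operand.decode(raw_encoding)
    return operand


def literal_to_unicode(raw_literal: bytes, raw_encoding="latin-1") -> str:
    r"""Transcode a RAW literal, then honor \u#### and \U######## escapes."""
    def repl(match):
        return chr(int(match.group(1) or match.group(2), 16))
    text = raw_literal.decode(raw_encoding)
    return re.sub(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})", repl, text)


def compare(lhs, rhs, raw_encoding="latin-1") -> int:
    """Return -1, 0 or 1 after transcoding, padding and a 4-byte compare."""
    a = to_unicode(lhs, raw_encoding)
    b = to_unicode(rhs, raw_encoding)
    # Pad the shorter operand with UTF-32 spaces to the longer length.
    width = max(len(a), len(b))
    a, b = a.ljust(width, UTF32_SPACE), b.ljust(width, UTF32_SPACE)
    # Big-endian UTF-32 bytes order the same way as the 32-bit values.
    a32, b32 = a.encode("utf-32-be"), b.encode("utf-32-be")
    return (a32 > b32) - (a32 < b32)


print(compare("abc", b"abc  "))                                  # 0
print(compare("abd", b"abc"))                                    # 1
print(compare("caf\u00e9", literal_to_unicode(b"caf\\u00e9")))   # 0
```

Note that this compares raw 32-bit values only; ordering operators (GE/GT/LE/LT) would need a collation step instead of the byte comparison.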
Test Plan
- Test if the ILF Editor allows all valid combinations from the above list
- Test if the ILF Editor gives a meaningful error when entering invalid combinations
- Test if the Process Compiler allows all valid combinations from the above list
- Test if the Process Compiler gives a meaningful error when encountering invalid combinations
- Test if RI/RS give a transcode error when transcoding UTF-8 to RAW
- Test Raw to Raw comparisons to ensure they aren't affected
- Test Unicode/National to Group and vice versa for compile errors
- Test Unicode to Unicode comparisons
- Test National to National comparisons
- Test Unicode to National comparisons
- Test National to Unicode comparisons
- Test Unicode to Raw
- Test Raw to Unicode
- Test National to Raw
- Test Raw to National
- Test Unicode to Literal (\uxxxx, \UXXXXXXXX, normal)
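One way to make sure no operand combination is missed is to generate the matrix rather than list it by hand. The Python sketch below is an assumption-laden illustration: the type names and the `expected_outcome` helper are hypothetical, and the only rule encoded is the one stated in the overview (Group against UNICODE or NATIONAL is a compile error; everything else is allowed). It applies to operators other than RS/RI.

```python
from itertools import product

FIELD_TYPES = ["RAW", "GROUP", "UNICODE", "NATIONAL"]
RHS_TYPES = FIELD_TYPES + ["LITERAL"]   # literals appear only on the right
WIDE = {"UNICODE", "NATIONAL"}


def expected_outcome(lhs: str, rhs: str) -> str:
    """Return 'compile error' for Group vs UNICODE/NATIONAL, else 'ok'."""
    pair = {lhs, rhs}
    if "GROUP" in pair and pair & WIDE:
        return "compile error"
    return "ok"


def build_matrix():
    """Enumerate every left/right operand type pair with its expected result."""
    return [
        (lhs, rhs, expected_outcome(lhs, rhs))
        for lhs, rhs in product(FIELD_TYPES, RHS_TYPES)
    ]


for lhs, rhs, outcome in build_matrix():
    print(f"{lhs:9} vs {rhs:9} -> {outcome}")
```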
Bugs
- The EX operator gives the same result as IN, i.e., IF ... IN ... and IF ... EX ... both return true when used with Unicode fields. Use 'testing' for LHS and 'ing' for RHS. Seems to work OK with single-character operands. * FIXED * April 5 engine
- The EX operator gives the same result as IN, i.e., IF ... IN ... and IF ... EX ... both return true when used with National fields. Use 'testing' for LHS and 'ing' for RHS. Seems to work OK with single-character operands. * FIXED * April 5 engine
- Compiler does not flag Group field to Unicode comparison as an error * FIXED * April 5 engine
- Compiler does not flag Group field to National comparison as an error * FIXED * April 5 engine
- Executing an IF .. RS/RI with Unicode or National fields causes the process to abort. A 2nd attempt causes the session to crash. * FIXED * April 5 engine
- Not sure if this is a bug or not: If I compare 2 raw alphas, then 'a' is GT 'A', as per the ASCII table. If I compare raw 'a' to Unicode 'A', then the opposite is true. Is this because the raw was transcoded to Unicode, and then the Unicode collating sequence was used, which puts the lower case before the upper case?
- IF .. RS/RI .. always returns False. For example, IF Testing RI ING returns T under 4.2.a, F under Unicode. Tested with Unicode and Raw fields, same result. * FIXED * Tested OK with April 24 engine.
- Does not give transcode error as expected. The specs say National/Unicode fields should be transcoded down to 8859-1 first. Or does this mean Regular Expressions will work with Unicode, and don't have to be transcoded? ** Correct, RS/RI should work with Unicode fields (but don't, see #7 above).