IF / AND / OR Statements

IF / AND / OR Statements
- Overview
  - Test Plan
  - Bugs

Overview

The GE/GT/LE/LT operators should use the Unicode collation API when UTF-32 operands are involved. This is because the 32-bit values that represent UTF-32 characters may not necessarily correspond to the expected collating sequence.
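
The difference between code point order and collating order can be illustrated with a small Python sketch. It is not the runtime's Unicode collation API. `simplified_collation_key` is a hypothetical stand-in that ignores case and accents on the first pass, which is enough to show why comparing raw 32-bit values gives surprising results.

```python
import unicodedata


def code_point_key(s):
    """Order strings by their raw UTF-32 code point values."""
    return [ord(ch) for ch in s]


def simplified_collation_key(s):
    """Very rough stand-in for a collation key (assumption, not a real collator).

    Base letters are compared first, ignoring case and accents; the original
    string is used only as a tie-breaker.
    """
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), s)


words = ["zebra", "Apple", "\u00e9clair", "apple"]

by_code_point = sorted(words, key=code_point_key)
by_collation = sorted(words, key=simplified_collation_key)

print(by_code_point)  # the accented word sorts after "zebra"
print(by_collation)   # the accented word sorts between "apple" and "zebra"
```

With code point ordering, U+00E9 (233) is greater than 'z' (122), so the accented word lands at the end of the list, which is rarely what a user expects from GT or LT.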

The RI/RS operators expect that the operands will be 8-bit RAW Alpha characters. If UTF-32 operands are specified in conjunction with these operators, they will need to be transcoded to the 8859-1 encoding used by a RAW Alpha field before the statement can be executed. If any character values are encountered that cannot be represented by an 8859-1 encoding, a data exception should occur.
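
The transcoding step for RI/RS can be sketched in Python as follows. `DataException` and `utf32_to_raw` are hypothetical names chosen for illustration; the sketch only shows the rule that any character without an 8859-1 representation must raise an exception rather than be silently replaced.

```python
class DataException(Exception):
    """Hypothetical stand-in for the runtime's data exception."""


def utf32_to_raw(text):
    """Transcode a Unicode operand to 8-bit ISO 8859-1 bytes.

    Raises DataException if any character cannot be represented.
    """
    try:
        return text.encode("iso-8859-1")
    except UnicodeEncodeError as err:
        bad = text[err.start]
        raise DataException(
            f"U+{ord(bad):04X} has no 8859-1 representation"
        ) from err


print(utf32_to_raw("caf\u00e9"))  # every character fits in 8859-1

try:
    utf32_to_raw("\u03a3")  # GREEK CAPITAL LETTER SIGMA is not in 8859-1
except DataException as exc:
    print("data exception:", exc)
```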

The following applies to all operators other than RS/RI:

RAW alpha or Group compared with RAW Alpha, Group, or Literal:
- This is the existing condition and no changes have been made.
UNICODE or NATIONAL alpha compared with UNICODE or NATIONAL alpha:
- If there is a mismatch in field lengths then the shorter is padded with UTF-32 space to match the length of the longer field.
- The operation is performed by comparing 4-byte UTF-32 characters.
UNICODE or NATIONAL alpha compared with RAW alpha (or RAW compared to UNICODE or NATIONAL):
- The RAW operand is transcoded to UTF-32 characters.
- If there is a mismatch in field lengths then the shorter is padded with UTF-32 space to match the length of the longer field.
- The operation is performed by comparing 4-byte UTF-32 characters.
UNICODE or NATIONAL alpha compared with a Literal:
- The literal operand is considered to be a RAW alpha so transcoding from 8859-15 to UTF-32 will take place before the comparison.
- If the literal contains Unicode escape sequences (\u#### or \U########) they will be honored.
- The literal "_" will still be honored as a way to search for a trailing space.
- If there is a mismatch in field lengths then the shorter is padded with UTF-32 space to match the length of the longer field.
- The operation is performed by comparing 4-byte UTF-32 characters.
UNICODE or NATIONAL alpha compared with a Group field (or Group compared with UNICODE or NATIONAL):
- This combination is not supported.
- The process compiler should report this as an error.
- Since many --- Alpha fields are redefined to be UNICODE, this will likely break some applications.
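
The rules above can be sketched in Python for the cases where at least one operand is UNICODE or NATIONAL. This is an illustration only: `Kind`, `UnsupportedComparison`, `to_utf32`, and `utf32_equal` are hypothetical names, RAW operands are assumed to be 8859-1 and literals 8859-15 as stated above, and the sketch tests equality only, since the ordering operators are expected to go through the collation API. Escape sequences and the "_" literal are not modelled.

```python
from enum import Enum


class Kind(Enum):
    RAW = "raw"
    UNICODE = "unicode"
    NATIONAL = "national"
    GROUP = "group"
    LITERAL = "literal"


class UnsupportedComparison(Exception):
    """Hypothetical stand-in for the process compiler's error."""


WIDE = {Kind.UNICODE, Kind.NATIONAL}
UTF32_SPACE = 0x20


def to_utf32(kind, value):
    """Return the operand as a list of 32-bit code points."""
    if kind is Kind.RAW:
        value = value.decode("iso-8859-1")   # RAW bytes -> Unicode text
    elif kind is Kind.LITERAL:
        value = value.decode("iso-8859-15")  # literal treated as RAW alpha
    return [ord(ch) for ch in value]


def utf32_equal(left, right):
    """Compare two (kind, value) operands after padding to equal length."""
    (lkind, lvalue), (rkind, rvalue) = left, right
    if lkind not in WIDE and rkind not in WIDE:
        raise ValueError("existing RAW/Group path is not modelled here")
    if Kind.GROUP in (lkind, rkind):
        raise UnsupportedComparison(
            "Group cannot be compared with UNICODE or NATIONAL"
        )
    a, b = to_utf32(lkind, lvalue), to_utf32(rkind, rvalue)
    width = max(len(a), len(b))
    a += [UTF32_SPACE] * (width - len(a))  # pad the shorter operand
    b += [UTF32_SPACE] * (width - len(b))
    return a == b


# The shorter UNICODE operand is padded with spaces before comparing.
print(utf32_equal((Kind.UNICODE, "abc"), (Kind.RAW, b"abc  ")))
# Byte 0xA4 in 8859-15 is the euro sign, U+20AC.
print(utf32_equal((Kind.NATIONAL, "\u20ac"), (Kind.LITERAL, b"\xa4")))
```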

Test Plan

Test if the ILF Editor allows all valid combinations from the above list
Test if the ILF Editor gives a meaningful error when entering invalid combinations
Test if the Process Compiler allows all valid combinations from the above list
Test if the Process Compiler gives a meaningful error when encountering invalid combinations
Test if RI/RS give a transcode error when transcoding UTF-32 to RAW
Test Raw to Raw comparisons to ensure they aren't affected
Test Unicode/National to Group and vice versa for compile errors
Test Unicode to Unicode comparisons
Test National to National comparisons
Test Unicode to National comparisons
Test National to Unicode comparisons
Test Unicode to Raw
Test Raw to Unicode
Test National to Raw
Test Raw to National
Test Unicode to Literal (\uXXXX, \UXXXXXXXX, normal)
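
The operand combinations to cover can be enumerated mechanically. The following Python sketch builds the matrix of left-hand and right-hand operand types with the outcome expected from the Overview. The names `FIELD_KINDS`, `RHS_KINDS`, and `expected_outcome` are hypothetical, and the sketch assumes a literal appears only on the right-hand side.

```python
from itertools import product

FIELD_KINDS = ["Raw", "Unicode", "National", "Group"]
RHS_KINDS = FIELD_KINDS + ["Literal"]  # assumption: literals only on the RHS
WIDE = {"Unicode", "National"}


def expected_outcome(lhs, rhs):
    """Return the outcome the Overview describes for an operand pair."""
    if "Group" in (lhs, rhs) and (lhs in WIDE or rhs in WIDE):
        return "compile error"
    return "allowed"


matrix = {
    (lhs, rhs): expected_outcome(lhs, rhs)
    for lhs, rhs in product(FIELD_KINDS, RHS_KINDS)
}

for (lhs, rhs), outcome in matrix.items():
    print(f"{lhs:<9} vs {rhs:<9} {outcome}")
```

Each row can then be run once through the ILF Editor and once through the Process Compiler.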

Bugs

1. The EX operator gives the same result as IN: IF ... IN ... and IF ... EX ... both return true when used with Unicode fields. To reproduce, use 'testing' for the LHS and 'ing' for the RHS. Single-character operands seem to work correctly.
2. The EX operator gives the same result as IN: IF ... IN ... and IF ... EX ... both return true when used with National fields. To reproduce, use 'testing' for the LHS and 'ing' for the RHS. Single-character operands seem to work correctly.
3. The compiler does not flag a Group field to Unicode comparison as an error.
4. The compiler does not flag a Group field to National comparison as an error.
5. Executing an IF ... RS/RI with Unicode or National fields causes the process to abort. A second attempt crashes the session.
6. When comparing Unicode character 'ӧ' (U+04E7) to the National string 'ing', GE returns true.
7. Not sure whether this is a bug: when comparing two raw alphas, 'a' is GT 'A', as per the ASCII table. When comparing raw 'a' to Unicode 'A', the opposite is true. Is this because the raw operand was transcoded to Unicode and the Unicode collating sequence, which puts lower case before upper case, was then used?
8. Unicode 'B' is reported as GE raw 'A'. Wrong, it is GT, but not GE. The same problem occurs if the Unicode and Raw operands are reversed.
9. Unicode 'A' is reported as LE raw 'B'. Wrong, it is LT, but not LE. The same problem occurs if the Unicode and Raw operands are reversed.
10. National fields have the same problems as 8 and 9: LE/GE are reported as true when it is only LT/GT.
11. Unicode to Literal has the same problem as 8, 9, and 10: LE/GE are reported when it should just be GT/LT. Seen when comparing literal \u03A3 to U+0385, or comparing U+03A3 to 'Testing'.

Topic revision: r8 - 2011-03-24 - JeanNeron
