IF / AND / OR Statements

Overview

The GE/GT/LE/LT operators should use the Unicode collation API when UTF-32 operands are involved. This is because the 32-bit values that represent UTF-32 characters may not necessarily correspond to the expected collating sequence.
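As a rough illustration of the problem (Python, not APPX code), the sketch below sorts a few sample strings by code point. Python compares `str` values code point by code point, which is equivalent to comparing 32-bit UTF-32 values. The sample words are invented for the example.

```python
# Python orders strings by code point, the same as comparing UTF-32 values.
words = ["zebra", "Zebra", "\u00e9clair", "apple"]

by_code_point = sorted(words)

# 'Z' (U+005A) sorts before 'a' (U+0061), and '\u00e9' (U+00E9) sorts
# after 'z' (U+007A), which is not the order a dictionary would use.
print(by_code_point)
```

A collation-aware comparison would be expected to place the accented word near the other words starting with "e", which is why the ordering operators need the collation API rather than a numeric comparison.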

The RI/RS operators expect that the operands will be 8-bit RAW Alpha characters. If UTF-32 operands are specified in conjunction with these operators, they will need to be transcoded to the 8859-1 encoding used by a RAW Alpha field before the statement can be executed. If any character values are encountered that cannot be represented by an 8859-1 encoding, a data exception should occur.
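The transcoding step described above can be sketched in Python as follows. This is only an illustration, not the APPX implementation: the helper name `to_raw_alpha` is hypothetical, and `ValueError` stands in for whatever data exception the runtime would raise.

```python
def to_raw_alpha(text: str) -> bytes:
    """Transcode a Unicode operand to ISO 8859-1 bytes.

    Raises ValueError (a stand-in for the data exception) if any
    character cannot be represented in 8859-1.
    """
    try:
        return text.encode("iso-8859-1")
    except UnicodeEncodeError as exc:
        # exc.start is the index of the first character that failed.
        bad = text[exc.start]
        raise ValueError(
            f"cannot represent U+{ord(bad):04X} in 8859-1"
        ) from exc
```

For example, `to_raw_alpha("caf\u00e9")` succeeds because U+00E9 exists in 8859-1, while `to_raw_alpha("\u03a3")` raises because Greek capital sigma does not.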

The following applies to all operators other than RS/RI:

  • RAW alpha or Group compared with RAW Alpha, Group, or Literal:
    • This is the existing condition and no changes have been made.
  • UNICODE or NATIONAL alpha compared with UNICODE or NATIONAL alpha:
    • If there is a mismatch in field lengths then the shorter is padded with UTF-32 space to match the length of the longer field.
    • The operation is performed by comparing 4-byte UTF-32 characters.
  • UNICODE or NATIONAL alpha compared with RAW alpha (or RAW compared to UNICODE or NATIONAL):
    • The RAW operand is transcoded to UTF-32 characters.
    • If there is a mismatch in field lengths then the shorter is padded with UTF-32 space to match the length of the longer field.
    • The operation is performed by comparing 4-byte UTF-32 characters.
  • UNICODE or NATIONAL alpha compared with a Literal:
    • The literal operand is considered to be a RAW alpha so transcoding from 8859-15 to UTF-32 will take place before the comparison.
    • If the literal contains Unicode escape sequences (\u#### or \U########) they will be honored.
    • The literal "_" will still be honored as a way to search for a trailing space.
    • If there is a mismatch in field lengths then the shorter is padded with UTF-32 space to match the length of the longer field.
    • The operation is performed by comparing 4-byte UTF-32 characters.
  • UNICODE or NATIONAL alpha compared with a Group field (or Group compared with UNICODE or NATIONAL):
    • This combination is not supported.
    • The process compiler should report this as an error.
    • Since many --- Alpha fields are redefined to be UNICODE, this will likely break some applications.
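The transcode, pad, and compare steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the APPX runtime: the function names are hypothetical, RAW operands are modeled as `bytes`, UNICODE/NATIONAL operands as `str`, and the RAW encoding is a parameter (defaulting to 8859-1) because the source encoding is an assumption here. The comparison is a plain code point comparison, so it models equality-style tests; the ordering operators are meant to use the collation API instead.

```python
def to_utf32_units(operand, length, raw_encoding="iso-8859-1"):
    """Return the operand as a list of 32-bit values, space-padded to length."""
    if isinstance(operand, bytes):
        # RAW operand: transcode to Unicode before comparing.
        operand = operand.decode(raw_encoding)
    # Pad the shorter operand with U+0020 to match the longer one.
    padded = operand.ljust(length, " ")
    return [ord(ch) for ch in padded]


def compare(lhs, rhs, raw_encoding="iso-8859-1"):
    """Return -1, 0, or 1 from comparing the operands as UTF-32 values."""
    length = max(len(lhs), len(rhs))
    a = to_utf32_units(lhs, length, raw_encoding)
    b = to_utf32_units(rhs, length, raw_encoding)
    return (a > b) - (a < b)
```

With this sketch, `compare("ing", b"ing  ")` returns 0 because the shorter operand is padded with spaces before the comparison.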

Test Plan

  1. Test if the ILF Editor allows all valid combinations from the above list
  2. Test if the ILF Editor gives a meaningful error when entering invalid combinations
  3. Test if the Process Compiler allows all valid combinations from the above list
  4. Test if the Process Compiler gives a meaningful error when encountering invalid combinations
  5. Test whether RI/RS give a transcode error when transcoding UTF-8 to RAW
  6. Test Raw to Raw comparisons to ensure they aren't affected
  7. Test Unicode/National to Group and vice versa for compile errors
  8. Test Unicode to Unicode comparisons
  9. Test National to National comparisons
  10. Test Unicode to National comparisons
  11. Test National to Unicode comparisons
  12. Test Unicode to Raw
  13. Test Raw to Unicode
  14. Test National to Raw
  15. Test Raw to National
  16. Test Unicode to Literal (\uxxxx, \UXXXXXXXX, normal)

Bugs

  1. The EX operator gives the same result as IN, i.e., IF ... IN ... and IF ... EX ... both return true when used with Unicode fields. Use 'testing' for LHS and 'ing' for RHS. Seems to work OK with single-character operands.
  2. The EX operator gives the same result as IN, i.e., IF ... IN ... and IF ... EX ... both return true when used with National fields. Use 'testing' for LHS and 'ing' for RHS. Seems to work OK with single-character operands.
  3. Compiler does not flag Group field to Unicode comparison as an error
  4. Compiler does not flag Group field to National comparison as an error
  5. Executing an IF .. RS/RI with Unicode or National fields causes process to abort. 2nd attempt causes session to crash.
  6. When comparing Unicode character 'Ó' (U+04E7) to National string 'ing', GE returns True.
  7. Not sure if this is a bug or not: If I compare 2 raw alphas, then 'a' is GT 'A', as per the ASCII table. If I compare raw 'a' to Unicode 'A', then the opposite is true. Is this because the raw was transcoded to Unicode, and then the Unicode collating sequence was used, which puts the lower case before the upper case?
  8. Unicode 'B' is GE raw 'A'. Wrong, it's GT, but not GE. Same problem if you reverse Unicode & Raw operands.
  9. Unicode 'A' is LE raw 'B'. Wrong, it's LT, but not LE. Same problem if you reverse Unicode & Raw operands.
  10. National fields have the same problems as 8 & 9: they report LE/GE as true when it's only LT/GT.
  11. Unicode to Literal, same as 8, 9, 10: reports LE/GE when it should just be GT/LT. Comparing literal \u03A3 to U+0385, or comparing U+03A3 to 'Testing'.

Topic revision: r8 - 2011-03-24 - JeanNeron
 