
Character Encoding

Character encoding wasn’t much of an issue in the pre-internet and pre-PDF days, because the first 128 characters are the same in most encoding schemas and there was little need for characters beyond that threshold. That has changed. The introduction of the Euro symbol, and the need to export XML data out of Appx, now require you to consider the encoding used by each component of your Appx environment.


Appx provides various ways for you to define the character sets used by your system. Future releases of Appx will address this issue more directly and improve the overall synchronization of character encoding. For now, Appx has established system defaults where applicable. This paper explains each component and how to override these default values.


Encoding today is relevant in several distinct areas:


Appx desktop client (Java)

The Appx client’s default character set is Windows-1252, while the Appx engine uses ISO-8859-15. You might expect this mismatch to cause problems; however, it does not. The following example illustrates why:

Example: entering a Euro symbol on a Windows Java desktop connected to a Linux server running Appx 5.4.2.

From the AppxDesktopClient, using Windows-1252 (default) encoding, you enter a Euro symbol, € (Alt+0128), into a field and save the record. The numeric value of the Euro symbol in the Windows-1252 schema is 128 (see the Windows-1252 table). The Appx engine, set to ISO-8859-15 (default), stores the character value 128. Note that 128 in the ISO-8859-15 table is not the Euro symbol (in ISO-8859-15 the Euro symbol is 164, and 128 is an unprintable control code). Even so, the Euro symbol is displayed in your Appx application field, because your desktop is using the Windows-1252 character set. If you viewed the same record on another desktop using a different character encoding schema, the character would be displayed as whatever symbol that schema associates with the value 128 (probably not the Euro symbol).

To correct this, you would simply set the second client to Windows-1252 and start a new session. It’s important to understand that the data on the server is not converted; the raw character value produced by the client’s encoding schema is stored as-is.
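The byte-level behavior in this example can be reproduced with Python's standard codecs, as a sketch outside Appx. Here `cp1252` and `iso8859_15` are Python's codec names for Windows-1252 and ISO-8859-15; the variable names are illustrative only.

```python
# Python codec names for the two schemas in the example.
CLIENT_ENCODING = "cp1252"      # Windows-1252, the desktop client default
ENGINE_ENCODING = "iso8859_15"  # ISO-8859-15, the engine default

# The client turns the Euro symbol into a single byte; the engine stores it as-is.
stored = "€".encode(CLIENT_ENCODING)
print(stored[0])  # 128

# A second client also set to Windows-1252 shows the Euro symbol again.
print(stored.decode(CLIENT_ENCODING))  # €

# Interpreted as ISO-8859-15, the same byte is an unprintable control code.
print(repr(stored.decode(ENGINE_ENCODING)))  # '\x80'

# In ISO-8859-15 the Euro symbol is 164 (0xA4), not 128.
print("€".encode(ENGINE_ENCODING)[0])  # 164
```

The stored value never changes between the reads; only the encoding used to interpret it differs.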

Taking this example one step further, suppose your Appx application imports data directly from an outside source. If the character encoding of that data differs from that of your Appx client, the client might not display the data accurately. Keep in mind that the raw character values are saved in your Appx file unchanged.
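As a hedged illustration, assume the imported data was UTF-8, a common encoding for files produced outside Appx (the sample text is invented). Because the bytes are stored unchanged, a Windows-1252 client decodes each byte separately and shows garbled characters instead of the Euro symbol:

```python
# Hypothetical imported value: the text "Prix: 10€" written as UTF-8 elsewhere.
imported = "Prix: 10€".encode("utf-8")

# The engine keeps these bytes unchanged; the Euro symbol takes three of them.
print(len(imported))  # 11

# A Windows-1252 client maps each byte to its own character.
shown = imported.decode("cp1252")
print(shown)  # Prix: 10â‚¬
```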

Recommendation: All AppxDesktopClients that connect to your Appx engine should be using the same character encoding.

HTML Client

The HTML client displays all data in UTF-8 encoding. The engine transcodes from its default encoding of ISO-8859-15 to UTF-8 when sending data to the client, and from UTF-8 back to ISO-8859-15 when receiving data from the client. If APPX_RAW_ENCODING is specified, the engine uses that encoding instead of ISO-8859-15.
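The transcoding described above can be modeled in a few lines of Python. This is a sketch of the described behavior, not Appx's implementation: the function names are hypothetical, and it assumes the APPX_RAW_ENCODING value is a name Python's codecs also accept.

```python
import os

def raw_encoding() -> str:
    """Engine encoding: ISO-8859-15 unless APPX_RAW_ENCODING overrides it."""
    # Assumption: the value is a name that Python's codecs also accept.
    return os.environ.get("APPX_RAW_ENCODING", "ISO-8859-15")

def to_html_client(stored: bytes) -> bytes:
    """Engine to browser: decode the stored bytes, re-encode them as UTF-8."""
    return stored.decode(raw_encoding()).encode("utf-8")

def from_html_client(received: bytes) -> bytes:
    """Browser to engine: decode UTF-8, re-encode in the engine encoding."""
    # errors="strict" (the default) raises UnicodeEncodeError for characters
    # the engine encoding cannot represent; this page does not say how Appx
    # handles that case.
    return received.decode("utf-8").encode(raw_encoding())

stored = b"12 \xa4"  # "12 €" in ISO-8859-15
sent = to_html_client(stored)
print(sent)  # b'12 \xe2\x82\xac' when APPX_RAW_ENCODING is unset
print(from_html_client(sent) == stored)  # True
```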

Appx Engine

In all Appx releases prior to 6.0, the Appx engine is a single-byte processor and, by default, uses the ISO-8859-15 character set. Multi-byte characters from another source are saved in Appx as raw data, byte by byte, so the programmer must account for identifying and handling those characters.
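One consequence of byte-by-byte storage is that character counts and byte counts diverge. The Python sketch below assumes UTF-8 source data and a hypothetical fixed-length field measured in bytes; it shows how a cut can split a character, and one way (in Python, not Appx) to truncate safely:

```python
FIELD_BYTES = 8  # hypothetical fixed field size, counted in bytes

value = "€uro €"
raw = value.encode("utf-8")
print(len(value), len(raw))  # 6 10

# Stored byte by byte and cut to the field size, the second Euro symbol
# loses two of its three bytes.
stored = raw[:FIELD_BYTES]
print(stored)  # b'\xe2\x82\xacuro \xe2'

def truncate_utf8(data: bytes, limit: int) -> bytes:
    """Cut valid UTF-8 to at most `limit` bytes without splitting a character."""
    # With valid input, the only undecodable bytes after slicing are an
    # incomplete trailing sequence, which errors="ignore" drops.
    return data[:limit].decode("utf-8", errors="ignore").encode("utf-8")

print(truncate_utf8(raw, FIELD_BYTES))  # b'\xe2\x82\xacuro '
```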


PDF output

PDF encoding is specified in the PDFLib used by the PDF processor. In release 5.4.2 and higher, Appx determines the OS type and passes a parameter that sets the PDFLib to either Windows-1252 (Windows) or ISO-8859-15 (Linux).
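The 5.4.2 rule can be modeled as a simple OS check. This Python sketch is hypothetical: the function name is invented, and treating every non-Windows OS like Linux is an assumption, since only Windows and Linux are mentioned above. It also shows how the same stored byte renders under each choice:

```python
import platform
from typing import Optional

def pdf_encoding_542(os_name: Optional[str] = None) -> str:
    """Model of the 5.4.2 rule: pick the PDFLib encoding from the OS type."""
    os_name = os_name or platform.system()  # e.g. "Windows" or "Linux"
    # Assumption: every non-Windows OS is treated like Linux.
    return "Windows-1252" if os_name == "Windows" else "ISO-8859-15"

# Value 128: the Euro symbol as stored by a Windows-1252 desktop client.
stored = b"\x80"
for os_name in ("Windows", "Linux"):
    encoding = pdf_encoding_542(os_name)
    print(os_name, encoding, repr(stored.decode(encoding)))
```

Under this model, a Euro symbol entered from a Windows-1252 client renders correctly in PDFs produced on Windows but not on Linux.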

This behavior of setting the PDFLib based on the OS type is being changed. Future releases of Appx will set the PDFLib to the encoding specified for the engine.


Importing/Exporting raw data


Incoming data in your files might not use the same encoding as the engine schema (refer to the example above). Exported data will be in whatever encoding was used to create it. For data entered using the Appx client, the data would be Windows-1252, unless the client was set to something else. The character set defined for the engine does not matter, as no conversion is performed. Changing the encoding schema of the engine does not affect the data stored in Appx files.
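Because raw exports are not converted, whoever consumes the file must know its source encoding. As a sketch, assuming the export holds Windows-1252 data and the consumer wants UTF-8, a Python step can re-encode the file (the file names, contents, and `reencode_file` helper are hypothetical):

```python
import tempfile
from pathlib import Path

def reencode_file(src: Path, dst: Path, src_encoding: str, dst_encoding: str) -> None:
    """Rewrite a file's text from one encoding into another."""
    # Work on bytes so line endings are copied unchanged.
    text = src.read_bytes().decode(src_encoding)
    dst.write_bytes(text.encode(dst_encoding))

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical raw export of data entered from a Windows-1252 client.
    export = Path(tmp) / "export.txt"
    export.write_bytes("Total: 5€\r\n".encode("cp1252"))

    converted = Path(tmp) / "export-utf8.txt"
    reencode_file(export, converted, "cp1252", "utf-8")
    result = converted.read_bytes()

print(result)  # b'Total: 5\xe2\x82\xac\r\n'
```

The same helper works in the other direction for incoming files, for example converting UTF-8 data to ISO-8859-15 before it is imported.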



Web applications using CGI

Incoming data from a web page is formatted as UTF-8. This data is not converted, so the programmer must account for special characters.
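What "accounting for special characters" involves can be sketched in Python (not in Appx itself): decode the form data as UTF-8, then explicitly re-encode it for an ISO-8859-15 engine. The query string is a hypothetical example, and substituting `?` for unrepresentable characters is one possible policy, not Appx behavior.

```python
from urllib.parse import parse_qs

# Hypothetical query string as a browser would send it for the value "10 €".
query = "price=10+%E2%82%AC"

# Percent-decoding as UTF-8 recovers the original characters.
fields = parse_qs(query, encoding="utf-8")
price = fields["price"][0]
print(price)  # 10 €

# To store the value in an ISO-8859-15 engine, re-encode it explicitly;
# errors="replace" substitutes '?' for characters the target cannot hold.
stored = price.encode("iso8859_15", errors="replace")
print(stored)  # b'10 \xa4'
```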

Importing/Exporting CSV data using the Database Management utilities

The Export/Import CSV utilities are new in release 5.3.0. Export will convert the data from the character set defined by the Appx engine to UTF-8. Conversely, imported data, which must be encoded using UTF-8, will be converted to the engine character set.
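The conversion direction of these utilities can be modeled with Python's `csv` module. This is a sketch under stated assumptions: the records and function names are hypothetical, the engine encoding is the ISO-8859-15 default, and the real utilities' delimiter, quoting, and line-ending choices are not described here (Python's defaults are used).

```python
import csv
import io

ENGINE_ENCODING = "iso8859_15"  # engine default, per the text above

# Hypothetical records as stored by the engine (ISO-8859-15 bytes).
records = [[b"Item", b"Price"], [b"Widget", b"12 \xa4"]]

def export_csv(rows) -> bytes:
    """Decode each field from the engine encoding and write UTF-8 CSV."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    for row in rows:
        writer.writerow([field.decode(ENGINE_ENCODING) for field in row])
    return buf.getvalue().encode("utf-8")

def import_csv(data: bytes):
    """Read UTF-8 CSV and encode each field back to the engine encoding."""
    reader = csv.reader(io.StringIO(data.decode("utf-8"), newline=""))
    return [[field.encode(ENGINE_ENCODING) for field in row] for row in reader]

exported = export_csv(records)
print(exported)  # b'Item,Price\r\nWidget,12 \xe2\x82\xac\r\n'
print(import_csv(exported) == records)  # True
```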


Appx Data Exchange Application (0DX)

The Data Exchange Application, released in 5.3, is used to import XML data. This data is converted and stored in Appx as ISO-8859-15, unless overridden by APPX_RAW_ENCODING.
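The conversion step can be sketched with Python's `xml.etree.ElementTree`: the parser uses the XML declaration to decode the bytes, and the extracted text is then encoded as ISO-8859-15. The document and element names are hypothetical, and this does not reflect how 0DX itself is implemented or how it handles characters ISO-8859-15 cannot represent (here `encode` would raise `UnicodeEncodeError`).

```python
import xml.etree.ElementTree as ET

# Hypothetical incoming XML, UTF-8 encoded as declared in its prolog.
xml_bytes = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<item><price>12 €</price></item>"
).encode("utf-8")

# The parser decodes the bytes according to the declaration.
root = ET.fromstring(xml_bytes)
price = root.findtext("price")
print(price)  # 12 €

# Model of the storage step: keep the text as ISO-8859-15 bytes.
stored = price.encode("iso8859_15")
print(stored)  # b'12 \xa4'
```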

Administrative controls

Appx recommends that you synchronize all your character encoding settings, based on your specific data requirements, using the following settings:

AppxDesktopClient

Select the Options tab, then click Advanced and change characterEncoding under [Options], choosing the encoding schema from the drop-down list.

Appx 5.4.2 Engine

APPX_RAW_ENCODING=(encoding schema name)

PDF Output

APPX_PDF_ENCODING=(encoding schema name)

Note: the PDF encoding schema name may be spelled differently from the names used in the client and engine settings above.

Comments

-- Jean Neron - 2016-04-06

Topic revision: r3 - 2016-06-24 - JeanNeron
 