In classic Lotus Domino Web development (without XPages), using characters outside 7-bit ASCII in field names can lead to problems, because Domino encodes such names into cryptic strings inside the HTML page. In some cases, e.g. for client-side validation of fields using JavaScript in the browser, it is nearly impossible to recover the original field name from the encoded one. The easiest way would be to pre-calculate the encoded names in the Domino back-end and generate the needed JavaScript code before the HTML is sent to the browser.
But there are two problems:
The first problem is that there is no function available to encode data in the same way that Domino does when generating the HTML. The only way to decode such a string is the "@URLDecode" function, which isn't available in JavaScript in the browser.
The other problem is that the encoding algorithm does not seem to be documented at all, which makes it quite hard to implement an appropriate encoding function yourself. Well, there is a light at the end of the tunnel: I did a little research on this. The following analysis shows how the encoding works:
When the field name "Straße" (English: street; the "ß" is the German sharp "s") is used, the Domino server encodes it as "_gadq74of1ck_". After removing the underscores and trying various examples, the conclusion is that it must be some kind of Base32 encoding, but the results of the common Base32 variants don't match the Domino one:
"Straße" -> Domino -> "gadq74of1ck"
"Straße" -> Base32 -> "kn2heyo7mu"
"Straße" -> zBase32 -> "un7rzo75fd"
"Straße" -> Base32hex -> "adq74oevck"
On closer inspection, there is a partial match with the Base32hex (RFC 4648 (1)) encoding, but there are some differences:
Notes:      gadq74of1ck
Base32hex:   adq74oevck
The Domino-encoded value has an extra character and doesn't match the Base32hex result in all characters. To find the cause of these differences, we need to decode the Domino-generated string step by step. For now, we skip the first character; it is explained later on:
1: encoded field name
2: translation into the decimal values according to the Base32hex table
3: binary values, 5 bits per character
4: regrouping of the bits
5: groups of 8 bits per character
6: conversion into characters

1: a     d     q     7     4     o     f     1     c     k
2: 10    13    26    7     4     24    15    1     12    20
3: 01010 01101 11010 00111 00100 11000 01111 00001 01100 10100
4: 01010011011101000111001001100001111000010110010100
5: 01010011 01110100 01110010 01100001 11100001 01100101
6: S        t        r        a        á        e
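The six steps above can be reproduced with a few lines of Python. This is only a sketch of the manual decoding: the helper name base32hex_decode is made up for illustration, and the lowercase alphabet simply mirrors how Domino writes the characters in this example.

```python
# Base32hex alphabet from RFC 4648 (digits 0-9, then letters a-v),
# written in lowercase to match the Domino output.
ALPHABET = "0123456789abcdefghijklmnopqrstuv"


def base32hex_decode(text: str) -> bytes:
    """Decode unpadded, lowercase Base32hex text into raw bytes (illustrative helper)."""
    # Steps 2 to 4: map every character to its 5-bit value and join all bits.
    bits = "".join(format(ALPHABET.index(ch), "05b") for ch in text)
    # Step 5: cut into complete groups of 8 bits; leftover bits are padding.
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


raw = base32hex_decode("adq74of1ck")
print(raw.hex())              # 53747261e165
# Step 6: interpreting the bytes as ISO 8859-1 gives the "wrong" character.
print(raw.decode("latin-1"))  # Straáe
```

The fifth byte, 0xE1 (decimal 225), is the one that does not fit a Latin-1 reading of "Straße".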
The "f1" in the Domino value differs from the corresponding two characters of the Base32hex result because a different byte value is used for the "ß". The conclusion is that Domino uses another character set to encode the field names. A little research on common charsets points to MS-DOS code page 850 (2); in this charset and its relatives, the "ß" indeed has the decimal value 225. This does not make things any easier, however: most likely the code page used varies with the system's configuration, so developing a fully compatible encoding function might be a challenge.
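To check the code page theory, the field name can be converted to bytes with code page 850 and then Base32hex-encoded. The following Python sketch does that; the function name base32hex_encode is hypothetical, and the hard-coded "cp850" is an assumption that only fits this one example, since the server's code page may differ.

```python
# Base32hex alphabet from RFC 4648, lowercase as in the Domino output.
ALPHABET = "0123456789abcdefghijklmnopqrstuv"


def base32hex_encode(data: bytes) -> str:
    """Encode bytes as unpadded, lowercase Base32hex (illustrative helper)."""
    # Write every byte as 8 bits and join them into one bit string.
    bits = "".join(format(byte, "08b") for byte in data)
    # Fill the last group with zero bits so the length is a multiple of 5.
    bits += "0" * (-len(bits) % 5)
    # Map every 5-bit group to its character in the alphabet.
    return "".join(ALPHABET[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5))


field_name = "Straße"
# Assumption: the server encodes field names with code page 850.
print("ß".encode("cp850").hex())                     # e1 (decimal 225)
print(base32hex_encode(field_name.encode("cp850")))  # adq74of1ck
```

With this assumption the result matches the Domino value without its first character.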
Now for the first character of the Domino encoding mentioned above; this one is a little tricky, too. First we decode it using Base32hex again:
g -> 16 -> 10000
These additional 5 bits are parity bits (even parity) and are used to validate the encoding and detect errors: each of them makes the number of 1 bits at the same bit position, counted across all characters, even. To make this clearer, the bits of each character are written from top to bottom in the column below it:
g adq74of1ck
1 0010010001
0 1110011010
0 0101101011
0 1011001000
0 0101001100
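Even parity per bit position is the same as XOR-ing the 5-bit values of all data characters. A minimal Python sketch of that calculation follows; the helper name parity_char is made up for illustration, and the rule is assumed from this single example rather than taken from any Domino documentation.

```python
from functools import reduce

# Base32hex alphabet from RFC 4648, lowercase as in the Domino output.
ALPHABET = "0123456789abcdefghijklmnopqrstuv"


def parity_char(encoded: str) -> str:
    """Return the character holding the even-parity bits of all 5-bit values."""
    # XOR of all values gives, per bit position, 1 if the count of 1 bits is odd.
    value = reduce(lambda acc, ch: acc ^ ALPHABET.index(ch), encoded, 0)
    return ALPHABET[value]


body = "adq74of1ck"
print(parity_char(body))                     # g
# Assumed layout of the full name: underscore, parity character, data, underscore.
print("_" + parity_char(body) + body + "_")  # _gadq74of1ck_
```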
Well, that's it: one less mystery out there.
(1) http://www.rfc-editor.org/rfc/rfc4648.txt
(2) ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP850.TXT