Encoding of a More Comprehensive Character Set
As noted in the Character Encoding section of the Linguistic Considerations page,
current encodings are limited to characters that can be displayed with
the Arial Unicode font, which is currently the single most
comprehensive Unicode font.
At present, the Arial Unicode font does not cover the characters that were
added to the Unicode Standard 2.1 to create the Unicode Standard 3.0, much less
those added in later versions.
In future, encodings will be expanded to include characters beyond
the Unicode Standard 2.1.
In particular, the encoding of Arabic would benefit from access to the full
Unicode Standard 3.0: the Unicode Standard 2.1 lacks certain special Arabic
characters, which the Editor has had to replace with related, more common
characters.
Search Engine Handling of Diacritical Marks and Non-Latin Characters
The Scholarly Societies Project currently does a reasonable job of
encoding Latin characters with diacritical marks, or
non-Latin characters, on web pages.
The Project does not yet allow the user:
- to copy a search string containing Latin characters with diacritical marks
into the Search Engine search box and have the Search Engine interpret the
request correctly, nor
- to copy a search string of non-Latin characters into the Search Engine
search box and have the Search Engine interpret the request correctly.
In general, these are harder problems to solve than the encoding problem.
Allowing Latin characters with diacritical marks in a search string is
likely to be the more tractable of the two problems; it is hoped that this
will be solved within the next several months, as we move to a more powerful
production system.
Allowing non-Latin characters in a search string is likely to take rather
longer.
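One common way to make a search tolerate diacritical marks is to "fold" both the query and the indexed text to a plain base form before comparing them, so that a user who types "Societe" still finds "Société". The sketch below illustrates that idea using Python's standard `unicodedata` module; it is offered only as an illustration under that assumption, not as the method the Project intends to adopt. The function names `fold_diacritics` and `matches` and the sample society titles are hypothetical.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Return a case-folded copy of text with combining marks removed (hypothetical helper)."""
    # NFD splits precomposed letters such as "é" (U+00E9) into a base
    # letter plus a combining mark ("e" + U+0301 COMBINING ACUTE ACCENT).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every nonspacing combining mark (Unicode general category "Mn").
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose what remains, then casefold for case-insensitive comparison.
    return unicodedata.normalize("NFC", stripped).casefold()


def matches(query: str, title: str) -> bool:
    """True if the folded query occurs anywhere in the folded title (hypothetical helper)."""
    return fold_diacritics(query) in fold_diacritics(title)


if __name__ == "__main__":
    # Hypothetical sample titles, for illustration only.
    titles = [
        "Société française de physique",
        "Deutsche Gesellschaft für Soziologie",
        "Royal Society of Chemistry",
    ]
    for q in ["Société", "societe", "fur Soziologie"]:
        print(q, [t for t in titles if matches(q, t)])
```

Decomposition handles most accented Latin letters, but not all: letters such as "ø" or "ł" have no canonical decomposition and would need an explicit mapping table. Non-Latin scripts raise further issues (for example, word boundaries and script-specific variant forms) that simple folding of this kind does not address.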