SQLite and native UNICODE LIKE support in C/C++
SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine.
SQLite is the most widely deployed SQL database engine in the world. It is used in countless desktop computer applications as well as consumer electronic devices including cellphones, PDAs, and MP3 players. The source code for SQLite is in the public domain.
Please, rest assured that SQLite does indeed have UNICODE (UTF-8, UTF-16) support, but…
If you have searched and resulted in reading this post, then you have probably realized the true facts of life; the limitations of SQLite in regards to native non case-sensitive UNICODE text comparison and especially the use of the LIKE operator which has been crippled.
Traditionally non case-sensitive string comparison, would mean that, strings being compared would be transformed into lowercase before comparison, and then an incremental binary comparison loop wound be made for each byte constituting the strings to determine whether the strings were identical or not.
Unfortunately SQLite uses C/POSIX functions tolower() and toupper() to make string transformations and subsequently non case-sensitive string comparisons, which have no native UNICODE equivalents but instead are locale-specific.
As specified in MSDN library :
The case conversion of
tolower()is locale-specific. Only the characters relevant to the current locale are changed in case. The functions without the _l suffix use the currently set locale. The versions of these functions with the _l suffix take the locale as a parameter and use that instead of the currently set locale.
Therefore, a problem arises where locale-specific case mappings do not agree with UNICODE case mappings for each language family character.
ICU people have faithfully followed the UNICODE standard and produced open-source libraries, which have been also used by SQLite to optionally provide native UNICODE support. The disadvantage of this library is that it is too darn big ~10mb of compiled binary libraries to be used in association with SQLite. Even by omitting features I couldn’t get it to build to a sensible binary size
added the fact that i despise having dependencies in dynamically linked libraries that i have to ship as well.
Therefore i decided to extract the case folding tables from the UNICODE standard, similarly to what SQLite developers have done for lowercase mappings of the ASCII table, and implement the functionality in SQLite.
The following file may be build as a separate dynamic library or a static library to be compiled directly in your SQLite compilation.
It uses the already existing ICU infrastructure built in SQLite in order to unleash its power.
Build with SQLITE_CORE and SQLITE_ENABLE_ICU preprocessor definitions.
Hopefully there are *no significant* errors in the code, if you find one please leave a comment below.
Download the implementation file
sqlite3_unicode.zip
v5.1.0.4
Updated today to include a NOCASE collation override, for sorting unicode strings.
nocase collation does not always work corrrectly.
example:
sqlite> create table song (name text,id number);
sqlite> insert into song values(‘Paw’,1);
sqlite> insert into song values(‘Pallas’,2);
sqlite> select * from song;
Paw|1
Pallas|2
sqlite> select * from song order by name;
Pallas|2
Paw|1
sqlite> select * from song order by name collate nocase;
Paw|1
Pallas|2
this fixes the problem:
SQLITE_PRIVATE int sqlite3StrNICmp(const char *zLeft, const char *zRight, int N){
unsigned char *a = (unsigned char *)zLeft;
unsigned char *b = (unsigned char *)zRight;
signed int ua = 0, ub = 0;
while(N– && ua==ub)
{
ua = unicode_fold(sqlite3Utf8Read(a, 0, &a));
ub = unicode_fold(sqlite3Utf8Read(b, 0, &b));
if(!ua || !ub)
break;
}
return ua – ub;
}
and works even if Zleft and zRoght are 0 terminated and shorter than N.
Thank you for noticing.
I have updated the file that fixes the above issue and incremented the file to rev 7
Do LIKE and GLOB support indexed search optimization for Unicode as well?
Yes indexed search optimizations should work for Unicode as well as long as you don’t mask your column names with any functions (eg. unaccent(column_name) which overrides the optimization by default)
How to use it ?
I have AMALGAMATION version
Just add the file in your project and compile the project with the processor definitions below
SQLITE_CORE and SQLITE_ENABLE_ICU
That should be enough.
(I personally prefer to build SQLite in a separate project that produces a library file which is statically linked by my executable project)
I am far from an experienced C dev and therefore have some difficulties achieving this – could you give me a hint on how to build a new library of SQLite that I later can include in my projects? Just some general remarks, I’m not even sure if I can use configure/make for this or have to use gcc by hand.
Would be greatly appreciated!
Hi!
Gimme any url to downloading XRumer 5 FREE!
Thank you…
Very-very much.
I’m so need this program for advertise my online projects!!! This software is the best thing for online promo and mass posting, you know…
And, dont send me XRumer 2.9 and XRumer 3.0 – that versions are too old!
P.S. Google cant help me((((
Not that I’m impressed a lot, but this is a lot more than I expected when I found a link on Delicious telling that the info is awesome. Thanks.
Hi,
I’m trying to build a standalone .dll from
sqlite_unicode.*(I mean a dll file separate from the std sqlite3.dll) but I’m running into some problems.For instance, sqlite_unicode.c line 1861 contains integral constants greater than 0xffff but are declared as unsigned short. I wonder how I should cope with that.
Also it seems I need to use many preprocessor defines to build this baby. Which one should I define or is there a makefile available? I’m using Watcom C but it should be relatively easy to adapt one running makefile.
Last question: I really need to use this extension from both a scripting environment (I use AutoIt and there’s no problem here, I can load any dll and call whatever function I need to register it inside the std sqlite dll).
But I also need to have it working as well in a third-party database manager (I use SQLite Expert Pro). From this tool I can issue a Select sqlite3_load_extension(…) because the call is already enabled at startup, but I’d like this to internally link via sqlite_auto_extension(). Is it possible to use the “Windows hack” from the source code to do precisely that? The aim is to have the extension loaded, registered and initialized for any DB connection in the same session automagically.
Warm thanks in advance for any help.
Петербуржская Школа Правильного Питания (7 минут от метро “Чернышевская”) приглашает всех желающих избавиться от лишнего веса на бесплатные вечерние (18:45) ознакомительные занятия 15, 16, 18, 19 и 22 июня 2009г.
Более подробная информация и запись на сайте – http://hudeem-vmeste.com
Slightly embarrassed and with a quick smile to the bus driver, she reached behind her to unzi her skirt a little, thinking tha this would give her enough slack to raise hre leg.