<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>ioannis blog &#187; database</title>
	<atom:link href="http://ioannis.mpsounds.net/blog/tag/database/feed/" rel="self" type="application/rss+xml" />
	<link>http://ioannis.mpsounds.net/blog</link>
	<description></description>
	<lastBuildDate>Fri, 10 Sep 2010 15:00:49 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>SQLite and native UNICODE LIKE support in C/C++</title>
		<link>http://ioannis.mpsounds.net/blog/2007/12/19/sqlite-native-unicode-like-support/</link>
		<comments>http://ioannis.mpsounds.net/blog/2007/12/19/sqlite-native-unicode-like-support/#comments</comments>
		<pubDate>Wed, 19 Dec 2007 17:22:31 +0000</pubDate>
		<dc:creator>ioannis</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[embedded]]></category>
		<category><![CDATA[engine]]></category>
		<category><![CDATA[LIKE]]></category>
		<category><![CDATA[lower]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[SQLite]]></category>
		<category><![CDATA[unicode]]></category>
		<category><![CDATA[upper]]></category>
		<category><![CDATA[utf-16]]></category>
		<category><![CDATA[utf-8]]></category>

		<guid isPermaLink="false">http://ioannis.mpsounds.net/blog/2007/12/19/sqlite3-and-native-unicode-support-in-cc/</guid>
		<description><![CDATA[SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. SQLite is the most widely deployed SQL database engine in the world. It is used in countless desktop computer applications as well as consumer electronic devices including cellphones, PDAs, and MP3 players. The source code for SQLite is in the [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine.</p>
<p>SQLite is the <a href="http://www.sqlite.org/mostdeployed.html">most widely deployed</a> SQL database engine in the world. It is used in countless desktop computer applications as well as consumer electronic devices including cellphones, PDAs, and MP3 players. The source code for SQLite is in the <a href="http://www.sqlite.org/copyright.html">public domain</a>.</p></blockquote>
<p>Please, rest assured that SQLite does indeed have UNICODE (UTF-8, UTF-16) support, but&#8230;</p>
<p>If a search led you to this post, you have probably already run into the limitations of SQLite with regard to native case-insensitive UNICODE text comparison, and especially the LIKE operator, which is crippled for non-ASCII text.<br />
<span id="more-48"></span><br />
Traditionally, case-insensitive string comparison means that the strings being compared are first transformed into lowercase, and then a byte-by-byte comparison loop runs over the strings to determine whether they are identical.</p>
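<p>As a rough illustration of that traditional approach, here is a minimal sketch that folds only the ASCII letters and then compares byte by byte. The names <code>ascii_lower</code> and <code>ascii_equal_nocase</code> are hypothetical and are not taken from SQLite or from the download below.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fold one byte to lowercase.
   Only ASCII 'A'..'Z' are changed; every other byte is returned as-is. */
static unsigned char ascii_lower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Hypothetical helper: returns 1 if a and b are equal ignoring ASCII case,
   0 otherwise. Both arguments must be NUL-terminated strings. */
int ascii_equal_nocase(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a != '\0' && *b != '\0') {
        if (ascii_lower((unsigned char)*a) != ascii_lower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}
```

<p>With this sketch, <code>ascii_equal_nocase("SQLite", "sqlite")</code> reports a match, while the UTF-8 encodings of an uppercase and a lowercase Greek omega do not match, because their bytes fall outside the ASCII range and are never folded.</p>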
<p>Unfortunately, SQLite uses the C/POSIX functions <code>tolower()</code> and <code>toupper()</code> for string transformations and, consequently, for case-insensitive string comparisons. These functions have no native UNICODE equivalents; they are locale-specific.</p>
<p>As the MSDN library specifies:</p>
<blockquote><p>The case conversion of <code>tolower()</code> is locale-specific. Only the characters relevant to the current locale are changed in case. The functions without the _l suffix use the currently set locale. The versions of these functions with the _l suffix take the locale as a parameter and use that instead of the currently set locale.</p></blockquote>
<p>Therefore, a problem arises wherever locale-specific case mappings disagree with the UNICODE case mappings for the characters of a given language family.</p>
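<p>To make the problem concrete, the following sketch lowercases a UTF-8 string one byte at a time with the standard <code>tolower()</code>. It assumes the program runs in the default "C" locale (no call to <code>setlocale()</code>), and <code>lower_bytes</code> is a hypothetical name used only for this illustration.</p>

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: lowercase a NUL-terminated string in place, one byte
   at a time, and return how many bytes were changed.
   Assumption: the default "C" locale is active, in which tolower() maps
   only 'A'..'Z'. */
size_t lower_bytes(char *s)
{
    size_t changed = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* tolower() requires a value representable as unsigned char. */
        unsigned char before = (unsigned char)*s;
        unsigned char after = (unsigned char)tolower(before);
        if (after != before) {
            *s = (char)after;
            changed++;
        }
    }
    return changed;
}
```

<p>Applied to the UTF-8 bytes <code>0xCE 0xA9</code> (uppercase Greek omega) followed by <code>MEGA</code>, only the four ASCII letters change. The two bytes of the omega are left untouched, so the string never becomes equal to its lowercase form.</p>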
<p>The <a href="http://www.icu-project.org">ICU</a> people have faithfully followed the <a href="http://www.unicode.org">UNICODE</a> standard and produced open-source libraries, which SQLite can also use to <em>optionally</em> provide native UNICODE support. The disadvantage of this library is that it is too big (~10 MB of compiled binary libraries) to be used in association with SQLite. Even after omitting features I couldn&#8217;t get it to build to a sensible binary size <img src="http://ioannis.mpsounds.net/blog/wp-includes/images/smilies/icon_neutral.gif" alt="|" class="wp-smiley" />, and on top of that I despise depending on dynamically linked libraries that I have to ship as well.</p>
<p>Therefore I decided to extract the case folding tables from the UNICODE standard, similarly to what the SQLite developers have done for the lowercase mappings of the ASCII table, and implement the functionality in SQLite.</p>
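<p>The general idea of a table-driven case fold can be sketched as follows. This is not the code from the download: the table is a tiny hand-picked excerpt of simple case foldings, and <code>fold_entry</code>, <code>fold_table</code>, <code>fold_codepoint</code> and <code>codepoints_equal_folded</code> are hypothetical names. A real table would cover every entry the standard defines.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: one simple case folding, code point to code point. */
typedef struct {
    uint32_t from;
    uint32_t to;
} fold_entry;

/* Illustrative excerpt only, sorted by 'from' so it can be binary-searched. */
static const fold_entry fold_table[] = {
    {0x00C0, 0x00E0}, /* LATIN CAPITAL LETTER A WITH GRAVE */
    {0x00C9, 0x00E9}, /* LATIN CAPITAL LETTER E WITH ACUTE */
    {0x0391, 0x03B1}, /* GREEK CAPITAL LETTER ALPHA */
    {0x03A9, 0x03C9}, /* GREEK CAPITAL LETTER OMEGA */
    {0x0410, 0x0430}, /* CYRILLIC CAPITAL LETTER A */
    {0x042F, 0x044F}  /* CYRILLIC CAPITAL LETTER YA */
};

/* Fold one code point; code points without a table entry are returned unchanged. */
uint32_t fold_codepoint(uint32_t cp)
{
    size_t lo = 0;
    size_t hi = sizeof fold_table / sizeof fold_table[0];

    if (cp >= 'A' && cp <= 'Z')
        return cp + ('a' - 'A'); /* ASCII fast path */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (fold_table[mid].from == cp)
            return fold_table[mid].to;
        if (fold_table[mid].from < cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return cp;
}

/* Compare two arrays of n already-decoded code points, ignoring case. */
int codepoints_equal_folded(const uint32_t *a, const uint32_t *b, size_t n)
{
    size_t i;
    assert(a != NULL && b != NULL);
    for (i = 0; i < n; i++) {
        if (fold_codepoint(a[i]) != fold_codepoint(b[i]))
            return 0;
    }
    return 1;
}
```

<p>The sketch works on decoded code points rather than raw bytes, so UTF-8 or UTF-16 text has to be decoded first. The lookup itself does not depend on the current locale.</p>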
<p>The following file may be built as a separate dynamic library or as a static library compiled directly into your SQLite build.<br />
It uses the already existing ICU infrastructure built in SQLite in order to unleash its power.<br />
Build with <strong>SQLITE_CORE</strong> and <strong>SQLITE_ENABLE_ICU</strong> preprocessor definitions.</p>
<p>Hopefully there are *no significant* errors in the code. If you find one, please leave a comment below.</p>
<blockquote><p>Download the implementation file <a href="?dl=sqlite3_unicode.zip" title="(5596 downloads)"><strong><code>sqlite3_unicode.zip</code></strong></a></p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://ioannis.mpsounds.net/blog/2007/12/19/sqlite-native-unicode-like-support/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
	</channel>
</rss>
