I’m excited to see the first stable release of KBibTeX for KDE Frameworks 5. I know it’s been a long time coming.
One of the main goals of Tellico‘s development has been to be a simple application. I wanted to be able to keep track of my books without having to configure an SQL database, or create a schema, or worry about system daemons. To that end, while I thought about SQLite at the time (several years ago), I ended up writing Tellico to just store all its data in memory. The images are stored on disk, but all the field values for each entry are maintained in simple object containers (vectors and hashes…). The XML format is used only for serializing the data to save and reload.
In practice, I believe that has worked rather well. While I have received emails from folks who try to store 10,000 books in their database and find the performance lacking, by and large, I’ve seen many reviews note favorably that Tellico is simple and flexible to use and can be useful for the majority of people.
I do want to expand Tellico’s capabilities, however. One large goal is to get away from treating each collection as a flat list of entries. I want to be able to have books and movies in the same database, for example, and I want to be able to track TV episodes and seasons equally well. I want to be able to add information about authors and actors.
To that end, I need to rewrite Tellico’s backend. And in considering how I want to do that, I’ve come to a decision point about SQL vs. RDF.
So I’m trying to work up a pro/con list.
I want to say that SQL wins. Embedding or linking against SQLite means a typical user would never need to worry about database permissions, daemon persistence, or username and port settings. At the same time, for power users, the added work to make MySql or PostgreSQL an option, would be reasonable. Akonadi and Digikam have taken this approach, and up until recent versions, so had Amarok.
Using Nepomuk, on the other hand, requires the full Soprano and Virtuoso tool chain. Most KDE desktops are running Virtusoso at this point, I guess, but I don’t want to shut out the GNOME users out there. And on my underpowered development box with 1 GB of RAM, I can’t even use Strigi and Nepomuk.
Here again, I think SQL wins. SQL (and to some extent, SQLite) is used in so many places, I know a significant amount of work has gone into optimizing and improving its efficiency. In other words, if the database access is slow, it’s very likely that the problem is due to my poor programming knowledge rather than a fundamental flaw. I don’t have that reassurance with RDF/SPARQL and Nepomuk. i know Nepomuk is improving, but looking at the bug reports and development fits and starts in the KDE code, it still seems a bit rocky.
I feel like I should include this factor. RDF seems to be a bit of a buzzword with the semantic database push lately.A SQL schema would largely be opaque, while the RDF store, assuming the use of common ontologies, would allow for future interoperability with other databases. This is all rather fuzzy, though, and there’s nothing that says I can’t have some sort of RDF export or translation from the SQL.
If I did use Nepomuk and RDF, I might even have to try to write some sort of abstraction layer to use Tracker on GNOME.
I’d call this a tie! I’ve messed around with some limited SQL and RDF/SPARQL both, and I’m interested in learning more about both.
These are mostly just unordered thoughts bouncing around in my head. I’ll all but decide to take a shot at implementing a SQL backend, and then change my mind an hour later. Plus, who’s to say I can even figure out how to do any of this! I only impersonate a programmer on TV! 🙂
Today, this particular aspect of SPARQL bit me in the test code I was writing:
An application writer should not expect blank node labels in a query to refer to a particular blank node in the data.
I’m sure it won’t be the last time that some aspect of SPARQL bites me in the rear.