SQL or RDF? Thoughts on Tellico’s Next Backend

One of the main goals of Tellico’s development has been for it to be a simple application. I wanted to be able to keep track of my books without having to configure an SQL database, create a schema, or worry about system daemons. To that end, although I considered SQLite at the time (several years ago), I ended up writing Tellico to store all of its data in memory. The images are stored on disk, but all the field values for each entry are kept in simple object containers (vectors and hashes…). The XML format is used only to serialize the data for saving and reloading.
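To make that design concrete, here is a minimal Python sketch (Tellico itself is written in C++, and this is not its code): each entry is a plain dictionary of field values, the collection is a plain list of entries, and XML only appears when saving and reloading. The element names `collection`, `entry`, and `field` are hypothetical and do not reflect Tellico's actual file format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Entry:
    # Each entry is just a hash of field name -> value, kept in memory.
    fields: dict[str, str] = field(default_factory=dict)


@dataclass
class Collection:
    # The whole collection is a flat list (vector) of entries in memory.
    entries: list[Entry] = field(default_factory=list)

    def to_xml(self) -> str:
        """Serialize every entry when the user saves.

        The tag names used here are hypothetical, not Tellico's format.
        """
        root = ET.Element("collection")
        for entry in self.entries:
            e = ET.SubElement(root, "entry")
            for name, value in entry.fields.items():
                f = ET.SubElement(e, "field", name=name)
                f.text = value
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, text: str) -> "Collection":
        """Rebuild the in-memory containers when the file is reopened."""
        root = ET.fromstring(text)
        coll = cls()
        for e in root.findall("entry"):
            fields = {f.get("name"): (f.text or "") for f in e.findall("field")}
            coll.entries.append(Entry(fields))
        return coll
```

Every query and edit works directly on the list and dictionaries; the XML round trip only happens on save and load, which is why very large collections end up bounded by memory and by linear scans.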

In practice, I believe that approach has worked rather well. While I have received emails from folks who tried to store 10,000 books in their database and found the performance lacking, by and large I’ve seen many reviews note favorably that Tellico is simple and flexible to use and meets the needs of most people.

I do want to expand Tellico’s capabilities, however. One large goal is to get away from treating each collection as a flat list of entries. I want to be able to have books and movies in the same database, for example, and I want to be able to track TV episodes and seasons equally well. I want to be able to add information about authors and actors.

To that end, I need to rewrite Tellico’s backend. And in considering how I want to do that, I’ve come to a decision point about SQL vs. RDF.

Many highly-visible KDE applications use SQL, such as Amarok, Digikam, and Akonadi. I just read a blog post about using the same MySQL instance for all three of those applications.

On the other hand, the Nepomuk framework in KDE provides an interface to an RDF database. Bangarang and KMail 2 both make heavy use of Nepomuk.

So I’m trying to work up a pro/con list.

Portability

I want to say that SQL wins. Embedding or linking against SQLite means a typical user would never need to worry about database permissions, daemon persistence, or username and port settings. At the same time, for power users, the added work to make MySQL or PostgreSQL an option would be reasonable. Akonadi and Digikam have taken this approach, and until recent versions, so had Amarok.
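As a rough illustration of the no-daemon point, here is a Python sketch using the standard `sqlite3` module. The schema (the `entry`, `person`, and `credit` tables and their columns) is hypothetical, invented to show how books and movies could share one database, how a TV episode could point at its parent season, and how an author or actor could be stored once and linked to many entries.

```python
import sqlite3

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entry (
    id     INTEGER PRIMARY KEY,
    kind   TEXT NOT NULL,                -- e.g. 'book', 'movie', 'episode'
    title  TEXT NOT NULL,
    parent INTEGER REFERENCES entry(id)  -- e.g. episode -> season
);
CREATE TABLE IF NOT EXISTS person (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS credit (
    entry_id  INTEGER NOT NULL REFERENCES entry(id),
    person_id INTEGER NOT NULL REFERENCES person(id),
    role      TEXT NOT NULL,             -- e.g. 'author', 'actor'
    PRIMARY KEY (entry_id, person_id, role)
);
"""


def open_db(path: str = "collection.db") -> sqlite3.Connection:
    # An in-process library call on a local file: no server, user name,
    # password, or port. ":memory:" also works for a throwaway database.
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_entry(conn, kind, title, people=(), parent=None):
    """Insert an entry plus (name, role) credits, reusing existing people."""
    cur = conn.execute(
        "INSERT INTO entry (kind, title, parent) VALUES (?, ?, ?)",
        (kind, title, parent))
    entry_id = cur.lastrowid
    for name, role in people:
        conn.execute("INSERT OR IGNORE INTO person (name) VALUES (?)", (name,))
        (person_id,) = conn.execute(
            "SELECT id FROM person WHERE name = ?", (name,)).fetchone()
        conn.execute(
            "INSERT INTO credit (entry_id, person_id, role) VALUES (?, ?, ?)",
            (entry_id, person_id, role))
    conn.commit()
    return entry_id
```

Pointing a MySQL or PostgreSQL option at the same tables would mean swapping the connection layer, which is the extra work for power users mentioned above.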

Using Nepomuk, on the other hand, requires the full Soprano and Virtuoso tool chain. Most KDE desktops are running Virtuoso at this point, I guess, but I don’t want to shut out GNOME users. And on my underpowered development box with 1 GB of RAM, I can’t even use Strigi and Nepomuk.

Development Maturity

Here again, I think SQL wins. SQL (and to some extent, SQLite) is used in so many places that I know a significant amount of work has gone into optimizing it and improving its efficiency. In other words, if database access is slow, the problem is very likely my poor programming knowledge rather than a fundamental flaw. I don’t have that reassurance with RDF/SPARQL and Nepomuk. I know Nepomuk is improving, but judging from the bug reports and the development fits and starts in the KDE code, it still seems a bit rocky.

SPARQL also has some weird semantics, such as blank nodes, a need for custom Insert/Replace behavior, and a lack of aggregate functions. SPARQL is still rather immature, in that sense.

Interoperability

I feel like I should include this factor. RDF seems to be a bit of a buzzword with the recent semantic database push. A SQL schema would largely be opaque, while an RDF store, assuming the use of common ontologies, would allow for future interoperability with other databases. This is all rather fuzzy, though, and nothing says I can’t provide some sort of RDF export or translation from the SQL.
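Here is a sketch of what such an export might look like, assuming a hypothetical `entry(id, kind, title, creator)` table and a made-up `http://example.org/tellico/` base URI. The predicates come from the real Dublin Core element set and the RDF vocabulary, and the classes from schema.org; choosing those ontologies is an assumption for illustration, not a decision made in this post.

```python
import sqlite3

# Hypothetical base URI for entries; a real export would need a stable one.
BASE = "http://example.org/tellico/entry/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
# Assumed mapping from an entry's kind to a shared class URI.
CLASSES = {"book": "http://schema.org/Book", "movie": "http://schema.org/Movie"}


def literal(text: str) -> str:
    """Quote a plain string literal, escaping as N-Triples requires."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def export_ntriples(conn: sqlite3.Connection) -> list[str]:
    """Turn each row of the hypothetical entry table into N-Triples lines."""
    triples = []
    rows = conn.execute("SELECT id, kind, title, creator FROM entry ORDER BY id")
    for entry_id, kind, title, creator in rows:
        subject = f"<{BASE}{entry_id}>"
        if kind in CLASSES:
            triples.append(f"{subject} <{RDF_TYPE}> <{CLASSES[kind]}> .")
        triples.append(f"{subject} <{DC_TITLE}> {literal(title)} .")
        if creator:
            triples.append(f"{subject} <{DC_CREATOR}> {literal(creator)} .")
    return triples
```

In this arrangement the SQL tables stay the source of truth, and the export is a one-way mapping; writing the lines to a `.nt` file would let RDF tools load them.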

If I did use Nepomuk and RDF, I might even have to try to write some sort of abstraction layer to use Tracker on GNOME.

Developer Interest

I’d call this a tie! I’ve dabbled a little in both SQL and RDF/SPARQL, and I’m interested in learning more about both.

Conclusion

These are mostly just unordered thoughts bouncing around in my head. I’ll all but decide to take a shot at implementing a SQL backend, and then change my mind an hour later. Plus, who’s to say I can even figure out how to do any of this! I only impersonate a programmer on TV! 🙂

MSL Mission Animation

The Mars Science Laboratory will launch late this year, sometime around the end of November or early December. It will be the largest rover that JPL has ever sent to Mars, and also the most expensive. Just about everyone I know at JPL has worked on some aspect of MSL at one point or another, myself included.

The sequence of events for Entry, Descent and Landing is incredibly complex, as you can see in the latest animation video that JPL put out.

Photo Collection From Endeavour STS-134

The Atlantic’s In Focus page has a photo collection from Endeavour’s last flight, STS-134. The photos are amazing. I particularly love the ones of the Shuttle in space, with the long-exposure of the earth beneath.

I flew down to Florida to try to watch the launch of STS-134 in May. Alas, the launch on the initial date was scrubbed due to heater electrical problems. My wife and I spent all of two days in Florida, mostly driving and sitting out in Titusville. I loved the adventure, but hated to miss the launch.

Winding Down the Space Shuttle Program

Carolyn Collins Petersen has a nice blog post about some of her memories of the Space Shuttle Program.

As NASA winds down its space shuttle missions — Endeavour launches on April 29 and Atlantis is scheduled for late June — it’s kind of hard to think that after those flights, there will be no direct access to space via NASA.

I’m hoping to go down to the Cape to see Endeavour’s launch on Friday. So far, all systems are green for my trip and for the launch!

RADM Guadagnini on Carrier Night Operations

My officemates and I have been following a series of videos posted on YouTube by the commander of the USS Abraham Lincoln Carrier Strike Group, Rear Admiral Guadagnini. I’d urge you to go back and watch all of them for the insight into life on an aircraft carrier, plus the enjoyment of seeing “Admiral Guad’s” personality come through.

This latest one showing video of a catapult launch and aircraft landing during night operations is really nice.

Further Adventures in Asset Allocation

After I had decided on the basics of our asset allocation, there were still some further choices to be made. First, the 60% we were allocating to equities can be divided into domestic and international funds. Diversifying outside the U.S. shields our portfolio from the risk of having all our eggs in one basket. If the U.S./international split were based purely on market capitalization, it would be close to 50/50. But with my American exceptionalism hat on, I decided to weight the U.S. at 60% of the total stock allocation, leaving 40% for international.

Further, past data (which provides no guarantee of future behavior!) shows that stocks of small companies, particularly those that are undervalued relative to their inherent value, perform a bit better than the overall market. So within both the U.S. and international pie slices, a small bit is reserved for index funds that track small-capitalization, undervalued companies.

Finally, within the bond portion of the asset pie, I split half and half between a fund tracking the full bond market and one that includes inflation protection (TIPS). TIPS provide a lower rate of return than normal bonds, but since the return is guaranteed to come on top of inflation, they provide additional diversification and protection.
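The percentages above can be checked with a few lines of arithmetic. This Python sketch assumes the bond portion is the remaining 40% of the portfolio and uses a hypothetical `SMALL_VALUE_SHARE` of 20% for the small-cap value slice of each region, since no exact figure is given here. Under those assumptions, U.S. stocks get 0.60 × 0.60 = 36% of the whole portfolio, international stocks get 24%, and each bond fund gets 20%.

```python
# Assumptions: bonds are the remaining 40% of the portfolio, and
# SMALL_VALUE_SHARE is a hypothetical placeholder for "a small bit".
STOCKS = 0.60
BONDS = 1.0 - STOCKS          # assumed: everything that isn't stock
US_SHARE_OF_STOCKS = 0.60     # the 60/40 U.S./international tilt
SMALL_VALUE_SHARE = 0.20      # hypothetical, not a figure from the post


def target_allocation(small_value_share: float = SMALL_VALUE_SHARE) -> dict[str, float]:
    """Return each slice of the pie as a fraction of the whole portfolio."""
    us = STOCKS * US_SHARE_OF_STOCKS   # about 0.36 of the portfolio
    intl = STOCKS - us                 # about 0.24 of the portfolio
    return {
        "US total market": us * (1 - small_value_share),
        "US small-cap value": us * small_value_share,
        "Intl total market": intl * (1 - small_value_share),
        "Intl small-cap value": intl * small_value_share,
        "Total bond market": BONDS / 2,  # half-and-half bond split
        "TIPS": BONDS / 2,
    }


if __name__ == "__main__":
    for name, weight in target_allocation().items():
        print(f"{name:22} {weight:6.1%}")
```

Changing `small_value_share` only moves money between the total-market and small-cap value slices within each region; the 36/24/40 split stays fixed.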

Now, all of that adds up to quite a few pie slices. I didn’t round the numbers exactly, but the following chart shows the general idea.

Chart: target asset allocation (allocation1.png)

Next came the difficult part: finding funds available in the various retirement accounts that my wife and I had accumulated. Each account offered a different list of funds, so it took quite a while to work everything out. I use a modified version of an Excel spreadsheet from the Bogleheads site to check how everything adds up. When there were multiple options for a given asset type, I chose the fund with the lowest operating expense, which usually turned out to be either a Vanguard or a Fidelity Spartan fund.
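The lowest-expense rule is simple enough to sketch in Python. The `Fund` records, fund names, and expense ratios used with it are hypothetical placeholders, and the sketch handles one account's menu at a time; it is not a reproduction of the Bogleheads spreadsheet.

```python
from typing import NamedTuple


class Fund(NamedTuple):
    name: str             # hypothetical fund name
    asset_class: str      # e.g. "US stock", "TIPS"
    expense_ratio: float  # annual operating expense as a fraction (0.001 = 0.1%)


def cheapest_by_class(menu: list[Fund]) -> dict[str, Fund]:
    """For each asset class in one account's menu, keep the lowest-cost fund."""
    best: dict[str, Fund] = {}
    for fund in menu:
        incumbent = best.get(fund.asset_class)
        if incumbent is None or fund.expense_ratio < incumbent.expense_ratio:
            best[fund.asset_class] = fund
    return best
```

For example, given a menu with a U.S. stock fund at 0.10% and another at 0.50%, the function keeps the 0.10% fund for that asset class. Running it once per account produces the per-account choices that then have to be added up across accounts.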

All told, we ended up with 18 different funds. Crazy, I know! That’s what we get for having nine different accounts with three different firms! At the moment, 12 of the 18 each make up less than about 5% of our total.

Chart: allocation2.png

So there it is. Most of what I’ve read recommends rebalancing no more than about twice a year. Rebalancing is the process of comparing your current asset allocation against your target and then shifting funds to bring yourself back into alignment. You don’t want to do it too often, or you lose out on investment gains. But do it too rarely and you don’t get as much risk protection from the diversification.
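Here is a minimal Python sketch of that comparison. For simplicity it treats all accounts as one pool (an assumption; real rebalancing has to respect which funds each account offers). It returns how many dollars to move into (+) or out of (-) each asset class to get back to the target weights.

```python
def rebalance(current: dict[str, float],
              target: dict[str, float]) -> dict[str, float]:
    """Return the dollar amount to buy (+) or sell (-) per asset class.

    `current` maps asset class -> dollars held; `target` maps asset
    class -> desired fraction of the portfolio (must sum to 1).
    """
    if abs(sum(target.values()) - 1.0) > 1e-9:
        raise ValueError("target weights must sum to 1")
    total = sum(current.values())
    trades = {}
    # Include classes held but no longer targeted, so they get sold off.
    for asset in sorted(set(current) | set(target)):
        desired = target.get(asset, 0.0) * total   # dollars at target weight
        trades[asset] = desired - current.get(asset, 0.0)
    return trades
```

For example, a made-up portfolio holding $70,000 in stocks and $30,000 in bonds against a 60/40 target calls for selling $10,000 of stocks and buying $10,000 of bonds. Because the trades always net to zero, this only shifts money around; running it once or twice a year keeps trading to a minimum.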

Too many numbers. I know myself to be a spreadsheet junkie, so it was somewhat enjoyable. But now I get the benefit of knowing that our retirement funds are balanced against risk and that I don’t have to bother to check the market very often. Our target asset allocation is set, and we’ll ride this for at least five years or so, when it may be time to adjust the percentages due to age.