Freitag, 18. November 2011

SQL in Clojure

TL;DR

I propose an idiomatic library for reified SQL statements. In Clojure.

Motivation / Disclosure

I have been using ClojureQL and clojure.java.jdbc for some time. I am contributing to ClojureQL, too. Lately, Chris Granger released the excellent Korma library. Naturally this is of interest to me.
It raises the question, in how far CQL and Korma try to do the same thing; where the differences and possible synergies are. At the recent 2nd Clojure Conj, I got the chance to take a look into Korma. I saw the presentation, played a bit at the repl and browsed the source. Based on that, plus my knowledge of CQL, I can offer you

My observations about about Korma

First off: Korma's implementation is elegant and succinct, my compliments to Chris!
Korma, in it's current state, is a generator for SQL SELECT statements. It gives you a DSL to set the possible slots of a single SELECT statement: fields (SELECT), joins (FROM), filters (WHERE), order specifications (ORDER BY) and aggregation columns (GROUP BY).
Korma takes some measures to make SELECT more composable. You can specify fields and joins in multiple steps and also add new predicates to a query.
It also has DML abstractions (INSERT, UPDATE, DELETE), but I will focus on the query part, since that's where I see the value of CQL.

What ClojureQL brings to the table (pun retrotended)

CQL implements Relational Algebra. RA is the canonical, decomplected form of SQL, Approved by Science (tm). It is defined as operators on relations. A relation is a set of named tuples, corresponding to an SQL table or a SELECT expression.
  • Projection (π) - the enumeration of columns in a SELECT FROM
  • Selection (σ) - WHERE
  • Rename (ρ) - SELECT _ AS renamed
  • Joins (⋈,⟕,⟖,⟗,θ) - NATURAL, LEFT, RIGHT, FULL, ON predicate
  • Aggregation (g) - aggregates on columns + rename; also non-aggregate columns in GROUP BY

CQL extends RA by allowing sort, take, drop, union, difference on relations, to add ORDER BY, OFFSET, LIMIT, UNION (ALL), INTERSECT clauses respectively.
The practical advantage is, that CQL lets you compose these operators, as you can in RA, and compiles them into SQL statements. That means you can also do compositions, that require boilerplate in SQL (a.k.a. a subselect), like joining in an aggregated relation.
For the base case it, does the same thing as Korma, letting you fill in the slots of a SELECT piecewise.

Why Korma then?

As I said, I find Korma's implementation compellingly simple. It focuses on one thing: A dataformat + api for SQL SELECT statements. It's implementation is very short and straightforward.
Those traits remind me at ring.
Yes, the http library.
I know, I love it too ;)
Also, this whole thing ties in nicely with Mark McGranaghan's talk at the conj about first-classifying all communication protocols (I hope vids are up soon).
The bottom line is: I think we can take Korma, modularize it ring style and have CQL as a pure RA compiler that targets the same data format. Then have the various DB impls take values of that dataformat.

What's missing?

Well, apart from an actual proposal of a data format (*), there are a few things, Korma's notion of the various SQL statements is quite complete. Although it would need auditing for finer points like:
  • Coverage of standard SQL, e.g. subselects, WITH statements, unions, CASE expressions
  • Ability to represent vendor - specific statements with embedded field / table references
  • Representation of (qualified) table- and field references; Korma does this by :table.field; should that be :table/field and :schema.table/field?

I'll be happy help along on SQL as data.

Next

Fleshing out the missing details and giving it the blessing. Easy, right?
If you have done a lot of SQL and know all the odd corners, please take a look at the data structure and make proposals how it could look like to accomodate all cases.