Necrobious'

Thursday, March 18, 2010

Cassandra, Thrift, and Hackage

Last week, I posted a quick 'hello world' example of inserting data into, and selecting it back out of, a Cassandra database. To get that example working, I first needed to download and compile the Thrift compiler. Thrift also ships with a Haskell lib that was already packaged in the Cabal format but had not been uploaded to the Hackage repository, so I posted it to Hackage.

The second dependency of my example is the Cassandra interface lib, which is generated from the interface/cassandra.thrift file included in the Cassandra 0.5.1 release. Since this code only needs to be generated once, I packaged it as the cassandra-thrift package and posted it to Hackage as well.

Hopefully this helps other Haskell developers get Cassandra code up and running quickly, as we'll no longer need to mess around with thrift to talk to a running Cassandra server.

Next up: We need a better (high level) Cassandra Haskell API built on top of this plumbing... More on that soon.

Cheers,
-kirk


Saturday, March 13, 2010

Haskell & Cassandra: First Steps

Ok, so I had a few hours to finally sit down and try getting a 'Hello World' example put together for interacting with a Cassandra database from Haskell.
I used the following:
Mac OS 10.6.2
GHC 6.12.1
apache-cassandra-0.5.1 (JDK 1.6 from Apple)
thrift-incubating-0.2.0 (Compiled using GCC 4.2.1 and Boost 1.41 from MacPorts)

I used a stock Cassandra install and did not customize the included example schema, starting the server with:
$ cd apache-cassandra-0.5.1
$ bin/cassandra -f

Getting the thrift compiler built and the Thrift lib installed into GHC proved to be a pain point for me, and will be the topic of a subsequent post if others have a similar experience.

Once the Thrift package is installed in GHC, use the thrift compiler to compile the Cassandra interface definition file:
$ thrift --gen hs apache-cassandra-0.5.1/interface/cassandra.thrift

You should now have the generated Haskell Cassandra sources:
-rw-r--r-- 1 kirk kirk 83431 Mar 13 09:57 Cassandra.hs
-rw-r--r-- 1 kirk kirk 16946 Mar 13 09:57 Cassandra_Client.hs
-rw-r--r-- 1 kirk kirk 613 Mar 13 09:57 Cassandra_Consts.hs
-rw-r--r-- 1 kirk kirk 1823 Mar 11 13:02 Cassandra_Iface.hs
-rw-r--r-- 1 kirk kirk 19015 Mar 13 09:56 Cassandra_Types.hs

Unfortunately, there are few API docs to go on here for Haskellers, but I was finally able to duplicate the Java example from the Cassandra Wiki using a default Cassandra install and schema:

{-# LANGUAGE DeriveDataTypeable #-}

import Network
import System.IO
import Thrift.Protocol.Binary
import Thrift.Transport.Handle
import Cassandra_Client
import Cassandra_Types

import System.Time

main = do
  -- connect to Cassandra running locally
  handle <- hOpen ("127.0.0.1", PortNumber 9160)

  -- We're using old-time here, but really we just need a good
  -- Int64 value for the timestamp. TOD carries seconds and picoseconds.
  TOD sec _picos <- getClockTime

  let binpro = BinaryProtocol handle
  let proto = (binpro, binpro) -- no idea why the Cassandra API splits this into (in, out) protocols, but w/e

  -- put some data into the column
  insert
    proto                   -- The protocol/transport value
    "Keyspace1"             -- The KeySpace (typically this will be your app's name)
    "kirk"                  -- The "row" id, or in Cassandra-speak, the key
    (ColumnPath             -- Column paths point to either a column or a super column
      (Just "Standard1")    -- The column family
      Nothing               -- The super column name
      (Just "name"))        -- The column name
    "Kirk Peterson"         -- The value to insert into the column
    (fromInteger sec)       -- The timestamp for the insert
    ONE                     -- The ConsistencyLevel to enforce on the insert


  -- pull the value back out
  r <- get_slice
    proto                   -- more protocol/transport stuff
    "Keyspace1"             -- The KeySpace (typically this will be your app's name)
    "kirk"                  -- again, the key
    (ColumnParent           -- similar to column path, the column parent
      (Just "Standard1")    -- the Column Family
      Nothing)              -- the super column (not used in our example)
    (SlicePredicate         -- predicate/filter for the row, either by column names and/or by a slice range
      Nothing               -- Column Names to filter on (not used in our example)
      (Just (SliceRange     -- A SliceRange
        (Just "")           -- Range Start
        (Just "")           -- Range End
        (Just False)        -- Results Reversed?
        (Just 10))))        -- Result Count
    ONE                     -- The ConsistencyLevel to enforce on the read


  putStrLn $ "found " ++ (show $ length r) ++ " record(s)"
  mapM_ (putStrLn . show) r

  hClose handle




When compiled (or run from GHCi), you should see something similar to the following:
*Main> main
Loading package parsec-2.1.0.1 ... linking ... done.
Loading package network-2.2.1.5 ... linking ... done.
Loading package Thrift-0.1.0 ... linking ... done.
Loading package array-0.3.0.0 ... linking ... done.
Loading package containers-0.3.0.0 ... linking ... done.
Loading package old-locale-1.0.0.2 ... linking ... done.
Loading package old-time-1.0.0.3 ... linking ... done.
found 1 record(s)
ColumnOrSuperColumn {f_ColumnOrSuperColumn_column = Just (Column {f_Column_name = Just "name", f_Column_value = Just "Kirk Peterson", f_Column_timestamp = Just 1268515834}), f_ColumnOrSuperColumn_super_column = Nothing}
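
The record printed above wraps every field in Maybe, so getting plain name/value pairs out of the result takes a little unwrapping. Here is a minimal sketch of that step. To keep it self-contained, the two record types are simplified stand-ins that mirror the printed output; they are not the generated Cassandra_Types definitions (the super column field in particular is reduced to Maybe ()), and columnPairs is a hypothetical helper name.

```haskell
import Data.Int (Int64)
import Data.Maybe (mapMaybe)

-- Stand-in types mirroring the shape printed above (assumption: simplified,
-- NOT the generated Cassandra_Types definitions).
data Column = Column
  { f_Column_name      :: Maybe String
  , f_Column_value     :: Maybe String
  , f_Column_timestamp :: Maybe Int64
  } deriving (Show, Eq)

data ColumnOrSuperColumn = ColumnOrSuperColumn
  { f_ColumnOrSuperColumn_column       :: Maybe Column
  , f_ColumnOrSuperColumn_super_column :: Maybe ()  -- placeholder for the super column type
  } deriving (Show, Eq)

-- Hypothetical helper: keep only plain columns that have both a name and a value.
columnPairs :: [ColumnOrSuperColumn] -> [(String, String)]
columnPairs = mapMaybe pair
  where
    pair cosc = do
      col   <- f_ColumnOrSuperColumn_column cosc
      name  <- f_Column_name col
      value <- f_Column_value col
      return (name, value)

main :: IO ()
main = do
  let r = [ ColumnOrSuperColumn
              (Just (Column (Just "name") (Just "Kirk Peterson") (Just 1268515834)))
              Nothing
          , ColumnOrSuperColumn Nothing Nothing  -- dropped: no plain column
          ]
  mapM_ print (columnPairs r)
```

With the real generated module, the same field accessors should apply to the list returned by get_slice, assuming the generated records look like the printed one.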

Win! Cassandra wrote a record to the datastore, and was able to read it back.
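
One loose end in the listing is the timestamp: it passes whole seconds, so two writes to the same column within one second carry the same timestamp. A finer-grained value is one option. The sketch below combines the two fields of a TOD value (seconds and picoseconds) into microseconds. todMicros is a hypothetical helper name, and choosing microseconds is an assumption here, not something the Thrift interface requires.

```haskell
import Data.Int (Int64)

-- Hypothetical helper: combine the seconds and picoseconds fields of
-- old-time's `TOD sec picos` into microseconds since the epoch.
todMicros :: Integer -> Integer -> Int64
todMicros sec picos = fromInteger (sec * 1000000 + picos `div` 1000000)

main :: IO ()
main = do
  -- Fixed sample values so the example runs without the old-time package;
  -- in the program above they would come from `TOD sec picos <- getClockTime`.
  let sec   = 1268515834
      picos = 500000000000  -- half a second, in picoseconds
  print (todMicros sec picos)
```

The result could then be passed to insert in place of (fromInteger sec).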
Hopefully this helps you get your Cassandra app up and running. That's all I have for now; hit me with your questions and experiences in the comments, and hopefully I can help.

cheers,
-kirk
