Glue

BigData Workflow Engine for Hadoop, HBase, Netezza, Pig, Hive, Cascalog ...

Documentation Overview

Glue is a job execution engine written in Java and Groovy. Workflows are written in a Groovy DSL (simple statements), Jython or JRuby, and use pre-developed modules to interact with external resources, e.g. databases, Hadoop, Netezza, FTP, etc.

Glue helps to 'Glue' together a series of interactions with external systems.

Examples:

Load data from N MySQL tables
Push the data to Hadoop HDFS
Run a Pig job
Download the output from HDFS
Push the output to MySQL/Netezza
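
To make the pipeline concrete, here is a minimal sketch in the Groovy DSL. Only ctx.sql.eachSqlResult is taken from the examples in this document; the hdfs, pig and loadFile calls are hypothetical placeholders, not the real module APIs.

tasks{

  // 1. Load data from MySQL into a local staging file.
  extract {
    tasks = { ctx ->
      new File('/tmp/units.csv').withWriter { w ->
        ctx.sql.eachSqlResult('glue', 'select unit_id from units', { rs -> w << "${rs}\n" })
      }
    }
  }

  // 2. Push the staging file to HDFS (hypothetical hdfs module call).
  load {
    tasks = { ctx ->
      ctx.hdfs.put('/tmp/units.csv', '/data/units/units.csv')
    }
  }

  // 3. Run a Pig job over the uploaded data (hypothetical pig module call).
  transform {
    tasks = { ctx ->
      ctx.pig.run('/scripts/aggregate_units.pig')
    }
  }

  // 4. Download the output and push it back to MySQL/Netezza (hypothetical calls).
  export {
    tasks = { ctx ->
      ctx.hdfs.get('/data/units/out', '/tmp/units_out.csv')
      ctx.sql.loadFile('netezza', 'unit_aggregates', '/tmp/units_out.csv')
    }
  }

}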

Data Subscription

One big headache in HDFS BigData is running scripts when data becomes available, not just on a timed frequency: i.e. when data arrives, we want our workflow(s) to start.

Glue, via GlueCron, gives you the ability to register one or more workflows to one or more HDFS directories.
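
To see what this saves you, here is a minimal sketch, in plain Groovy using Hadoop's FileSystem API, of the polling loop that such a subscription replaces. This is not GlueCron's implementation, and startWorkflow is a hypothetical launch call:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path

def fs  = FileSystem.get(new Configuration())
def dir = new Path('/data/units/incoming')

// Without a data subscription you poll on a timer, e.g. waiting for a
// _SUCCESS marker file to appear before starting the workflow.
while (!fs.exists(new Path(dir, '_SUCCESS'))) {
  sleep 60_000   // check once a minute
}
startWorkflow('unit-aggregation')   // hypothetical: launch the registered workflow

With GlueCron this loop disappears: you register the workflow against the HDFS directory once, and Glue starts it when data lands.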

Groovy

Groovy is supported as a DSL.

tasks{

  // define a task; a workflow can contain any number of these blocks
  myprocess1 {
    tasks = { ctx ->
      // query the 'glue' database and print each result row
      ctx.sql.eachSqlResult('glue', 'select unit_id from units', { rs -> println rs })
    }
  }

}

Clojure

Clojure scripts can be written using the Groovy and Java libraries provided by Glue.

e.g.

;; run a Cascalog query via the glue context:
;; read each line of the HDFS text file and print it to stdout
(.exec (.ctx cascalog)
  (def input (hfs-textline "/data/a.log"))
  (?<- (stdout) [?line] (input ?line)))

Jython

Jython scripts can be written using the Groovy and Java libraries provided by Glue.

e.g.

def f2(res):
    print(str(res))

# pass f2 as the callback invoked for each result row
ctx.sql().eachSqlResult('glue', 'select unit_id from units', f2)

JRuby

JRuby scripts can be written using the Groovy and Java libraries provided by Glue.

e.g.

# wrap the Ruby lambda in a Groovy Closure so it can serve as the row callback
$ctx.sql().eachSqlResult('glue', 'select unit_id from units', Closure.new(
  lambda { |res|
    puts "Hi #{res}"
  }
))

No XML Workflows

XML is a terrible language for humans to write in, especially when writing workflows and process-oriented scripts.