Big Data Workflow Engine for Hadoop, HBase, Netezza, Pig, Hive, Cascalog ...
A Glue workflow is a combination of processes. Each process performs a series of tasks, which in turn are made up of statements.
Three ways to think of it:
Divide a workflow into high-level steps (these are the processes); each step can then be completed as a series of smaller steps (the tasks).
load from database
    select data
    write data out as csv
send data to HDFS
    create directory
    put file into directory
run query
    create query
    add parameters to query based on data available
    run query
save output
    download query output
    insert output into mysql
The example above can be written as:
File: myworkflow.groovy
tasks {
    loadFromDatabase {
        tasks = { context ->
            // select data
            // write data out as csv
        }
    }
    sendToHDFS {
        dependencies = "loadFromDatabase"
        tasks = { context ->
            // create directory
            // put file into directory
        }
    }
    runQuery {
        dependencies = "sendToHDFS"
        tasks = { context ->
            // create query
            // add parameters to query based on data available
            // run query
        }
    }
    saveOutput {
        dependencies = "runQuery"
        tasks = { context ->
            // download query output
            // insert output into mysql
        }
    }
}
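
The dependencies entries in the workflow above imply an order in which the processes can run. The following plain Groovy sketch illustrates that idea with a depth-first topological sort. It is not Glue's own scheduler or internal representation: the dependencies map and the executionOrder function are hypothetical names introduced here only for illustration, and the script uses no Glue API.

```groovy
// Hypothetical model of the workflow above: process name -> processes it depends on.
// This is an illustration only, not how Glue stores a workflow internally.
def dependencies = [
    loadFromDatabase: [],
    sendToHDFS      : ['loadFromDatabase'],
    runQuery        : ['sendToHDFS'],
    saveOutput      : ['runQuery']
]

// Depth-first topological sort: a process is emitted only after
// every process it depends on has been emitted.
List<String> executionOrder(Map<String, List<String>> deps) {
    def ordered = []
    def visiting = [] as Set
    def visit
    visit = { String name ->
        if (name in ordered) return
        if (!deps.containsKey(name)) {
            throw new IllegalArgumentException("Unknown process: ${name}")
        }
        // add() returns false when the name is already on the current path
        if (!visiting.add(name)) {
            throw new IllegalStateException("Circular dependency at: ${name}")
        }
        deps[name].each { visit(it) }
        visiting.remove(name)
        ordered << name
    }
    deps.keySet().each { visit(it) }
    ordered
}

println executionOrder(dependencies)
// prints [loadFromDatabase, sendToHDFS, runQuery, saveOutput]
```

Declaring the processes in a different order in the map gives the same result, because the order comes from the dependencies and not from where each process appears in the file.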