BigData Workflow Engine for Hadoop, HBase, Netezza, Pig, Hive, Cascalog ...
This section covers how to add Hadoop, Pig and Hive dependencies for different Hadoop flavours, which are not compatible with each other; e.g. Cloudera CDH4 is not compatible with the Apache Hadoop libraries.
To point the Glue workflow at the correct libraries, open the /opt/glue/conf/exec.groovy file and configure the processClassPath variable.
For example, change

processClassPath = ['/opt/glue/lib-pig', '/opt/glue/lib-hadoop', '/opt/glue/lib']

to point at the system-installed Pig, Hadoop and Hive libraries:

processClassPath = ['/opt/glue/lib/', '/usr/lib/pig', '/usr/lib/pig/lib', '/usr/lib/hadoop/lib/', '/usr/lib/hadoop', '/opt/glue/conf', '/usr/lib/hive', '/usr/lib/hive/lib']
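Before editing exec.groovy, it can help to confirm that every directory you intend to list actually exists on disk; a missing entry is a common source of ClassNotFoundException at runtime. A minimal sketch (the function name check_classpath_dirs is hypothetical, not part of Glue):

```shell
# Hypothetical helper: print any classpath entries that do not exist.
check_classpath_dirs() {
  for dir in "$@"; do
    [ -d "$dir" ] || echo "missing: $dir"
  done
}

# Example: verify the system locations before adding them to processClassPath.
check_classpath_dirs /usr/lib/pig /usr/lib/pig/lib \
                     /usr/lib/hadoop /usr/lib/hadoop/lib \
                     /usr/lib/hive /usr/lib/hive/lib
```

Any path reported as missing should be corrected (or the corresponding package installed) before it is added to processClassPath.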
Pig runs in a separate JVM instance from the workflow itself, which allows the Pig classpath to be set to any Pig distribution.
Open the /opt/glue/conf/workflow-modules.groovy file and, in the pig module configuration, change the classpath property.
For example, change

classpath = ['/opt/glue/lib-pig', '/opt/glue/lib-hadoop', '/opt/glue/lib']

to:

classpath = ['/opt/glue/lib/', '/usr/lib/pig', '/usr/lib/pig/lib', '/usr/lib/hadoop/lib/', '/usr/lib/hadoop', '/opt/glue/conf', '/usr/lib/hive', '/usr/lib/hive/lib']
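In context, the classpath property sits inside the pig module's block in workflow-modules.groovy. A hypothetical excerpt is sketched below; only the classpath line comes from this section, and the surrounding pig { ... } block structure is an assumption about the file's layout:

```groovy
// Sketch of /opt/glue/conf/workflow-modules.groovy (structure assumed).
// Only the classpath property is taken from this section.
pig {
    classpath = ['/opt/glue/lib/', '/usr/lib/pig', '/usr/lib/pig/lib',
                 '/usr/lib/hadoop/lib/', '/usr/lib/hadoop', '/opt/glue/conf',
                 '/usr/lib/hive', '/usr/lib/hive/lib']
}
```

Because Pig runs in its own JVM, this classpath is independent of the workflow's processClassPath in exec.groovy, so the two lists may point at different distributions.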