BigData Workflow Engine for Hadoop, Hbase, Netezza, Pig, Hive, Cascalog ...
Glue is split into three installs: the Glue server, the Glue UI web application, and Glue Cron.
Follow the same instructions as for the RPM install, except after downloading run sudo alien
Follow the configuration steps outlined in the documentation.
At a minimum, or for quick testing, grant all permissions to the glue user from localhost:
CREATE USER 'glue'@'localhost' IDENTIFIED BY 'glue';
GRANT ALL ON glue.* TO 'glue'@'localhost';
FLUSH PRIVILEGES;
Glue installs init.d scripts to /etc/init.d/glue-server
To start glue type: service glue-server start
Check that it's running by looking at the output logs in /opt/glue/logs
Follow the configuration steps outlined in the documentation.
The GLUE_UI_CONFIG variable must point to a basic configuration file (explained in the documentation). By convention this value points to /opt/glue/conf/glue-ui.groovy
Set this variable in TOMCAT_HOME/conf/tomcat5.conf or TOMCAT_HOME/conf/tomcat6.conf, e.g.
export GLUE_UI_CONFIG=/opt/glue/conf/glue-ui.groovy
Follow the same instructions as for the RPM install, except after downloading run sudo alien
Grant all permissions to the glue user from localhost:
CREATE USER 'glue'@'localhost' IDENTIFIED BY 'glue';
GRANT ALL ON glue.* TO 'glue'@'localhost';
FLUSH PRIVILEGES;
Either run /opt/gluecron/bin/dbsetup.sh or create the tables manually:
CREATE TABLE `unittriggers` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`unit` varchar(100) DEFAULT NULL,
`type` varchar(10) DEFAULT NULL,
`data` varchar(100) DEFAULT NULL,
`lastrun` date DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `unitfiles` (
`unitid` int(11) DEFAULT NULL,
`fileid` int(11) DEFAULT NULL,
`status` varchar(10) DEFAULT NULL,
UNIQUE KEY `a` (`unitid`,`fileid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `hdfsfiles` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`path` varchar(1000) NOT NULL,
`seen` tinyint(4) DEFAULT '0',
`ts` bigint(20) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `path` (`path`),
KEY `seen1` (`seen`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
| Property | Description | Default |
|----------|-------------|---------|
| refresh.freq | frequency at which checks are performed, in minutes | 5 |
| hdfsfiles.table | database table for hdfs files | hdfsfiles |
| hdfsfiles-history.table | database table for hdfs files for triggers of type hdfs-history | hdfsfiles |
| unittriggers.table | database table from which the triggers are read | unittriggers |
| unitfiles.table | table in which the status of each unit's execution against the hdfs files is stored | unitfiles |
| unitfiles-history.table | table in which the status of each unit's execution against the hdfs history files is stored | unitfiles |
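For example, a configuration fragment overriding nothing (all values shown are the defaults from the table above; the exact file name and location are an assumption based on the install layout) might look like:

```properties
# Hypothetical Glue Cron configuration fragment.
# Property names come from the table above; values are the documented defaults.
refresh.freq=5
hdfsfiles.table=hdfsfiles
hdfsfiles-history.table=hdfsfiles
unittriggers.table=unittriggers
unitfiles.table=unitfiles
unitfiles-history.table=unitfiles
```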
It is important that the correct Hadoop jars are on the gluecron classpath. One version of Hadoop is not always compatible with another, and for this reason Glue Cron does not package the Hadoop libraries.
Ensure that you have the hadoop client installed.
The script /opt/gluecron/conf/env.sh will try to detect the Hadoop install automatically and add the jar and configuration dependencies to the classpath. If you have any problems starting Glue, check that the variable HADOOP_LIB points to the correct location.
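The detection performed by env.sh can be sketched roughly as follows. This is illustrative, not the actual script: `hadoop classpath` is the standard Hadoop CLI command for printing the client classpath, and the fallback path is an assumption you should adjust to your install.

```shell
#!/bin/sh
# Rough sketch of the detection /opt/gluecron/conf/env.sh performs
# (illustrative only; the real script may differ).
if command -v hadoop >/dev/null 2>&1; then
  # 'hadoop classpath' prints the jars and config dirs the client needs.
  HADOOP_LIB=$(hadoop classpath)
else
  # Fallback location is an assumption; adjust to your Hadoop install.
  HADOOP_LIB="/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*"
fi
echo "HADOOP_LIB=$HADOOP_LIB"
```

If startup fails with ClassNotFound-style errors, echoing HADOOP_LIB this way is a quick check that the variable resolves to real paths.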
To do so follow the instructions below:
Glue installs init.d scripts to /etc/init.d/gluecron
To start gluecron type: service gluecron start
Check that it's running by looking at the output log /opt/gluecron/logs/gluecron.log