Glue

BigData Workflow Engine for Hadoop, HBase, Netezza, Pig, Hive, Cascalog ...

AWS S3 API


The S3 module allows easy access (internally via the AWS SDK) to the S3 service. The module can be configured to access multiple servers and buckets.

Configuration file

All modules are configured in the /opt/glue/conf/workflow_modules.groovy file.

Configuration Example

    s3{
        className='org.glue.modules.S3Module'
        //must never be a singleton
        isSingleton=false
        config{
            servers{
                defaults3{
                    //use AWS v4 auth
                    secretKey="[secretkey]"
                    accessKey="[accesskey]"
                    region="[regionname]"
                    domain="[region domain e.g. amazonaws.com]"
                    bucket="[bucket name to use if none is provided]"
                    isDefault=true
                }
            }
        }
    }
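
The servers block is not limited to a single entry: additional named servers can be defined alongside the default one and then selected by name through the server parameter of the API calls. A sketch of such a configuration (the second server name, credentials, regions and buckets below are hypothetical):

    s3{
        className='org.glue.modules.S3Module'
        //must never be a singleton
        isSingleton=false
        config{
            servers{
                defaults3{
                    secretKey="[secretkey]"
                    accessKey="[accesskey]"
                    region="oregon"
                    domain="amazonaws.com"
                    bucket="prod-data"
                    isDefault=true
                }
                //a second, non-default server, selected by name in API calls
                archives3{
                    secretKey="[archive secretkey]"
                    accessKey="[archive accesskey]"
                    region="ireland"
                    domain="amazonaws.com"
                    bucket="archive-data"
                }
            }
        }
    }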

Class: S3Module

S3 Regions

The region can be either a region code or a region name. A lookup table is used: if the region specified in the configuration is found in the lookup map, the mapped entry is used; otherwise the region is used as specified.

The lookup map is (some shortcuts have been added):

    def regionMap = [
        "tokyo"        : "ap-northeast-1",
        "singapore"    : "ap-southeast-1",
        "sydney"       : "ap-southeast-2",
        "frankfurt"    : "eu-central-1",
        "ireland"      : "eu-west-1",
        "sao paulo"    : "sa-east-1",
        "saopaulo"     : "sa-east-1",
        "n. virginia"  : "us-east-1",
        "virginia"     : "us-east-1",
        "n. california": "us-west-1",
        "california"   : "us-west-1",
        "oregon"       : "us-west-2"
    ]
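
The resolution rule can be sketched as a plain map lookup with a fallback. The helper below is only an illustration, and the case handling is an assumption; only the rule itself is taken from the description above:

    //hypothetical helper illustrating the lookup rule
    def resolveRegion = { String region ->
        regionMap.get(region?.toLowerCase()) ?: region
    }

    assert resolveRegion("tokyo")     == "ap-northeast-1"   //shortcut found in the map
    assert resolveRegion("eu-west-1") == "eu-west-1"        //not in the map, used as specified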

API

| Method | Description | Example |
| ------ | ----------- | ------- |
| putFile(server:String=null, bucket:String=null, file:String, dest:String):PutObjectResult | Copy the file to the destination key on S3 using the default bucket | ctx.s3.putFile("myfile", "/dir/myfile.txt") |
| getFile(server:String=null, bucket:String=null, file:String, localFile:String) | Copy the file from S3 to the local file | ctx.s3.getFile("/dir/myfile.txt", "myfile") |
| deleteFile(server:String=null, bucket:String=null, file:String) | Delete the file on S3 | ctx.s3.deleteFile("/dir/myfile.txt") |
| putFile(input:InputStream, metadata:ObjectMetadata, dest:String):PutObjectResult | Reads from the java.io.InputStream and writes the content to the dest file | |
| createBucket(server:String=null, bucket:String) | Create a bucket on S3 | ctx.s3.createBucket("mynewbucket") |
| deleteBucket(server:String=null, bucket:String) | Delete a bucket from S3 | ctx.s3.deleteBucket("mynewbucket") |
| listFiles(server:String=null, bucket:String=null, dir:String):List | Returns a list of files | ctx.s3.listFiles("/mydir") |
| streamFile(server:String=null, bucket:String=null, input:InputStream, dest:String, contentSize:long) | Copy bytes from input to the dest file on S3; if contentSize is 0 or less it is ignored. This works for local files, but if streaming from HDFS you need to set the content size | ctx.s3.streamFile(new FileInputStream("myfile.txt"), "test.txt", -1) |
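
A short usage sketch tying the calls together inside a workflow task, where ctx.s3 is the configured S3Module; the S3 keys and local file names below are hypothetical:

    //upload a local file to the default bucket, then list, fetch and delete it
    ctx.s3.putFile("report.csv", "/reports/2015/report.csv")

    def files = ctx.s3.listFiles("/reports/2015")
    println "found: ${files}"

    ctx.s3.getFile("/reports/2015/report.csv", "report-copy.csv")
    ctx.s3.deleteFile("/reports/2015/report.csv")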