BigData Workflow Engine for Hadoop, HBase, Netezza, Pig, Hive, Cascalog ...
The S3 module provides easy access (internally via the AWS SDK) to the Amazon S3 service. It can be configured to access multiple servers and buckets.

All modules are configured in the /opt/glue/conf/workflow_modules.groovy file:
```groovy
s3{
    className='org.glue.modules.S3Module'
    //must never be a singleton
    isSingleton=false
    config{
        servers{
            defaults3{
                //use AWS v4 auth
                secretKey="[secretkey]"
                accessKey="[accesskey]"
                region="[regionname]"
                domain="[region domain e.g. amazonaws.com]"
                bucket="[bucket name to use if none is provided]"
                isDefault=true
            }
        }
    }
}
```
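Because the module supports multiple servers, further entries can be added alongside the default one. A minimal sketch, assuming a hypothetical second server named `archiveS3`; all credential and bucket values are placeholders:

```groovy
servers{
    defaults3{
        //... as above ...
        isDefault=true
    }
    //a hypothetical second, non-default server entry
    archiveS3{
        secretKey="[secretkey]"
        accessKey="[accesskey]"
        region="oregon"
        domain="amazonaws.com"
        bucket="archive-bucket"
    }
}
```

Methods that take a `server` argument can then select `archiveS3` explicitly, while calls that omit it fall back to the entry flagged `isDefault=true`.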
Class: S3Module
The region can be either the region code or the region name. A lookup table is used: if the region specified in the configuration is found in the lookup map, the mapped entry is used; otherwise the region is used exactly as specified.
The lookup map is (some shortcuts have been added):

```groovy
def regionMap = [
    "tokyo": "ap-northeast-1",
    "singapore": "ap-southeast-1",
    "sydney": "ap-southeast-2",
    "frankfurt": "eu-central-1",
    "ireland": "eu-west-1",
    "sao paulo": "sa-east-1",
    "saopaulo": "sa-east-1",
    "n. virginia": "us-east-1",
    "virginia": "us-east-1",
    "n. california": "us-west-1",
    "california": "us-west-1",
    "oregon": "us-west-2"
]
```
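To make the resolution rule concrete, here is a sketch of the lookup behaviour (not the module's actual code; `resolveRegion` is a hypothetical helper):

```groovy
//returns the mapped region code for a known name, otherwise the value as given
def resolveRegion = { String configured ->
    regionMap[configured.toLowerCase()] ?: configured
}

assert resolveRegion("oregon")    == "us-west-2"  //shortcut name is mapped
assert resolveRegion("eu-west-1") == "eu-west-1"  //region codes pass through unchanged
```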
Method | Description | Example |
---|---|---|
putFile(server:String=null, bucket:String=null, file:String, dest:String):PutObjectResult | Copy the local file to the destination key on S3, using the default bucket if none is given | ctx.s3.putFile("myfile", "/dir/myfile.txt") |
getFile(server:String=null, bucket:String=null, file:String, localFile:String) | Copy the file from S3 to the local file | ctx.s3.getFile("/dir/myfile.txt", "myfile") |
deleteFile(server:String=null, bucket:String=null, file:String) | Delete the file on S3 | ctx.s3.deleteFile("/dir/myfile.txt") |
putFile(input:InputStream, metadata:ObjectMetadata, dest:String):PutObjectResult | Reads from the java.io.InputStream and writes the content to the dest file | |
createBucket(server:String=null, bucket:String) | Create a bucket on S3 | ctx.s3.createBucket("mynewbucket") |
deleteBucket(server:String=null, bucket:String) | Delete a bucket from S3 | ctx.s3.deleteBucket("mynewbucket") |
listFiles(server:String=null, bucket:String=null, dir:String):List | Returns a list of files | ctx.s3.listFiles("/mydir") |
streamFile(server:String=null, bucket:String=null, input:InputStream, dest:String, contentSize:long) | Copy bytes from input to the dest file on S3; if contentSize is 0 or less it is ignored. This works for local files, but when streaming from HDFS the content size must be set | ctx.s3.streamFile(new FileInputStream("myfile.txt"), "test.txt", -1) |
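Putting the methods together, a hedged sketch of typical calls inside a workflow task; `ctx` is the Glue context, and the paths, keys, and the `hdfsFs` FileSystem handle are illustrative assumptions:

```groovy
import org.apache.hadoop.fs.Path

//upload to the default server/bucket, then read it back and clean up
ctx.s3.putFile("/tmp/report.csv", "reports/report.csv")
ctx.s3.listFiles("reports/").each{ println it }
ctx.s3.getFile("reports/report.csv", "/tmp/copy.csv")
ctx.s3.deleteFile("reports/report.csv")

//streaming from HDFS: the content size must be passed explicitly
//hdfsFs is an assumed org.apache.hadoop.fs.FileSystem instance
def path = new Path("/logs/part-0000")
def len  = hdfsFs.getFileStatus(path).getLen()
ctx.s3.streamFile(hdfsFs.open(path), "logs/part-0000", len)
```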