Tuesday, June 4, 2013

Bulk loading data to AWS from on premises

I spoke about bulk loading data to AWS in this blog post:

A couple of other options I did not mention are:

1. Aspera - Aspera has developed a proprietary file transfer protocol based on UDP (FASP), which has been shown to deliver very high-speed file transfers over the Internet.

2. S3DistCp - http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_s3distcp.html   Apache DistCp is an open-source tool you can use to copy large amounts of data. DistCp uses MapReduce to copy in a distributed manner. S3DistCp is an extension of DistCp that is optimized for Amazon S3; you can use it to copy data between Amazon S3 buckets or from HDFS to Amazon S3. Since it is based upon MapReduce, it is most applicable when the data you are moving already lives in HDFS or is otherwise used by MapReduce jobs, because you will already have MapReduce installed in your environment.
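To make the S3DistCp option a bit more concrete, here is a minimal sketch in Python that only assembles the command line for an HDFS to Amazon S3 copy. It assumes you are on an Amazon EMR node where the `s3-dist-cp` command is available (older EMR versions ran S3DistCp as a jar step instead). The helper name `build_s3distcp_command`, the HDFS directory, and the bucket name are all hypothetical placeholders, not anything from the AWS documentation.

```python
import shlex


def build_s3distcp_command(src, dest, src_pattern=None):
    """Return the argument list for an s3-dist-cp invocation (hypothetical helper)."""
    args = ["s3-dist-cp", "--src", src, "--dest", dest]
    if src_pattern is not None:
        # --srcPattern is a regular expression that filters which files are copied
        args += ["--srcPattern", src_pattern]
    return args


command = build_s3distcp_command(
    src="hdfs:///data/logs",          # hypothetical HDFS directory
    dest="s3://example-bucket/logs",  # hypothetical bucket and prefix
)

# Print a shell-safe version of the command instead of running it.
print(" ".join(shlex.quote(arg) for arg in command))
```

The sketch stops at printing the command so that it is safe to run anywhere; on an actual cluster you could pass the same list to `subprocess.run`.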

