Enterprise in the Cloud: ephemeral

Monday, December 2, 2013

Storage tiering on AWS

Here is the replay of the session I just presented with an AWS partner (App Associates) and customer (Riso): http://cloudconclave.blogspot.com/2013/11/aws-storage-tiering-for-enterprise.html

I opened the session by 'testing' the audience's understanding of AWS storage tiers:
1. Assuming a 16K block size, what storage option provides average throughput of 1 to 2 MBPS?
2. What storage option has single-threaded throughput of around 17 MBPS?
3. Assuming a 16K block size, what EBS volume provides an average of 16-20 MBPS throughput?
4. Once again assuming a 16K block size, and also assuming four 1K PIOPS volumes, for what EC2 instances will you start to see network saturation?
5. What storage option produces approximately 100-145 MBPS read and write throughput?
6. For which storage option is it possible to transfer approximately 3 TB a day over a WAN?

Answers:
1. Standard IOPS
2. S3
3. 1K PIOPS
4. Any instance with a 0.5 Gbps network connection, for example m2.2xlarge
5. hi1.4xlarge (high IO) ephemeral storage
6. AWS Storage Gateway
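
Some of these answers can be checked with simple arithmetic. The sketch below is only a back-of-the-envelope calculation, not a benchmark: it assumes decimal units (1 MB = 1000 KB, 1 TB = 10^12 bytes), and the function names are made up for illustration.

```python
# Back-of-the-envelope throughput arithmetic for the quiz above.
# Assumption: decimal units (1 MB = 1000 KB, 1 TB = 10**12 bytes).

BLOCK_SIZE_KB = 16  # block size stated in the quiz


def throughput_mb_per_s(iops: int, block_size_kb: int = BLOCK_SIZE_KB) -> float:
    """Throughput in megabytes per second for a given IOPS rate."""
    return iops * block_size_kb / 1000


def mb_per_s_to_gbps(mb_per_s: float) -> float:
    """Convert megabytes per second to gigabits per second."""
    return mb_per_s * 8 / 1000


def tb_per_day_to_mb_per_s(tb_per_day: float) -> float:
    """Sustained rate in MB/s needed to move a daily volume of data."""
    seconds_per_day = 24 * 60 * 60
    return tb_per_day * 1_000_000 / seconds_per_day


one_volume = throughput_mb_per_s(1000)       # one 1K PIOPS volume
four_volumes = 4 * one_volume                # four 1K PIOPS volumes

print(one_volume)                            # 16.0 MB/s
print(mb_per_s_to_gbps(four_volumes))        # 0.512 Gbps
print(round(tb_per_day_to_mb_per_s(3), 1))   # 34.7 MB/s
```

Under these assumptions, one 1K PIOPS volume at 16K blocks works out to 16 MB/s, four of them come to roughly 0.5 Gbps (which is why a 0.5 Gbps network link saturates), and 3 TB a day corresponds to a sustained rate of about 35 MB/s.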

Data stores compatible with Amazon EMR

A number of different file systems can be used with Amazon EMR:

1. Hadoop Distributed File System (HDFS): HDFS resides on the EC2 local/ephemeral disk. The obvious disadvantage is that it’s ephemeral storage, which is reclaimed when the cluster ends. It can be used for caching the results produced by intermediate job-flow steps during a large EMR job.
2. Local (ephemeral) EC2 disk: Each EMR node comes with local disk. This disk works well for temporary storage of data that is continually changing, such as buffers, caches, scratch data, and other temporary content.
3. S3 native: Used for input (the data set to be reduced) and output/results.
4. S3 block: Avoid this option, as it does not perform as well as the others.
5. HBase: HBase is an open source, non-relational, distributed database that runs on top of HDFS. HBase works with Hadoop/EMR, sharing its file system and serving as a direct input and output to EMR jobs. HBase also integrates with Apache Hive, enabling SQL-like queries over HBase tables, joins with Hive-based tables, and support for Java Database Connectivity (JDBC).
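
In practice, the file system used by a job is usually selected by the URI prefix of its input and output paths. The sketch below shows the idea. The prefix-to-store mapping is an assumption based on the prefixes commonly used on EMR at the time and should be checked against the EMR documentation; the bucket and path names are hypothetical.

```python
from urllib.parse import urlparse

# Assumed mapping of URI schemes to the data stores listed above.
# Verify these prefixes against the EMR documentation before relying on them.
SCHEME_TO_STORE = {
    "hdfs": "HDFS",
    "file": "Local (ephemeral) EC2 disk",
    "s3": "S3 native",
    "s3n": "S3 native",
    "s3bfs": "S3 block",
}


def data_store_for(uri: str) -> str:
    """Return the data store a path points at, based on its URI scheme."""
    scheme = urlparse(uri).scheme
    return SCHEME_TO_STORE.get(scheme, "unknown")


# Hypothetical paths for one job: S3 for input/output, HDFS for intermediates.
for path in (
    "s3n://example-bucket/input/",
    "hdfs:///tmp/intermediate/",
    "s3n://example-bucket/output/",
):
    print(path, "->", data_store_for(path))
```

The example paths follow the split described in the list: durable input and results in S3, intermediate step output in HDFS.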

More information here:
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-file-systems.html

Friday, June 7, 2013

Oracle Database on ephemeral drives

Using EC2 ephemeral storage (either disk or SSD) is a way to achieve higher IO throughput.
You could use the design pattern Redshift uses (Redshift runs on the HS1.* instances, which have similar storage characteristics to the hi1.4xlarge instances): "the first line of defense consists of two replicated copies of your data, spread out over up to 24 drives on different nodes within your data warehouse cluster". This includes:

1. All data written to a node in your cluster is automatically replicated to other nodes within the cluster
2. All data is continuously backed up to Amazon S3

Oracle on SSD is the recommendation for getting the highest level of IO when running Oracle on EC2.

Wednesday, July 11, 2012

AWS EC2 Oracle Database - Storing and managing my data

When you create an Oracle Database on the Amazon cloud, you will need to store your database files somewhere on the EC2 cloud. There are basically three places where database files can be stored:
1. Local drive - This is the local drive that is part of the virtual server EC2 instance.
2. Elastic Block Store (EBS) - Network-attached storage that appears as a local drive.
3. Simple Storage Service (S3) - 'Storage for the Internet'.

S3 is not high speed and is intended for storing static, document-type files. S3 can also be used for storing static web page files. Local drives are ephemeral, so they are not appropriate as a database storage device. That leaves EBS, which is the best place to store database files. EBS volumes appear as local disk drives, but they are actually network-attached to an Amazon EC2 instance. In addition, EBS persists independently from the running life of a single Amazon EC2 instance. If you use an EBS-backed instance for your database data, the data will remain available after a reboot but not after a terminate. In many cases you would not need to terminate your instance but only stop it, which is the equivalent of a shutdown. To save your database data before you terminate an instance, you can snapshot the EBS volume to S3.
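
A snapshot can be scripted as well as taken from the console. The sketch below assumes the boto3 library's EC2 client, whose `create_snapshot` call takes a `VolumeId` and an optional `Description`. The helper name, the stand-in client, and the volume ID are hypothetical, and the stand-in exists only so the example runs without AWS credentials.

```python
def snapshot_before_terminate(ec2_client, volume_id: str, note: str) -> str:
    """Start an EBS snapshot of the given volume and return the snapshot ID."""
    response = ec2_client.create_snapshot(VolumeId=volume_id, Description=note)
    return response["SnapshotId"]


class DryRunClient:
    """Hypothetical stand-in for an EC2 client; records calls instead of making them."""

    def __init__(self):
        self.calls = []

    def create_snapshot(self, VolumeId, Description=""):
        self.calls.append((VolumeId, Description))
        return {"SnapshotId": "snap-dryrun", "VolumeId": VolumeId}


client = DryRunClient()
snapshot_id = snapshot_before_terminate(
    client, "vol-EXAMPLE", "Oracle data files before terminate"
)
print(snapshot_id)
```

Against a real account you would pass `boto3.client("ec2")` instead of the stand-in, and wait for the snapshot to complete before terminating the instance.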

Using EBS as a data store, you can move your Oracle data files from one instance to another. This allows you to move your database from one region or zone to another. Unfortunately, to scale out Oracle on Amazon RDS you cannot have read-only replicas. This is only possible with the other Oracle relational database, MySQL. The free micro instances use EBS as their storage.

This is a very good white paper that has more details:
AWS Storage Options
This white paper also discusses: SQS, SimpleDB, and Amazon RDS in the context of storage devices. However, these are not storage devices you would use to store an Oracle database. This slide deck discusses a lot of information that is in the white paper:
AWS Storage Options slideshow