
Monday, December 2, 2013

Data stores compatible with Amazon EMR

There are a number of different file systems that can be used with Amazon EMR:

1. Hadoop Distributed File System (HDFS): HDFS resides on the EC2 local/ephemeral disks of the cluster nodes. The obvious disadvantage is that this is ephemeral storage, which is reclaimed when the cluster ends. It can be used to cache the results produced by intermediate job-flow steps during a large EMR job.
2. Local (ephemeral) EC2 disk: Each EMR node comes with local disk. This disk works well for temporary storage of data that is continually changing, such as buffers, caches, scratch data, and other temporary content.
3. S3 native: Used for input (the data set to be reduced) and for output/results.
4. S3 block: Avoid it; it does not perform as well as the other options.
5. HBase: HBase is an open source, non-relational, distributed database that runs on top of HDFS. HBase works with Hadoop/EMR, sharing its file system and serving as a direct input and output to EMR jobs. HBase also integrates with Apache Hive, enabling SQL-like queries over HBase tables, joins with Hive-based tables, and support for Java Database Connectivity (JDBC).
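A common pattern that follows from this list is to read input from S3, keep intermediate step output in HDFS, and write final results back to S3. The sketch below only builds the Hadoop streaming arguments (`-input`, `-output`, `-mapper`, `-reducer`) for a hypothetical two-step job; the bucket name, HDFS path, and mapper/reducer script names are all made-up placeholders, and the streaming jar path and cluster submission are deliberately left out.

```python
# Hypothetical locations: replace with your own bucket and paths.
S3_INPUT = "s3://example-bucket/input/"     # raw data set kept in S3 (persistent)
HDFS_TEMP = "hdfs:///tmp/step1-output/"     # intermediate results, lost when the cluster ends
S3_OUTPUT = "s3://example-bucket/results/"  # final results persisted back to S3


def streaming_args(src: str, dst: str, mapper: str, reducer: str) -> list[str]:
    """Build Hadoop streaming arguments for one step (jar path omitted)."""
    return ["-input", src, "-output", dst, "-mapper", mapper, "-reducer", reducer]


def two_step_pipeline() -> list[list[str]]:
    # Step 1 reads from S3 and writes its intermediate output to HDFS.
    step1 = streaming_args(S3_INPUT, HDFS_TEMP, "step1_mapper.py", "step1_reducer.py")
    # Step 2 reads the HDFS intermediate output and writes final results to S3.
    step2 = streaming_args(HDFS_TEMP, S3_OUTPUT, "step2_mapper.py", "step2_reducer.py")
    return [step1, step2]


if __name__ == "__main__":
    for i, args in enumerate(two_step_pipeline(), start=1):
        print(f"step {i}:", " ".join(args))
```

Only the data that has to outlive the cluster (input and final output) touches S3; the hand-off between steps stays on the cluster's own disks.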

More information here:
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-file-systems.html



Wednesday, May 15, 2013

EC2 (classic) instance hostname: the metadata hostname is not necessarily the OS hostname


These are the results from wget and from hostname at the command line:
1. wget -q -O - http://169.254.169.254/latest/meta-data/public-hostname
Output ---> ec2-50-13-20-1.compute-1.amazonaws.com
2. wget -q -O - http://169.254.169.254/latest/meta-data/hostname
Output ---> ip-10-222-1-241.ec2.internal
3. wget -q -O - http://169.254.169.254/latest/meta-data/local-hostname
Output ---> ip-10-222-1-241.ec2.internal
4. wget -q -O - http://169.254.169.254/latest/meta-data/local-ipv4
Output ---> 10.222.1.241
5. wget -q -O - http://169.254.169.254/latest/meta-data/public-ipv4
Output ---> 50.13.20.1
6. [root@wls1 ~]# hostname   (Linux command line)
Output ---> wls1.labs.oracleweblogic.com
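The same metadata values can be read from a script without wget. This is a minimal sketch using Python's standard `urllib.request`; it issues plain GET requests exactly like the wget calls above (IMDSv1 style). Instances configured to require IMDSv2 need a session token first, which this sketch does not handle. The `fetch` parameter is an assumption added so the function can be exercised away from EC2.

```python
import urllib.request
from typing import Callable, Dict

METADATA_BASE = "http://169.254.169.254/latest/meta-data/"
KEYS = ["public-hostname", "hostname", "local-hostname", "local-ipv4", "public-ipv4"]


def http_get(url: str, timeout: float = 2.0) -> str:
    """Plain GET, roughly equivalent to `wget -q -O - URL`."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def read_metadata(fetch: Callable[[str], str] = http_get) -> Dict[str, str]:
    """Return the values printed by the wget commands, keyed by metadata name."""
    return {key: fetch(METADATA_BASE + key) for key in KEYS}
```

On an instance, `read_metadata()` returns a dictionary such as `{"local-ipv4": "10.222.1.241", ...}`; off EC2 the 169.254.169.254 address is unreachable and the call raises after the timeout.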

The hostname command returned wls1.labs.oracleweblogic.com instead of the internal EC2 name, so someone changed the host name at or after boot time, using user-data or another method. Normally you would see:


1. wget -q -O - http://169.254.169.254/latest/meta-data/hostname
Output ---> ip-14-222-22-28.us-west-2.compute.internal
2. hostname   (Linux command line)
Output ---> ip-14-222-22-28
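The check described here (is the OS hostname still the EC2-assigned name?) can be sketched as a small comparison. This assumes, based only on the two examples in this post, that an unchanged hostname equals either the full metadata hostname or its first label; the function name is hypothetical.

```python
import socket


def hostname_was_changed(metadata_hostname: str, os_hostname: str) -> bool:
    """True if the OS hostname matches neither the EC2 name nor its short form."""
    short_name = metadata_hostname.split(".", 1)[0]  # e.g. "ip-14-222-22-28"
    return os_hostname not in (metadata_hostname, short_name)


if __name__ == "__main__":
    # Values taken from the two examples in this post.
    print(hostname_was_changed("ip-10-222-1-241.ec2.internal",
                               "wls1.labs.oracleweblogic.com"))  # True
    print(hostname_was_changed("ip-14-222-22-28.us-west-2.compute.internal",
                               "ip-14-222-22-28"))  # False
    print("this machine:", socket.gethostname())
```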


Thursday, May 9, 2013

AWS EC2 IP address of running instance

There are a number of methods, but these are the two that I use:
1: /sbin/ifconfig
2: wget -q -O - http://169.254.169.254/latest/meta-data/local-ipv4

The wget option assumes wget is installed. If it is not, you can issue this command to install it (as root):
 yum -y install wget
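If you go with the /sbin/ifconfig option from a script, the IPv4 address has to be picked out of its text output. This sketch assumes the two net-tools output styles seen on Linux ("inet addr:10.0.0.5" on older releases, "inet 10.0.0.5" on newer ones) and skips the loopback address; the sample text in the test is synthetic, not captured from a real instance.

```python
import re
from typing import List

# Matches both "inet addr:10.0.0.5" (older net-tools) and "inet 10.0.0.5" (newer).
# "inet6" lines are not matched because "inet" must be followed by a space.
INET_RE = re.compile(r"\binet (?:addr:)?(\d{1,3}(?:\.\d{1,3}){3})")


def ipv4_addresses(ifconfig_output: str) -> List[str]:
    """Extract non-loopback IPv4 addresses from /sbin/ifconfig text."""
    return [ip for ip in INET_RE.findall(ifconfig_output) if not ip.startswith("127.")]
```

On an instance, the input could come from `subprocess.run(["/sbin/ifconfig"], capture_output=True, text=True).stdout`.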

Wednesday, July 11, 2012

AWS EC2 Oracle Database - Storing and managing my data

When you create an Oracle Database on the Amazon cloud, you need to store your database files somewhere in the EC2 cloud. There are basically three places where database files can be stored:
1. Local drive - This is the local drive that is part of the virtual server EC2 instance.
2. Elastic Block Store (EBS) - Network-attached storage that appears as a local drive.
3. Simple Storage Service (S3) - 'Storage for the Internet'.

S3 is not high speed and is intended for storing static, document-type files; it can also be used to store static web page files. Local drives are ephemeral, so they are not appropriate as database storage devices. That leaves EBS, which is the best place to store database files. EBS volumes appear as local disk drives, but they are actually network-attached to an Amazon EC2 instance. In addition, EBS persists independently of the running life of a single Amazon EC2 instance. If you use an EBS-backed instance for your database data, the data will remain available after a reboot but not after the instance is terminated (by default, the root volume of an EBS-backed instance is deleted on termination). In many cases you do not need to terminate your instance but only stop it, which is the equivalent of a shutdown. To save your database data before you terminate an instance, you can take a snapshot of the EBS volume; snapshots are stored in S3.
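Taking the snapshots before terminating can be scripted. This is a sketch, not a tested procedure: it assumes an object that behaves like a boto3 EC2 client (`boto3.client("ec2")`), whose `create_snapshot(VolumeId=..., Description=...)` call returns a dictionary containing `SnapshotId`. The function name and description format are hypothetical. For a consistent copy of Oracle data files, the database should be shut down (or placed in backup mode) first, which this sketch does not do.

```python
from typing import Any, List


def snapshot_volumes(ec2: Any, volume_ids: List[str], note: str) -> List[str]:
    """Create one EBS snapshot per volume and return the new snapshot IDs.

    `ec2` is assumed to behave like a boto3 EC2 client. Shutting down the
    Oracle database (or putting it in backup mode) is left to the caller.
    """
    snapshot_ids = []
    for vol in volume_ids:
        resp = ec2.create_snapshot(VolumeId=vol, Description=f"{note} ({vol})")
        snapshot_ids.append(resp["SnapshotId"])
    return snapshot_ids
```

A call might look like `snapshot_volumes(boto3.client("ec2"), ["vol-..."], "before terminate")`, using your own volume IDs.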

Using EBS as a data store, you can move your Oracle data files from one instance to another. This allows you to move your database from one region or zone to another. Unfortunately, you cannot use read-only replicas to scale out Oracle on Amazon RDS; this is only possible with the other Oracle relational database, MySQL. The free micro instances use EBS for their storage.
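Moving data files between instances amounts to detaching the EBS volume and attaching it elsewhere. A volume can only be attached to an instance in its own Availability Zone; crossing zones or regions goes through a snapshot, which this sketch does not cover. It assumes a boto3-style EC2 client with `detach_volume`, `attach_volume`, and the `volume_available` / `volume_in_use` waiters; the function name and the device name in the usage note are hypothetical.

```python
from typing import Any


def move_volume(ec2: Any, volume_id: str, target_instance_id: str, device: str) -> None:
    """Detach an EBS volume and attach it to another instance in the same AZ.

    `ec2` is assumed to behave like a boto3 EC2 client. Stop the database and
    unmount the file system on the source instance before calling this.
    """
    ec2.detach_volume(VolumeId=volume_id)
    # Wait until the volume is free before attaching it to the new instance.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(VolumeId=volume_id, InstanceId=target_instance_id, Device=device)
    # Wait until the attachment is reported before mounting it on the target.
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])
```

For example, `move_volume(boto3.client("ec2"), "vol-...", "i-...", "/dev/sdf")`; the volume still has to be mounted on the target instance before Oracle can open the data files.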

This is a very good white paper with more details:
AWS Storage Options
This white paper also discusses SQS, SimpleDB, and Amazon RDS in the context of storage. However, these are not storage options you would use to store an Oracle database. This slide deck covers much of the information in the white paper:
AWS Storage Options slideshow