Wednesday, January 29, 2014


There are two offerings for running SAP HANA on AWS:
1. AWS Marketplace : This runs on the Cluster Compute 8XL (cc2.8xlarge) instance type with 60.5 GB of memory and includes the SAP HANA One license.
2. SAP Cloud Appliance Library (CAL) : This uses an AWS CloudFormation template to deploy a cr1.8xlarge cluster compute instance (244 GB of memory and 240 GB of SSD instance storage) with Amazon EBS Provisioned IOPS (PIOPS) volumes.

Monday, January 27, 2014

AWS getting started partner accreditation

Great place to start for business and technical partners new to AWS:

AWS Services : New Services and Updates

Here are the counts, with examples, of significant new services and features each year since 2008:
1. 2008 : 24 new services and features including Amazon EBS, Amazon CloudFront, and EC2 Availability Zones.
2. 2009 : 48 new services and features including Amazon RDS, Amazon VPC, Amazon EMR, and EC2 Auto Scaling.
3. 2010 : 61 new services and features including Amazon SNS, Amazon Route 53, AWS IAM, and the AWS Singapore Region.
4. 2011 : 82 new services and features including the AWS São Paulo, Oregon, and Tokyo Regions, and Amazon RDS for Oracle.
5. 2012 : 159 new services and features including Amazon DynamoDB, Amazon Glacier, AWS Marketplace, and AWS Storage Gateway.
6. 2013 : 245+ new services and features including the AWS China Region, Amazon Redshift, Amazon WorkSpaces, and Amazon Kinesis.

Autoscaling in AWS Console

Friday, January 10, 2014

Migrating data from on premises to AWS

Here are some of the approximate transfer rates for AWS Storage Gateway and Riverbed Whitewater:

  • AWS Storage Gateway : 5 TB a day
  • Riverbed Whitewater : 3-5 TB an hour

Keep in mind that using gateways to migrate a relational database is probably not the best option, as these mechanisms use snapshots.
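At those rates, estimating a migration window is simple arithmetic. A back-of-the-envelope sketch (the 100 TB dataset size is an assumed example, and it ignores overhead and rate variability):

```python
def migration_days(total_tb, tb_per_day):
    """Days to move total_tb at a sustained rate of tb_per_day."""
    return total_tb / tb_per_day

dataset_tb = 100  # assumed dataset size for illustration

# Storage Gateway at 5 TB/day
print(migration_days(dataset_tb, 5))        # -> 20.0 days

# Whitewater at a midpoint 4 TB/hour = 96 TB/day
print(migration_days(dataset_tb, 4 * 24))   # -> ~1.04 days
```

The orders-of-magnitude gap is the point: pick the mechanism by how much data you have and how long you can wait.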

Other blog entries are here:

Migrating data to RDS, but useful for EC2-based databases as well:

This blog discusses the time it takes to transfer 1 TB of data over different lines with different transfer speeds:

Mechanisms to bulk migrate data:

Here is a blog post on different transfer mechanisms for applications that could also be used for databases:

Attunity CloudBeam:

Cloud migration methodology

Thursday, January 9, 2014

Amazon Redshift visualization

Jaspersoft can be used with Amazon Redshift. Here are a couple of demos:

Here is a recording of the flow:

Here is a slightly longer recording (showing dashboard and interactive report):

You must have a data source and domain created; these demos already have them.

VPC benefits over EC2-Classic

Here are some of the reasons to use a VPC instead of EC2-Classic for your Oracle instances:
  • Predictable internal IP ranges: You define the IP address range of your VPC as opposed to your IP address being part of the AWS region IP address range.
  • Subnets : Logically group Amazon EC2 instances into private or public subnets and assign them private IP addresses.
  • Traffic Routing : Control the outbound/egress traffic from your Amazon EC2 instances (in addition to controlling the ingress traffic to them; EC2 Classic security groups are ingress only) and provide selective internet access to instances.
  • Network ACLs : Additional layer of security to your Amazon EC2 instances in the form of network Access Control Lists (ACLs). These allow for deny rules instead of just allow rules that security groups have.
  • VPN Connectivity : Connect your VPC to your corporate data center and on-premises infrastructure with a VPN connection, so that you can use Amazon VPC as an extension of your existing data center network.
  • DHCP options: DHCP option sets let you specify the domain name, DNS servers, NTP servers, etc. that new nodes will use when they’re launched within the VPC. This makes implementing custom DNS much easier. In EC2 you have to spin up a new node, modify DNS configuration, then restart networking services in order to gain the same effect. 
  • Multiple IPs per MAC address: An Elastic Network Interface (ENI) is a virtual network interface that can include the following attributes:
    • a primary private IP address
    • one or more secondary private IP addresses
    • one Elastic IP address per private IP address
    • a MAC address
    • one or more security groups
    • a source/destination check flag
    • a description
  • Multiple ENIs per instance: Attach multiple Elastic Network Interfaces (ENIs) to an instance to give it multiple MAC addresses.
  • Moving ENIs (MAC and IP addresses) between instances : An ENI's attributes follow it as it is detached from one instance and reattached to another.
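The first two points above (you pick the VPC's IP range, then carve it into subnets) can be sketched with Python's standard ipaddress module; the 10.0.0.0/16 range and the two-subnet layout are assumed examples:

```python
import ipaddress

# In a VPC you choose the CIDR block yourself (10.0.0.0/16 is an assumed
# example), then carve it into public and private subnets.
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = list(vpc.subnets(new_prefix=24))  # 256 possible /24 subnets

public_subnet = subnets[0]   # 10.0.0.0/24
private_subnet = subnets[1]  # 10.0.1.0/24

print(public_subnet, private_subnet)
# Instances get predictable private addresses inside the range you chose:
print(ipaddress.ip_address("10.0.1.25") in private_subnet)  # True
```

In EC2-Classic none of this planning is possible, because instances draw addresses from the region-wide pool.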

Monday, January 6, 2014

Redshift : optimizing query performance with compression, distribution key and sort key


In order to determine the correct compression, first issue these commands to clean up dead space and analyze the data in the table:
vacuum orders;
analyze orders;
Then issue this command:
analyze compression orders;

Then create a table that matches the results from the analyze compression statement:
  o_orderkey int8 NOT NULL ENCODE MOSTLY32 PRIMARY KEY       ,
  o_custkey int8 NOT NULL ENCODE MOSTLY32 DISTKEY REFERENCES customer_v3(c_custkey),
  o_orderstatus char(1) NOT NULL ENCODE RUNLENGTH            ,
  o_totalprice numeric(12,2) NOT NULL ENCODE MOSTLY32        ,
  o_orderdate date NOT NULL ENCODE BYTEDICT SORTKEY          ,
  o_orderpriority char(15) NOT NULL ENCODE BYTEDICT          ,
  o_clerk char(15) NOT NULL ENCODE RAW                       ,
  o_shippriority int4 NOT NULL ENCODE RUNLENGTH              ,
  o_comment varchar(79) NOT NULL ENCODE TEXT255

Distributing data

Partition data using a distribution key. This allows data to be spread out on a cluster to maximize the parallelization potential of the queries. To help queries run fast, the distribution key should be a value that will be used in regularly joined tables. This allows Redshift to co-locate the data of these different entities, reducing IO and network exchanges.
Redshift also uses the sort key to know in advance what range of values a column holds in a given block, and to skip reading that entire block if the values it contains don't fall into the range of a query. Choosing sort key columns that appear in filters (i.e., WHERE clauses) therefore speeds execution.

Compression depends directly on the data as it is stored on disk, and storage is modified by distribution and sort options. Therefore, if you change the sort or distribution key, or create a new table with the same data but different distribution and sort keys, you will need to rerun the vacuum, analyze, and analyze compression statements.
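The block-skipping behavior behind the sort key can be illustrated with a toy simulation. This is not Redshift's actual implementation, just the idea: each block records the min/max of the sort column (a zone map), and a range filter only scans blocks whose range overlaps the query:

```python
# Toy zone-map simulation: each "block" holds rows sorted by the sort key,
# plus that block's min/max. A range filter scans only the blocks whose
# [min, max] range overlaps the query range, skipping the rest entirely.
blocks = [
    {"min": 1,   "max": 100, "rows": range(1, 101)},
    {"min": 101, "max": 200, "rows": range(101, 201)},
    {"min": 201, "max": 300, "rows": range(201, 301)},
]

def range_scan(blocks, lo, hi):
    scanned, hits = 0, []
    for b in blocks:
        if b["max"] < lo or b["min"] > hi:
            continue  # zone map proves no row here can match: skip the block
        scanned += 1
        hits.extend(v for v in b["rows"] if lo <= v <= hi)
    return scanned, hits

scanned, hits = range_scan(blocks, 150, 160)
print(scanned, len(hits))  # 1 of 3 blocks scanned, 11 matching rows
```

If the data were unsorted, every block's min/max would span nearly the full range and nothing could be skipped, which is why sorting on filter columns matters.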

Instance types and network throughput

Network throughput is a very good metric to know when running an Oracle Database on AWS, or any time you are moving data from one AWS service to another. Maximum throughput for different Amazon EC2 instance types can be found here:

This 2013 AWS re:Invent session also has details (slide 19):

The m1.xlarge instance's maximum network throughput is roughly 128 MB/s (1,000 Mbps). The cc2.8xlarge, cr1.8xlarge, hi1.4xlarge, and cg1.4xlarge instances (10 Gigabit networking) can provide up to roughly 800 MB/s.
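To put those rates in data-migration terms, a sustained MB/s figure converts to TB per day with simple arithmetic (decimal units assumed, sustained rate assumed):

```python
def tb_per_day(mb_per_s):
    """Convert a sustained throughput in MB/s to TB moved per day (decimal units)."""
    return mb_per_s * 86_400 / 1_000_000

print(round(tb_per_day(128), 1))  # m1.xlarge-class: ~11.1 TB/day
print(round(tb_per_day(800), 1))  # 10 Gigabit-class: ~69.1 TB/day
```

So instance choice alone can change a bulk-load window by roughly a factor of six.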

Thursday, January 2, 2014

Accenture white paper : Disaster Recovery with Amazon Web Services

Disaster Recovery with Amazon Web Services:
A Technical Guide
The paper:
1. Defines the challenges that enterprises face in adopting public cloud solutions for disaster recovery.
2. Describes the value that large enterprises can gain by adopting cloud-based DR with services such as Amazon Web Services (AWS).
3. Provides recommended disaster recovery architecture patterns.