Showing posts with label dw. Show all posts

Friday, May 2, 2014

Redshift SQL query tools

      Here are two tools that can be used with Amazon Redshift to issue query commands:      

1. SQL Workbench:
The information required about your Redshift cluster is:
A. JDBC or ODBC connection string: an example JDBC connection string looks like this: jdbc:postgresql://redshiftexamplecluster.cvhv40lk8tel.us-east-1.redshift.amazonaws.com:5439/redexampletl?tcpKeepAlive=true
B. User ID : master
C. password : password
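As a sketch, the JDBC connection string above can be split into its host, port, and database parts with Python's standard library (the endpoint shown is the sample one from the example, not a real cluster):

```python
# Split the example Redshift JDBC URL into its parts.
# Illustrative only; the endpoint is the sample one, not a real cluster.
from urllib.parse import urlparse

jdbc_url = ("jdbc:postgresql://redshiftexamplecluster.cvhv40lk8tel"
            ".us-east-1.redshift.amazonaws.com:5439/redexampletl"
            "?tcpKeepAlive=true")

# Strip the "jdbc:" prefix so urlparse sees an ordinary URL.
url = urlparse(jdbc_url[len("jdbc:"):])

host = url.hostname              # the cluster endpoint
port = url.port                  # 5439, Redshift's default port
database = url.path.lstrip("/")  # the database name
print(host, port, database)
```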

2. Aginity: http://www.aginity.com
A. Server (endpoint): an example looks like this: lab.cfmvmxhkrtel.us-west-2.redshift.amazonaws.com
B. User ID : master
C. password : password
D. Database: databasename
E. Port: 5439

The nice thing about Aginity is that there is no JDBC driver to install. With SQL Workbench, you will need to download and install the PostgreSQL JDBC driver.

More on where to find the JDBC driver and how to configure it with SQL Workbench:
1. Download the PostgreSQL JDBC driver JAR.
2. Add the downloaded driver JAR to the PostgreSQL driver definition in SQL Workbench.






Monday, December 9, 2013

Redshift example

This is a great place to get started with Redshift: http://docs.aws.amazon.com/redshift/latest/gsg/getting-started.html.  There are a couple of pieces of information in the step-by-step instructions that I wanted to elaborate on:
1. Creating a cluster subnet group is done in the Redshift area of the console as seen below. This is not evident in the instructions:


This of course assumes that a VPC with two public subnets and a route to an IGW has been created.

2. The copy command is issued from SQL Workbench:

copy venue from 's3://awssampledb/tickit/venue_pipe.txt' CREDENTIALS 'aws_access_key_id=<your access key>;aws_secret_access_key=<your secret key>' delimiter '|';

3. Make sure to set auto-commit; otherwise you need to commit each statement or block of statements, and the example does not include commit commands.
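To see why the commit matters, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a Redshift session (the behavior shown is generic SQL transaction behavior, not Redshift-specific; the venue table name follows the tutorial):

```python
# Minimal sketch: with auto-commit off, uncommitted work is discarded.
# sqlite3 stands in for a Redshift session; the principle is the same.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE venue (id INTEGER, name TEXT)")
conn.execute("INSERT INTO venue VALUES (1, 'Example Arena')")
conn.rollback()  # simulates ending the session without issuing COMMIT

count = conn.execute("SELECT COUNT(*) FROM venue").fetchone()[0]
print(count)  # 0 -- the insert was lost because it was never committed
```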



Wednesday, July 3, 2013

Redshift - New node and data distribution


What happens when a new node is added to a Redshift cluster?
A 2-node cluster will distribute data evenly between two nodes based on a hash of the DISTKEY. If a 3rd node is added, the data needs to be rebalanced amongst the 3 nodes. It’s not just a matter of sending all new data to the 3rd node, because that would require lookups to figure out where data is stored. Rather, the cluster needs to be rebalanced by redistributing the data between the 3 nodes. Redshift takes care of this automatically. Just add the nodes and the data moves.
Redshift redistributes the data as follows:
  • A ‘new’ set of nodes is created (in the above example, 3 nodes would be created)
  • Redshift moves the data from the 2-node cluster to the 3-node cluster, rebalancing the data during the copy
  • Users are then flicked across from the ‘old’ 2-node cluster to the ‘new’ 3-node cluster
This is an example of scalable cloud infrastructure — rather than having to ‘upgrade’ an existing system, it is much more efficient to provision a new system, copy data and then decommission the old system. This is a new way of looking at infrastructure that is quite different to the old way of thinking in terms of physical boxes.
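The effect of rebalancing can be sketched with a toy hash distribution. This is illustrative only: the node_for function and the use of MD5 are assumptions for the sketch, not Redshift's actual hash function.

```python
# Toy DISTKEY-style hash distribution: each row goes to hash(key) % nodes,
# so adding a node reassigns rows across all nodes, not just the new one.
# (Illustrative; MD5 here is an assumption, not Redshift's hash function.)
import hashlib

def node_for(distkey: str, num_nodes: int) -> int:
    h = int(hashlib.md5(distkey.encode()).hexdigest(), 16)
    return h % num_nodes

keys = [f"venue-{i}" for i in range(1000)]
before = {k: node_for(k, 2) for k in keys}   # 2-node cluster
after = {k: node_for(k, 3) for k in keys}    # 3-node cluster
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} rows move when going from 2 to 3 nodes")
```

With a simple modulo scheme like this, most rows change nodes on resize, which is why Redshift copies the data to a freshly provisioned cluster rather than shuffling it in place.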