Showing posts with label DBA. Show all posts

Wednesday, August 7, 2013

Friday, June 7, 2013

AWS EMR : Getting started for Oracle DBAs


Newer technologies such as MapReduce (AWS EMR, Hadoop) and NoSQL (MongoDB, AWS DynamoDB...) can be confusing to Oracle DBAs. This blog post takes a quick look at AWS Elastic MapReduce (EMR) and attempts to demystify it for Oracle DBAs. To go back to the days before RDBMS products, MapReduce is like a mainframe batch job with no restart ability built in. MapReduce facilitates the processing of large volumes of data in one large batch. This one large batch, however, is broken into tens or hundreds of smaller pieces of work that are processed by MapReduce worker nodes. This makes MapReduce a great solution for processing web logs, sensor data, genome data, large volumes of transactions, telephone call detail records, vote ballots, and other cases where large volumes of data need to be processed once and the results stored.

MapReduce is a framework, so you have to write to an API in your application in order to take advantage of it. There are a number of implementations of this framework, including Apache Hadoop and AWS Elastic MapReduce (EMR). Apache Hadoop has no native data store associated with it (although the Hadoop Distributed File System, HDFS, can be used natively).

As mentioned, you need to code your own application using the MapReduce framework. AWS makes getting started with MapReduce easier by providing sample applications for EMR. One of the five sample EMR applications is a Java application for processing AWS CloudFront logs. The CloudFront HTTP LogAnalyzer is a Java application that uses Cascading to analyze and generate usage reports from Amazon CloudFront HTTP access logs. You specify the EMR input source (the CloudFront log location in S3) in the JAR arguments, and you also specify the S3 bucket that will hold the results (output).


For the CloudFront HTTP LogAnalyzer, the input and output files use S3. However, HDFS or AWS DynamoDB are commonly used as input sources and sometimes as output destinations. You may want to use DynamoDB as an output destination if you wish to load the results into Redshift or do future BI analysis on the results. You could also send the results to an AWS SQS queue to be handled later and written to S3, DynamoDB, RDS or some other persistent data store.
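
For DBAs who have not seen the pattern before, the "one large batch broken into smaller pieces of work" idea can be sketched in a few lines of plain Python. This is only an illustration of the map and reduce steps, not the EMR or Hadoop API: the log format, the function names and the chunk size are all assumptions made up for the example, and everything runs in one process rather than on worker nodes.

```python
from collections import Counter
from itertools import islice

# Hypothetical, simplified access-log lines: "<client-ip> <http-method> <uri>"
LOG_LINES = [
    "10.0.0.1 GET /index.html",
    "10.0.0.2 GET /images/logo.png",
    "10.0.0.1 GET /index.html",
    "10.0.0.3 GET /about.html",
    "10.0.0.2 GET /index.html",
]


def split_into_chunks(lines, chunk_size):
    """Break one large batch into smaller pieces of work."""
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def map_chunk(chunk):
    """Map step: emit a (uri, 1) pair for every log line in the chunk."""
    return [(line.split()[2], 1) for line in chunk]


def reduce_pairs(pairs):
    """Reduce step: sum the counts emitted for each uri."""
    totals = Counter()
    for uri, count in pairs:
        totals[uri] += count
    return dict(totals)


def run_job(lines, chunk_size=2):
    """Run map over every chunk, then reduce all pairs into one result."""
    mapped = []
    # On a real cluster each chunk would be handed to a separate worker node;
    # here the chunks are simply processed one after another.
    for chunk in split_into_chunks(lines, chunk_size):
        mapped.extend(map_chunk(chunk))
    return reduce_pairs(mapped)


if __name__ == "__main__":
    print(run_job(LOG_LINES))
```

The point of the sketch is that each chunk can be mapped independently of the others, which is what lets a framework such as EMR spread the work across many nodes before the results are combined.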

Wednesday, May 29, 2013

DBA and developer access to Oracle hosted on AWS


Here are three common methods used to limit access to the AWS environment for DBAs and developers:
  1. Bastion host : A bastion can be used as a jump box / proxy server. Developers and DBAs would be given access using SSH and then use other credentials to log into the web, application, and database servers. More on bastion host security can be found here: http://cloudconclave.blogspot.com/2013/05/aws-bastion-host-as-single-point-of.html.  The costs are the EC2 instance that is the bastion host and data transfer out.
  2. VPN with customer gateway and virtual private gateway.  In this case, you create a VPN tunnel.  The costs here are the VPN hardware on your side (customer gateway), the virtual private gateway (VPG), the VPN connections, and data transfer out of AWS. More on VPN costs here (this assumes this option): http://cloudconclave.blogspot.com/2013/05/vpn-costs-for-connections-and-data.html
  3. OpenVPN : You do not incur the cost of hardware on your side or of the VPG on the AWS side.  You still have the cost of data transfer out.  You would also incur the cost of the EC2 instance that is running an open source VPN software stack (in this case OpenVPN).
I am sure there are other methods as well.

You could also use these constructs to provide secure integration with your on-premises or third-party applications (SFTP for flat file integration, VPN for web services).
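
For the bastion host option, one common way for a DBA to reach the Oracle listener is SSH local port forwarding through the bastion. The sketch below only builds the `ssh` command line (using the standard `-N` and `-L` options); the user name, host names, private IP address and the use of port 1521 are hypothetical values chosen for illustration, not details from an actual environment.

```python
import shlex


def tunnel_command(bastion_user, bastion_host, db_host, db_port=1521, local_port=1521):
    """Build an ssh command that forwards a local port to the database via the bastion."""
    return [
        "ssh",
        "-N",  # do not run a remote command, just hold the tunnel open
        "-L", f"{local_port}:{db_host}:{db_port}",  # local port -> database listener
        f"{bastion_user}@{bastion_host}",
    ]


if __name__ == "__main__":
    # Hypothetical bastion host and private database address.
    args = tunnel_command("dba", "bastion.example.com", "10.0.1.25")
    print(" ".join(shlex.quote(arg) for arg in args))
```

With the tunnel open, a client on the DBA's machine would connect to localhost on the forwarded port and still authenticate to the database with its own credentials, which matches the "use other credentials" step described for the bastion.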

Wednesday, May 1, 2013

AWS getting started with groups and users

A common question when first setting up an AWS environment is how to ensure that developers, OS administrators, DBAs, architects and all the other roles you may have in your organization have only the correct privileges.  You use IAM groups and users.  First, create a developer group.  Ignore roles to start with, as these are for AWS services accessing other services (example: EC2 accessing S3) and for cross-account access.  Then add policies to the group (use the policy generator or select a template).  Finally, add each developer as an individual user and add that user to the developer group.
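
To give a feel for what a group policy looks like, here is a minimal sketch that builds an IAM policy document as JSON. The document layout (`Version`, `Statement`, `Effect`, `Action`, `Resource`) is the standard IAM policy format, but the helper name and the particular actions granted to developers are assumptions picked for the example; the policy generator or a template would normally produce this for you.

```python
import json


def build_group_policy(actions, resources=("*",)):
    """Return an IAM policy document that allows the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": list(resources),
            }
        ],
    }


# Hypothetical read-mostly permissions for a developer group.
DEVELOPER_ACTIONS = ["ec2:Describe*", "s3:ListBucket", "s3:GetObject"]

if __name__ == "__main__":
    print(json.dumps(build_group_policy(DEVELOPER_ACTIONS), indent=2))
```

The printed JSON is the kind of document you would attach to the developer group, so that every user added to the group inherits the same permissions.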

Good resource for all of your questions (you can explicitly manage roles and policies):
http://docs.aws.amazon.com/IAM/latest/UserGuide/cross-acct-access-walkthrough-creategroup.html

Nice forum thread here:
https://forums.aws.amazon.com/message.jspa?messageID=197920