
Monday, January 6, 2014

Redshift : optimizing query performance with compression, distribution key and sort key

Encoding/compression

In order to determine the correct compression, first issue these commands to clean up dead space and analyze the data in the table:
vacuum orders;
analyze orders;
Then issue this command:
analyze compression orders;

Then create a table that matches the results from the analyze compression statement:
CREATE TABLE orders (
  o_orderkey int8 NOT NULL ENCODE MOSTLY32 PRIMARY KEY       ,
  o_custkey int8 NOT NULL ENCODE MOSTLY32 DISTKEY REFERENCES customer_v3(c_custkey),
  o_orderstatus char(1) NOT NULL ENCODE RUNLENGTH            ,
  o_totalprice numeric(12,2) NOT NULL ENCODE MOSTLY32        ,
  o_orderdate date NOT NULL ENCODE BYTEDICT SORTKEY          ,
  o_orderpriority char(15) NOT NULL ENCODE BYTEDICT          ,
  o_clerk char(15) NOT NULL ENCODE RAW                       ,
  o_shippriority int4 NOT NULL ENCODE RUNLENGTH              ,
  o_comment varchar(79) NOT NULL ENCODE TEXT255
);

Distributing data

Partition data using a distribution key. This allows data to be spread out across the cluster to maximize the parallelization potential of queries. To help queries run fast, the distribution key should be a column that is regularly used in joins. This allows Redshift to co-locate the data of the joined tables, reducing IO and network exchanges.
Redshift also uses a specific sort column to know in advance what values of a column are in a given block, and to skip reading that entire block if the values it contains don’t fall into the range of a query. Using columns that are used in filters (i.e. where clauses) helps execution.

Compression depends directly on the data as it is stored on disk, and storage is affected by the distribution and sort options.  Therefore, if you change the sort or distribution key, or create a new table that has the same data but different distribution and sort keys, you will need to rerun the vacuum, analyze, and analyze compression statements.
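Because changing keys requires rebuilding the table, the usual approach is a deep copy into a new table, then a swap. A sketch, reusing the orders table from above (the new key choices are purely illustrative):

```sql
-- New table with different distribution and sort keys (abbreviated DDL)
CREATE TABLE orders_new (
  o_orderkey  int8 NOT NULL,
  o_custkey   int8 NOT NULL DISTKEY,
  o_orderdate date NOT NULL SORTKEY
  -- ... remaining columns as in the original DDL ...
);

-- Deep copy the data, then swap the tables
INSERT INTO orders_new SELECT * FROM orders;
DROP TABLE orders;
ALTER TABLE orders_new RENAME TO orders;

-- Re-analyze now that the on-disk layout has changed
VACUUM orders;
ANALYZE orders;
ANALYZE COMPRESSION orders;
```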

Wednesday, December 4, 2013

EC2 instance public key invalid fix

I happened to do the same thing that is listed at this web page:  http://dltj.org/article/ec2-fix-authorized_keys/

The steps are on the web page. Here are the commands:
1. aws ec2 create-volume --snapshot-id snap-YourVolumesSnapshot --size 10 --availability-zone az-asdf-1a
Note: This assumes you have a current snapshot: aws ec2 create-snapshot --volume-id vol-beforevolumewasbad
2. aws ec2 stop-instances --instance-id i-YourInstanceId
3. aws ec2 detach-volume --volume-id vol-badvolume
4. aws ec2 attach-volume --volume-id vol-goodvolume --instance-id i-YourInstanceId --device /dev/sda1
5. aws ec2 start-instances --instance-id i-YourInstanceId



Tuesday, December 3, 2013

Copying key pairs to your bastion host

When using a bastion host to keep anyone on the internet from getting into your Oracle database instance, you will need to copy the 1024-bit SSH-2 RSA key to your bastion EC2 instance.  This can be done using Linux secure copy (scp):
scp -i /Users/tom/EC2KeyPairs/AWSThreeDayIAM.pem DBSysOPS.pem ec2-user@54.22.37.178:~/.

Where:
1. -i /Users/tom/EC2KeyPairs/AWSThreeDayIAM.pem is the key to the bastion host.
2. DBSysOPS.pem is the key file for your Oracle database server.
3. ec2-user@54.22.37.178:~/. is your EC2 instance and the location where the key file will be copied.

Tuesday, July 23, 2013

Redshift Query performance

The items that impact performance of the queries against Redshift are:
1. Node type : This is one of the two Redshift node types: extra large (XL) or eight extra large (8XL).
2. Number of nodes : The number of nodes you choose depends on the size of your dataset and your desired query performance. Amazon Redshift distributes and executes queries in parallel across all nodes, so you can increase query performance by adding nodes to your cluster.  You can monitor query performance in the Amazon Redshift console and with Amazon CloudWatch metrics.
3. Sort Key : Keep in mind that not all queries can be optimized by sort key.   There is only one sort key for each table.   The Redshift query optimizer uses sort order when it determines optimal query plans.  If you do frequent range or equality filtering on one column, make this column the sort key.  If you frequently join a table, specify the join column as both the sort key and the distribution key.  More details here : http://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-sort-key.html
4. Distribution key :  There is one distribution key per table.  If the table has a foreign key or another column that is frequently used as a join key, consider making that column the distribution key. In making this choice, take pairs of joined tables into account. You might get better results when you specify the joining columns as both the distribution keys and the sort keys on both tables. This enables the query optimizer to select a faster merge join instead of a hash join when executing the query.  If the table is not involved in many joins, use the column that appears in the group by clause.  More on distribution keys here: http://docs.aws.amazon.com/redshift/latest/dg/c_choosing_dist_sort.html
Keep in mind you want even distribution across nodes. You can query the svv_diskusage system view to find out the distribution.
5. Column compression : This has an impact on query performance : http://docs.aws.amazon.com/redshift/latest/dg/t_Compressing_data_on_disk.html
6. Run queries in memory : Redshift supports the ability to run queries entirely from memory. This can obviously impact performance.  More details here: http://docs.aws.amazon.com/redshift/latest/dg/c_troubleshooting_query_performance.html
7. Look at the query plan : More details can be found here : http://docs.aws.amazon.com/redshift/latest/dg/c-query-planning.html
8. Look at disk space usage : More details can be found here : http://docs.aws.amazon.com/redshift/latest/dg/c_managing_disk_space.html
9. Workload manager setting : By default, a cluster is configured with one queue that can run five queries concurrently.  Workload management (WLM) details can be found here: http://docs.aws.amazon.com/redshift/latest/dg/cm-c-implementing-workload-management.html
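To check whether a table's data is evenly distributed, a query along these lines against svv_diskusage shows the blocks used per slice (the orders table name follows the earlier example):

```sql
-- Blocks stored on each slice for one table; a large imbalance means skew
SELECT slice, COUNT(*) AS blocks
FROM svv_diskusage
WHERE name = 'orders'
GROUP BY slice
ORDER BY slice;
```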

Documentation: http://docs.aws.amazon.com/redshift/latest/dg/c_redshift_system_overview.html
Video: http://www.youtube.com/watch?v=6hk0Kvjrvfo
Blog: http://aws.typepad.com/aws/2012/11/amazon-redshift-the-new-aws-data-warehouse.html

Friday, July 12, 2013

AWS Security 101 for Oracle DBAs, Developers and Architects

Oracle DBAs understand securing data in transit and at rest, but they typically don't have to deal with file-level encryption, firewalls, denial-of-service attacks, SQL injection attacks, and other OS-level security concerns.

General infrastructure security concepts:
1. Some networking concepts such as VPC, VPN, and IPSec also apply to the security realm.  More on these concepts can be found here: http://cloudconclave.blogspot.com/2013/07/aws-network-101-for-oracle-dbas.html
2. SSL : The Secure Sockets Layer (SSL) is a commonly used protocol for managing the security of message transmission on the Internet. SSL uses the public-and-private key encryption system from RSA, which also includes the use of a digital certificate.
3. ACLs : Access Control Lists (ACLs) specify which users or system processes are granted access to objects, as well as what operations are allowed on given objects.
4. MFA : Multifactor authentication (MFA) is a security system in which more than one form of authentication is implemented to verify the legitimacy of a transaction. The goal of MFA is to create a layered defense and make it more difficult for an unauthorized person to access a computer system or network.  An MFA device can be a Gemalto token (http://onlinenoram.gemalto.com/) or even an iPhone: http://cloudconclave.blogspot.com/2013/06/mfa-made-easy.html
5. Bastion Host : A bastion host is a special-purpose computer on a network specifically designed and configured to withstand attacks. Information on bastion hosts on AWS with Oracle in these two posts: http://cloudconclave.blogspot.com/2013/05/aws-bastion-host-as-single-point-of.html and http://cloudconclave.blogspot.com/2013/05/dba-and-developer-access-to-oracle.html
6. iptables : iptables is the rule-table mechanism provided by the Linux kernel firewall.  These firewall rules make it possible for administrators to control which hosts can connect to the system, limiting risk exposure.  Information on iptables for security on AWS here: http://cloudconclave.blogspot.com/2013/06/aws-security-with-iptables.html
7. IDS : An intrusion detection system (IDS) is a device or software application that monitors network or system activities for malicious activities or policy violations and produces reports to a management station. Some systems may attempt to stop an intrusion attempt, but this is neither required nor expected of a monitoring system.
8. IPS : Intrusion prevention systems (IPS), also known as intrusion detection and prevention systems (IDPS), are network security appliances that monitor network and/or system activities for malicious activity. The main functions of intrusion prevention systems are to identify malicious activity, log information about it, attempt to block/stop it, and report it.  Intrusion prevention systems are considered extensions of intrusion detection systems because both monitor network traffic and/or system activities for malicious activity.
9. DoS : A denial-of-service attack (DoS attack) or distributed denial-of-service attack (DDoS attack) is an attempt to make a machine or network resource unavailable to its intended users.  IPS, iptables, AWS security groups, NACLs, and bastion hosts are all ways to mitigate DoS attacks.
10. Penetration testing : A penetration test, occasionally called a pentest, is a method of evaluating computer and network security by simulating an attack on a computer system or network from external and internal threats.

AWS specifics. You must be familiar with all of these concepts in order to perform basic actions on AWS and EC2:
1. Access key and secret key : The access key is used to access AWS using the CLI and REST API.  The REST and Query APIs use your access keys as the credential. You might be using a third-party product such as S3Fox or ElasticWolf that requires your access keys (because the product itself makes AWS requests for you). Although access keys are primarily used for REST or Query APIs, Amazon S3 and Amazon Mechanical Turk also use access keys with their SOAP APIs. Your Access Key ID identifies you as the party responsible for service requests; you include it in each request, so it's not a secret. The secret key gives anyone who possesses it enormous power to perform delete, terminate, start, etc. actions on your AWS resources (EC2, ELB, S3, etc.), so be very careful with it. Don't e-mail it to anyone, include it in any AWS requests, or post it on the AWS Discussion Forums. No authorized person from AWS will ever ask for your Secret Access Key.
2. X.509 : X.509 certificates are based on the idea of public key cryptography. They are used for making requests to AWS product SOAP APIs (except for Amazon S3 and Amazon Mechanical Turk, which use access keys for their SOAP APIs).  SOAP services are being de-emphasized, so X.509 will not be used as much moving forward.
3. Key pair file (SSH PEM file) : You use an Amazon EC2 key pair (a.k.a. PEM file) each time you launch an EC2 Linux/UNIX or Windows instance. The key pair ensures that only you have access to the instance. Each EC2 key pair includes a key pair name, a private key, and a public key.  PEM is a file format that may consist of a certificate (i.e., a public key), a private key, or both concatenated together. Don't pay too much attention to the file extension; it stands for Privacy Enhanced Mail, a use that never caught on, but the file format stuck around. More on using PEM with EC2 here: http://cloudconclave.blogspot.com/2012/09/connecting-to-aws-ec2-using-ssh-and-sftp.html
4. Security Groups : A security group acts as a firewall that controls the traffic allowed to reach one or more instances. When you launch an instance, you assign it one or more security groups. You add rules to each security group that control traffic for the instance. You can modify the rules for a security group at any time; the new rules are automatically applied to all instances to which the security group is assigned.

These AWS security concepts are not necessary at first, but once you get beyond the 'playing around' phase of working with AWS, these security components are key to working with AWS:
1. ARNs : Amazon Resource Names (ARNs) uniquely identify AWS resources. An ARN is required when you need to specify a resource unambiguously across all of AWS, such as in IAM policies, Amazon Relational Database Service (Amazon RDS) tags, and API calls.  Here is an example ARN:
<!-- Amazon RDS tag -->
arn:aws:rds:eu-west-1:001234567890:db:mysql-db
ARNs are used extensively with IAM to place security/access policies on AWS services.
2. IAM : AWS Identity and Access Management (IAM) enables you to securely control access to AWS services and resources for your users. Using IAM, you can create and manage AWS users and groups and use permissions to allow or deny their access to AWS resources. More details here: http://cloudconclave.blogspot.com/2012/10/aws-iam-service.html and http://cloudconclave.blogspot.com/2013/05/aws-getting-started-with-groups-and.html
3. NACLs : Network ACLs operate at the subnet level and evaluate traffic entering and exiting a subnet. Network ACLs can be used to set both Allow and Deny rules. Network ACLs do not filter traffic between instances in the same subnet. In addition, network ACLs perform stateless filtering while security groups perform stateful filtering.
4. S3 SSE : Server-side encryption is about data encryption at rest; that is, Amazon S3 encrypts your data as it writes it to disks in its data centers and decrypts it for you when you access it. As long as you authenticate your request and you have access permissions, there is no difference in the way you access encrypted or unencrypted objects. Amazon S3 manages encryption and decryption for you. For example, if you share your objects using a pre-signed URL, the pre-signed URL works the same way for both encrypted and unencrypted objects. More details here: http://cloudconclave.blogspot.com/2013/07/s3-sse-without-request-header.html

5. Data Encryption : AWS does not provide native encryption of EBS (Elastic Block Store) volumes. More details on a couple of vendors that provide solutions here: http://cloudconclave.blogspot.com/2013/04/ebs-volume-encryption.html


Monday, May 13, 2013

AWS Bastion host as single point of attack?

Could a bastion host open you up to a single point of attack?  It can, but a bastion host can be hardened, and having only one point of attack is far better than opening up port 22 (SSH, assuming Linux) to the 0.0.0.0/0 CIDR block on more than one, or all, of your EC2 instances.

Monday, March 4, 2013

Mindterm SSH on Mac

I moved from a Windows laptop to a Mac and just experienced this issue:
https://forums.aws.amazon.com/thread.jspa?threadID=117962

I did as suggested and used the following for my private key path:
../Downloads/EssentialsLab.pem

Tuesday, September 18, 2012

Connecting to AWS EC2 using SSH and SFTP


As I mentioned in a previous blog, you can only use SFTP (not FTP) to connect to an EC2 instance.  The other thing you need to have correct is an entry in the SSH config file for your private key (of the key pair for the EC2 instance).  In the Unix/Mac world, this file is called config and it is located in the ~/.ssh directory.  The following entry needs to be added to this config file:

IdentityFile ~/tomlaszKeys/PemFileName.pem

Adding this entry will also allow you to connect via SSH without having to pass in the private key file location with:
-i ~/PemfileName.pem
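A fuller ~/.ssh/config entry can also pin the host name and user, so a short alias replaces the whole command line (the alias, host name, and paths below are illustrative):

```
Host myec2
    HostName ec2-54-22-37-178.compute-1.amazonaws.com
    User ec2-user
    IdentityFile ~/tomlaszKeys/PemFileName.pem
```

With this in place, ssh myec2 (and sftp myec2) connect without any extra flags.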

Monday, September 17, 2012

AWS EC2 public key from private key

In some cases, you may only have a private key (pem file) and you need to create a new instance from an AMI which requires a public key.  In this case, you can use the following SSH command:

ssh-keygen -y -f /path/to/private-key > /path/to/public-key
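As a quick local sanity check of this round trip, you can generate a throwaway key and derive its public half from the private file alone (paths are illustrative):

```shell
# Generate a throwaway 2048-bit RSA key pair with no passphrase
ssh-keygen -t rsa -b 2048 -f /tmp/demo-key -N "" -q

# Recover the public key from the private key alone
ssh-keygen -y -f /tmp/demo-key > /tmp/demo-key.pub

# The derived public key begins with its key type, e.g. "ssh-rsa AAAA..."
cat /tmp/demo-key.pub
```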



Alternatively, you can use PuTTYgen to create the public key from the private key.