All about Cloud, mostly about Amazon Web Services (AWS)

Amazon Web Services provides some great capabilities, and Amazon Kinesis Firehose is one example. Firehose makes it easy to take data from an Amazon Kinesis data stream and copy it to an Amazon Simple Storage Service (S3) bucket, an Amazon Redshift database, an Amazon Elasticsearch Service cluster, or Splunk. This ability, however, can expose a company to data exfiltration risk: a rogue employee or a hacker could create a Firehose delivery stream that sends data to a resource under their control. It is therefore important to consider how to prevent Kinesis Firehose data exfiltration.
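
Prevention would typically rely on IAM permissions, but a detective check can complement it. The sketch below assumes Python with boto3 and a hypothetical allowlist of approved bucket ARNs. It lists the delivery streams in the current region and flags any S3 destination that is not on the list. It only inspects the top-level S3 and extended S3 destinations, not the backup buckets used by Redshift, Elasticsearch, or Splunk destinations, and the helper names are illustrative rather than part of any AWS library.

```python
"""Flag Firehose delivery streams that write to S3 buckets outside an allowlist.

A sketch: APPROVED_BUCKET_ARNS and the function names are hypothetical.
"""
from typing import Dict, Iterable, List

# Hypothetical allowlist of buckets owned by your organization.
APPROVED_BUCKET_ARNS = {"arn:aws:s3:::example-approved-bucket"}

S3_DESTINATION_KEYS = ("ExtendedS3DestinationDescription", "S3DestinationDescription")


def unapproved_s3_destinations(description: dict, approved: Iterable[str]) -> List[str]:
    """Return bucket ARNs in a describe_delivery_stream() response that are not approved."""
    allowed = set(approved)
    flagged: List[str] = []
    for dest in description["DeliveryStreamDescription"]["Destinations"]:
        for key in S3_DESTINATION_KEYS:
            bucket_arn = dest.get(key, {}).get("BucketARN")
            # The same bucket can appear under both keys, so de-duplicate.
            if bucket_arn and bucket_arn not in allowed and bucket_arn not in flagged:
                flagged.append(bucket_arn)
    return flagged


def audit_region(approved: Iterable[str] = APPROVED_BUCKET_ARNS) -> Dict[str, List[str]]:
    """Check every delivery stream in the default region (needs boto3 and credentials)."""
    import boto3  # imported here so the pure helper above runs without boto3

    client = boto3.client("firehose")
    results: Dict[str, List[str]] = {}
    kwargs: dict = {}
    while True:
        page = client.list_delivery_streams(**kwargs)
        names = page["DeliveryStreamNames"]
        for name in names:
            desc = client.describe_delivery_stream(DeliveryStreamName=name)
            flagged = unapproved_s3_destinations(desc, approved)
            if flagged:
                results[name] = flagged
        if not page.get("HasMoreDeliveryStreams") or not names:
            break
        # Resume listing after the last stream name returned.
        kwargs = {"ExclusiveStartDeliveryStreamName": names[-1]}
    return results
```

Running `audit_region()` on a schedule would report a stream pointed at an unknown bucket. It detects an exfiltration path after it exists rather than preventing one from being created.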

AWS generates a lot of content, including User Guides, Developer Guides and Blog Posts across multiple categories. It can be really difficult to keep up with all the latest changes.

What makes it more complex is that the API Reference Guides include an API version, but that version rarely changes. For example, the Amazon Elastic Compute Cloud (EC2) API Reference Guide still shows the version number “2016-11-15”, despite documenting relatively recent features such as VPC Endpoints.

Converting files between various formats is a fact of life for IT staff, often because different formats are optimized for specific use cases. For example, JPEG files are great for photographs and use lossy compression whose losses can be almost impossible for the human eye to detect, but PNG files are better when logos or graphics need to preserve every pixel in an image.

Apache Avro and Apache Parquet are similar examples. This post explains the benefits of each file format, and demonstrates how to convert between the two formats using Amazon Athena.
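
Athena can perform this kind of conversion with a CREATE TABLE AS SELECT (CTAS) query, whose `format` and `external_location` table properties choose the output format and S3 prefix. The minimal Python sketch below only builds such a statement. It assumes an existing Athena table and an empty output prefix; the table and bucket names are placeholders, and the function name is hypothetical.

```python
"""Build an Athena CTAS statement that rewrites a table as Avro or Parquet."""

# Output formats this sketch handles; Athena CTAS supports others too.
SUPPORTED_FORMATS = {"AVRO", "PARQUET"}


def ctas_convert(source_table: str, target_table: str, target_format: str, s3_location: str) -> str:
    """Return a CTAS query that copies source_table into target_table in target_format."""
    fmt = target_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {target_format}")
    if not s3_location.startswith("s3://"):
        raise ValueError("external_location must be an s3:// URI")
    # Table names are interpolated directly, so only pass trusted identifiers.
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (format = '{fmt}', external_location = '{s3_location}')\n"
        f"AS SELECT * FROM {source_table}"
    )


if __name__ == "__main__":
    print(ctas_convert("events_avro", "events_parquet", "parquet",
                       "s3://example-bucket/events_parquet/"))
```

The resulting string could be submitted with boto3's Athena client through `start_query_execution(QueryString=..., QueryExecutionContext={"Database": ...}, ResultConfiguration={"OutputLocation": ...})`. Swapping the source and target tables and passing `"avro"` converts in the other direction.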

When working with “big data” analytics systems, the de facto standard file format is Apache Parquet. Parquet is a columnar format: unlike more transactional processing systems, which arrange their data in row order, it arranges data in columns. This reduces the amount of data that must be read when only a few columns are required, but inserts, updates, and record (row) based processing may be slower. An alternative to Apache Parquet is the Apache Avro file format. Avro is row-based, supports an embedded schema, and can be processed by Amazon Elastic MapReduce (EMR), Amazon Athena, Amazon Redshift, and AWS Glue.
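
The row-versus-column trade-off can be illustrated with a toy model in plain Python. This is not the real Avro or Parquet encoding, and the field names are made up. Summing one field from row-ordered records touches every value in every row, while the columnar layout touches only one column; adding a record to the columnar layout, however, means appending to every column list.

```python
"""Toy comparison of row-oriented and column-oriented layouts."""

rows = [
    {"id": 1, "region": "us-east-1", "bytes": 120},
    {"id": 2, "region": "eu-west-1", "bytes": 340},
    {"id": 3, "region": "us-east-1", "bytes": 75},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns


def sum_bytes_row_layout(records):
    """Row layout: each whole row is read even though only 'bytes' is needed."""
    total, values_read = 0, 0
    for record in records:
        values_read += len(record)
        total += record["bytes"]
    return total, values_read


def sum_bytes_column_layout(columns):
    """Column layout: only the 'bytes' column is read."""
    column = columns["bytes"]
    return sum(column), len(column)


def append_row_column_layout(columns, record):
    """Inserting one record touches every column list."""
    for name, value in record.items():
        columns[name].append(value)


if __name__ == "__main__":
    print(sum_bytes_row_layout(rows))                 # (535, 9)
    print(sum_bytes_column_layout(to_columns(rows)))  # (535, 3)
```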

I often use the AWS Management Console and, in accordance with best practices, have enabled Multi-Factor Authentication (MFA) in my accounts. Starting the AWS Console involves bringing up a browser, going to the URL, and typing in my username and password. I then dig out my phone, stare at it while Face ID scans me, open the Duo app, and select the account to display the code. Finally, I switch back to the computer and type the code into the browser. Phew! Wouldn’t it be nice to get a quick AWS console login using an access key and secret access key, bypassing MFA? This post shows how that can be achieved!
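
One documented way to open the console from keys is the AWS custom identity broker flow. First the long-term IAM user keys are exchanged for temporary credentials with STS `GetFederationToken`. Those credentials are then traded for a sign-in token at the `https://signin.aws.amazon.com/federation` endpoint, and finally a login URL is opened. The sketch below assumes Python with boto3 and that the post uses a similar approach. The federated user name, issuer label, and the `ReadOnlyAccess` session policy are illustrative assumptions.

```python
"""Sketch of the AWS custom identity broker flow for a console sign-in URL."""
import json
import urllib.parse
import urllib.request

FEDERATION_ENDPOINT = "https://signin.aws.amazon.com/federation"
CONSOLE_URL = "https://console.aws.amazon.com/"


def signin_token_request_url(access_key_id: str, secret_access_key: str, session_token: str) -> str:
    """URL that exchanges temporary credentials for a sign-in token."""
    session = json.dumps({
        "sessionId": access_key_id,
        "sessionKey": secret_access_key,
        "sessionToken": session_token,
    })
    query = urllib.parse.urlencode({"Action": "getSigninToken", "Session": session})
    return f"{FEDERATION_ENDPOINT}?{query}"


def console_login_url(signin_token: str, issuer: str = "example-broker") -> str:
    """URL that signs the browser into the console with a sign-in token."""
    query = urllib.parse.urlencode({
        "Action": "login",
        "Issuer": issuer,  # hypothetical issuer label
        "Destination": CONSOLE_URL,
        "SigninToken": signin_token,
    })
    return f"{FEDERATION_ENDPOINT}?{query}"


def quick_console_url() -> str:
    """Use the default profile's IAM user keys (no MFA prompt) to build a login URL."""
    import boto3  # imported here so the URL helpers run without boto3

    creds = boto3.client("sts").get_federation_token(
        Name="console-user",  # hypothetical federated user name
        # Assumed managed session policy; scope this to what you actually need.
        PolicyArns=[{"arn": "arn:aws:iam::aws:policy/ReadOnlyAccess"}],
    )["Credentials"]
    request_url = signin_token_request_url(
        creds["AccessKeyId"], creds["SecretAccessKey"], creds["SessionToken"]
    )
    with urllib.request.urlopen(request_url) as response:
        token = json.loads(response.read())["SigninToken"]
    return console_login_url(token)
```

The URL returned by `quick_console_url()` can be pasted into a browser. Splitting the URL construction from the network calls keeps the parts that encode the `Session` JSON easy to inspect.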

Amazon DynamoDB can provide consistent, single-digit millisecond latency and unlimited storage capacity at a relatively low price. Improper setup, however, can cause poor performance and high cost. One common issue is poor key choice, though it is not the only cause of performance and cost problems. This post discusses why Adaptive Capacity isn’t a silver bullet and won’t compensate for a poor understanding of DynamoDB.
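
A simplified model shows why adaptive capacity cannot rescue a badly chosen key. Adaptive capacity can shift unused throughput toward a busy partition, but DynamoDB documents a maximum of 1,000 write capacity units (and 3,000 read capacity units) per partition. The sketch below assumes Python and uses an MD5-based hash as a hypothetical stand-in for DynamoDB's internal, unpublished partitioning. It flags partitions whose write demand exceeds that ceiling.

```python
"""Simplified model of per-partition write limits in DynamoDB."""
import hashlib
from collections import defaultdict
from typing import Dict

# Documented per-partition ceiling; adaptive capacity cannot exceed it.
MAX_WCU_PER_PARTITION = 1000


def partition_for(key: str, partitions: int) -> int:
    """Hypothetical stand-in for DynamoDB's internal hash of the partition key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % partitions


def overloaded_partitions(writes_per_key: Dict[str, int], partitions: int) -> Dict[int, int]:
    """Return {partition: WCU demand} for partitions above the per-partition ceiling."""
    load: Dict[int, int] = defaultdict(int)
    for key, wcu in writes_per_key.items():
        load[partition_for(key, partitions)] += wcu
    return {p: wcu for p, wcu in load.items() if wcu > MAX_WCU_PER_PARTITION}


if __name__ == "__main__":
    # Even with 4,000 WCU provisioned across 4 partitions, one key needing 1,500 WCU is flagged.
    print(overloaded_partitions({"user#1": 1500}, partitions=4))
    # Many small keys totalling 1,000 WCU can never push one partition past the ceiling.
    print(overloaded_partitions({f"user#{i}": 100 for i in range(10)}, partitions=4))
```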

Most people who have used the AWS Command Line Interface (CLI) for more than a few minutes are familiar with the aws configure command and its ability to save AWS IAM access keys and secret access keys. These keys belong to AWS IAM users, but most people aren’t aware that they could also be using AWS IAM Roles from the CLI.
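
In the AWS CLI, a role is typically used through a named profile in `~/.aws/config` that sets `role_arn` and `source_profile` (plus `mfa_serial` if the role requires MFA); the CLI then calls STS AssumeRole on your behalf. The short Python sketch below generates such a profile section with `configparser`. The profile name, role ARN, and function name are placeholders.

```python
"""Generate an AWS CLI profile section that assumes an IAM role."""
import configparser
import io
from typing import Optional


def role_profile_config(profile: str, role_arn: str,
                        source_profile: str = "default",
                        mfa_serial: Optional[str] = None) -> str:
    """Return INI text for ~/.aws/config that assumes role_arn via source_profile."""
    config = configparser.ConfigParser()
    # Named profiles in ~/.aws/config use the "profile " section prefix.
    section = f"profile {profile}"
    config[section] = {"role_arn": role_arn, "source_profile": source_profile}
    if mfa_serial:
        # The CLI prompts for an MFA code when the profile names an MFA device.
        config[section]["mfa_serial"] = mfa_serial
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(role_profile_config("admin", "arn:aws:iam::123456789012:role/ExampleAdmin"))
```

After appending the output to `~/.aws/config`, a command such as `aws s3 ls --profile admin` would run with the role's permissions, using the `default` profile's keys only to assume the role.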



All data and information provided on this site is for informational purposes only. No representations are made as to the accuracy, completeness, currentness, suitability, or validity of any information on this site, and no liability is accepted for any errors, omissions, or delays in this information or any losses, injuries, or damages arising from its display or use. All information is provided on an as-is basis.

This is a personal weblog. The opinions expressed here represent my own and not those of my employer. My opinions may change over time.