Wednesday, 16 September 2020

DynamoDB Empty


Quickly delete all items in a DynamoDB table.

from : https://www.npmjs.com/package/dynamodb-empty
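
A tool like this presumably works by scanning the table and batch-deleting whatever it finds. A minimal sketch of that idea with the AWS SDK for Node.js (not the package's actual code; it assumes a table whose primary key is a single attribute named id, and the table name is a placeholder):

var AWS = require('aws-sdk');
var db = new AWS.DynamoDB.DocumentClient();

async function emptyTable(tableName) {
  var lastKey;
  do {
    // Page through the table; each scan page returns at most 1 MB of items
    var page = await db.scan({ TableName: tableName, ExclusiveStartKey: lastKey }).promise();
    // BatchWriteItem accepts at most 25 requests per call
    for (var i = 0; i < page.Items.length; i += 25) {
      await db.batchWrite({
        RequestItems: {
          [tableName]: page.Items.slice(i, i + 25).map(function (item) {
            return { DeleteRequest: { Key: { id: item.id } } }; // adjust for composite keys
          })
        }
      }).promise(); // a robust tool would also retry any UnprocessedItems returned here
    }
    lastKey = page.LastEvaluatedKey;
  } while (lastKey);
}

emptyTable('MyTable');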

Friday, 19 June 2020

Setting up an Api Gateway Proxy Resource using Cloudformation

Like most other things, setting up an API Gateway proxy resource with CloudFormation is quite straightforward once you know how. My Googling was unsuccessful in this area, so I'll detail the solution here, in case someone out there has the same problem.

The REST API

In the interest of providing a complete, working configuration, I'll include all the required parts. If you already have an API configuration, you'll probably be most interested in the AWS::ApiGateway::Method resource.
At the top of the mountain is a REST API:
  Api:
    Type: 'AWS::ApiGateway::RestApi'
    Properties:
      Name: MyProxyAPI

The proxy resource

In order to proxy all paths to the API, you need two resources: the root resource and a catch-all resource. Well, a catch-almost-all, since the catch-all does not catch the root resource. The root resource is created for you; the proxy resource is not:
  Resource:
    Type: 'AWS::ApiGateway::Resource'
    Properties:
      ParentId: !GetAtt Api.RootResourceId
      RestApiId: !Ref Api
      PathPart: '{proxy+}'
As you can see, this resource references Api.RootResourceId as its parent. The path part {proxy+} is a greedy match for any path. If you wanted to match only requests under e.g. /blog/*, you'd have to define two resources:
  BlogResource:
    Type: 'AWS::ApiGateway::Resource'
    Properties:
      ParentId: !GetAtt Api.RootResourceId
      RestApiId: !Ref Api
      PathPart: 'blog'

  Resource:
    Type: 'AWS::ApiGateway::Resource'
    Properties:
      ParentId: !Ref BlogResource
      RestApiId: !Ref Api
      PathPart: '{proxy+}'

The methods

Next up we'll configure the methods. As I want to proxy everything, I just define one ANY method for each resource.
The root method maps the root path 1:1 to the root path on your proxy target. For this example, we're proxying to an imaginary S3 bucket website:
  RootMethod:
    Type: 'AWS::ApiGateway::Method'
    Properties:
      HttpMethod: ANY
      ResourceId: !GetAtt Api.RootResourceId
      RestApiId: !Ref Api
      AuthorizationType: NONE
      Integration:
        IntegrationHttpMethod: ANY
        Type: HTTP_PROXY
        Uri: http://my-imaginary-bucket.s3-website-eu-west-1.amazonaws.com/
        PassthroughBehavior: WHEN_NO_MATCH
        IntegrationResponses:
          - StatusCode: 200
Next up is the proxy resource method, and this is what took me an embarrassing amount of time to figure out.
  ProxyMethod:
    Type: 'AWS::ApiGateway::Method'
    Properties:
      HttpMethod: ANY
      ResourceId: !Ref Resource
      RestApiId: !Ref Api
      AuthorizationType: NONE
      RequestParameters:
        method.request.path.proxy: true
      Integration:
        CacheKeyParameters:
          - 'method.request.path.proxy'
        RequestParameters:
          integration.request.path.proxy: 'method.request.path.proxy'
        IntegrationHttpMethod: ANY
        Type: HTTP_PROXY
        Uri: http://my-imaginary-bucket.s3-website-eu-west-1.amazonaws.com/{proxy}
        PassthroughBehavior: WHEN_NO_MATCH
        IntegrationResponses:
          - StatusCode: 200
Let's discuss the key components of this. First of all, setting the resource path to {proxy+} is not enough to be able to use this in the target URL. You also need to specify RequestParameters to state that it is OK to use the proxy parameter from the path in the integration configuration.
As if that wasn't enough, you also have to inform Cloudformation of how you will access the proxy parameter in your integration request path, by specifying Integration.RequestParameters. It is a map of parameters from the method request to parameters in the integration request.
Those two bits are crucial, because now we can finally use {proxy} to insert the proxied path into our integration URI.

Deployment

In order to use the API you need a deployment. Because the deployment does not have a direct dependency on either of the methods, and because we cannot deploy an API with no methods, we use DependsOn to help Cloudformation figure out the order of things:
  Deployment:
    DependsOn:
      - RootMethod
      - ProxyMethod
    Type: 'AWS::ApiGateway::Deployment'
    Properties:
      RestApiId: !Ref Api
      StageName: dev
Choose a stage name of your liking.
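Once the deployment is in place, you can verify the proxying against the stage's invoke URL (the API id below is a placeholder):
  curl https://abc123.execute-api.eu-west-1.amazonaws.com/dev/some/path
API Gateway forwards that request to http://my-imaginary-bucket.s3-website-eu-west-1.amazonaws.com/some/path.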

The whole shebang

That's all there is to it. Doesn't look very hard when you know what to do.
AWSTemplateFormatVersion: 2010-09-09
Description: An API that proxies requests to another HTTP endpoint

Resources:
  Api:
    Type: 'AWS::ApiGateway::RestApi'
    Properties:
      Name: SomeProxyApi

  Resource:
    Type: 'AWS::ApiGateway::Resource'
    Properties:
      ParentId: !GetAtt Api.RootResourceId
      RestApiId: !Ref Api
      PathPart: '{proxy+}'

  RootMethod:
    Type: 'AWS::ApiGateway::Method'
    Properties:
      HttpMethod: ANY
      ResourceId: !GetAtt Api.RootResourceId
      RestApiId: !Ref Api
      AuthorizationType: NONE
      Integration:
        IntegrationHttpMethod: ANY
        Type: HTTP_PROXY
        Uri: http://my-imaginary-bucket.s3-website-eu-west-1.amazonaws.com/
        PassthroughBehavior: WHEN_NO_MATCH
        IntegrationResponses:
          - StatusCode: 200

  ProxyMethod:
    Type: 'AWS::ApiGateway::Method'
    Properties:
      HttpMethod: ANY
      ResourceId: !Ref Resource
      RestApiId: !Ref Api
      AuthorizationType: NONE
      RequestParameters:
        method.request.path.proxy: true
      Integration:
        CacheKeyParameters:
          - 'method.request.path.proxy'
        RequestParameters:
          integration.request.path.proxy: 'method.request.path.proxy'
        IntegrationHttpMethod: ANY
        Type: HTTP_PROXY
        Uri: http://my-imaginary-bucket.s3-website-eu-west-1.amazonaws.com/{proxy}
        PassthroughBehavior: WHEN_NO_MATCH
        IntegrationResponses:
          - StatusCode: 200

  Deployment:
    DependsOn:
      - RootMethod
      - ProxyMethod
    Type: 'AWS::ApiGateway::Deployment'
    Properties:
      RestApiId: !Ref Api
      StageName: dev

from : https://cjohansen.no/aws-apigw-proxy-cloudformation/#the-whole-shebang

Friday, 29 May 2020

Set Up an SSH Tunnel to the Master Node Using Local Port Forwarding on Linux, Unix, and Mac OS X

To set up an SSH tunnel using local port forwarding in terminal
  1. Open a terminal window. On Mac OS X, choose Applications > Utilities > Terminal. On most Linux distributions, a terminal is typically found at Applications > Accessories > Terminal.
  2. Type the following command to open an SSH tunnel on your local machine. This command accesses the ResourceManager web interface by forwarding traffic on local port 8157 (a randomly chosen, unused local port) to port 8088 on the master node's local web server. In the command, replace ~/mykeypair.pem with the location and file name of your .pem file and replace ec2-###-##-##-###.compute-1.amazonaws.com with the master public DNS name of your cluster.
    ssh -i ~/mykeypair.pem -N -L 8157:ec2-###-##-##-###.compute-1.amazonaws.com:8088 hadoop@ec2-###-##-##-###.compute-1.amazonaws.com
    After you issue this command, the terminal remains open and does not return a response.
    Note
    -L signifies the use of local port forwarding, which lets you specify a local port used to forward data to the identified remote port on the master node's local web server.
  3. To open the ResourceManager web interface in your browser, enter http://localhost:8157/ in the address bar.
  4. When you are done working with the web interfaces on the master node, close the terminal windows.
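If you'd rather not keep a terminal window occupied, ssh's -f flag (a standard OpenSSH option, not specific to EMR) backgrounds the tunnel after authentication:
    ssh -i ~/mykeypair.pem -N -f -L 8157:ec2-###-##-##-###.compute-1.amazonaws.com:8088 hadoop@ec2-###-##-##-###.compute-1.amazonaws.com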

from : https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-ssh-tunnel-local.html

Wednesday, 29 April 2020

AWS CLI error

I installed the AWS CLI with pip install awscli but still got an error. It was resolved by upgrading with pip install --upgrade awscli.

Thursday, 23 April 2020

Study Notes - DynamoDB

DynamoDB's design originates from the 2007 Amazon paper Dynamo: Amazon's Highly Available Key-value Store, often cited as a defining work of NoSQL.
This blog post by Werner Vogels (AWS CTO), Amazon DynamoDB – a Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications, covers the history behind DynamoDB's design, including the earlier SimpleDB, and highlights several design goals:
  • Fast (fast)
  • Managed (good)
  • Scalable (good)
  • Durable and Highly Available (good)
  • Flexible (good)
  • Low cost (cheap)
Anyway, what follows is a summary of DynamoDB's key concepts and how it works under the hood. All figures and text come from the official documentation: DynamoDB Developer Guide. (This feels a bit like translation practice XD)

Core Components

DynamoDB is often compared to MongoDB, and the concepts map closely:
  • Tables:
    • Similar to a table in an RDBMS.
    • A DynamoDB table is the unit that stores a collection of data.
    • Equivalent to a MongoDB collection.
  • Items:
    • Each table can hold multiple items, equivalent to rows in an RDBMS.
    • Each item can contain multiple attributes.
    • Equivalent to a MongoDB document.
  • Attributes:
    • Each item consists of one or more attributes.
    • Attribute data types are listed under Data Types below.
    • When naming attributes, watch out for Reserved Words.

Primary Key

DynamoDB supports two kinds of primary keys:
  • Partition key:
    • Also called the hash attribute: a single attribute designated as the primary (unique) key, similar to a unique key in an RDBMS.
    • DynamoDB runs this value through an internal hash function and uses the result to decide which physical storage partition holds the item, a concept similar to sharding.
    • Hash values essentially never collide, so partition key values cannot repeat.
  • Partition key and sort key:
    • A composite key of two attributes: partition key + sort key, also called hash key + range key.
    • The sort key is also called the range attribute.
    • When a sort key is present, partition key values may repeat,
    • but the combination of hash key + range key must be unique.
    • The most common example is a combination like unique key + date range (see the sketch after this list).
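As an illustration (not from the original notes), a Node.js sketch creating a table with a composite key; the table and attribute names are made up:

var AWS = require('aws-sdk');
var dynamodb = new AWS.DynamoDB();

dynamodb.createTable({
  TableName: 'Posts',
  AttributeDefinitions: [
    { AttributeName: 'userId',    AttributeType: 'S' },
    { AttributeName: 'createdAt', AttributeType: 'S' }
  ],
  KeySchema: [
    { AttributeName: 'userId',    KeyType: 'HASH'  },  // partition key
    { AttributeName: 'createdAt', KeyType: 'RANGE' }   // sort key
  ],
  ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 }
}, function (err, data) {
  if (err) console.log(err, err.stack);
  else     console.log(data.TableDescription.TableStatus); // e.g. "CREATING"
});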

Secondary Indexes

Besides its primary key, a table can have one or more secondary indexes, up to five GSIs and five LSIs per table (see the snippet after this list):
  • Global Secondary Indexes (GSI): have their own partitions and their own RCU / WCU.
  • Local Secondary Indexes (LSI): share the RCU / WCU of the table's partitions.
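Continuing the hypothetical Posts table sketched above, a GSI is declared inside the same createTable call by adding one more parameter; status is a made-up attribute that would also need an entry in AttributeDefinitions:

  GlobalSecondaryIndexes: [{
    IndexName: 'ByStatus',
    KeySchema: [{ AttributeName: 'status', KeyType: 'HASH' }],
    Projection: { ProjectionType: 'ALL' },  // copy all attributes into the index
    ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 }
  }]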

Data Types

  • Scalar types: number, string, binary, Boolean, and null.
  • Document types: list and map.
  • Set types: sets of multiple scalar values, namely string set, number set, and binary set.

Read Consistency

DynamoDB is designed to replicate data quickly across the AZs in a Region, usually within a second or less. It supports two consistency models:
  • Eventually Consistent Reads (ECR): 2 reads per second per capacity unit, 4 KB each, i.e. up to 8 KB per second.
  • Strongly Consistent Reads (SCR): 1 read per second per capacity unit, 4 KB each.
The difference: an ECR may not reflect the result of a recently completed write, while an SCR always does.
Because DynamoDB spans AZs within an AWS Region, every table is stored as three replicas.
The consistency mode is specified per API call and defaults to eventually consistent reads. A Node.js example:
var AWS = require('aws-sdk');
var dynamodb = new AWS.DynamoDB();

var params = {
  TableName: 'STRING_VALUE',          /* required */
  Key: { id: { S: 'STRING_VALUE' } }, /* required; assumes a table whose partition key is named "id" */
  ConsistentRead: true || false       // true = strongly consistent, false (default) = eventually consistent
};
dynamodb.getItem(params, function(err, data) {
  if (err) console.log(err, err.stack); // an error occurred
  else     console.log(data);           // successful response
});
For more on the eventual consistency model, see Eventually Consistent and the Dynamo NWR model.

Global Tables

— to be written —

Read/Write Capacity Mode

Provisioned Mode

Every DynamoDB table has read/write capacity settings, called Read Capacity Units (RCU) and Write Capacity Units (WCU):
  • Read Capacity Units (RCU): one read unit covers 4 KB.
    • Strongly consistent reads: one read per second.
    • Eventually consistent reads: two reads per second, i.e. 8 KB per second.
    • Reads larger than 4 KB consume additional RCU.
  • Write Capacity Units (WCU): one write unit covers 1 KB; larger writes consume additional WCU.
  • Secondary indexes consume capacity units separately and have their own RCU / WCU.
RCU / WCU affect both performance and billing.
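A quick worked example: a strongly consistent read of a 6 KB item consumes ceil(6 / 4) = 2 RCU, and an eventually consistent read of the same item consumes half that, i.e. 1 RCU; writing a 3.5 KB item consumes ceil(3.5 / 1) = 4 WCU.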
DynamoDB's read/write APIs (a BatchGetItem sketch follows this list):
  • Read:
    • GetItem: fetches a single item.
    • BatchGetItem: fetches up to 100 items in one call.
  • Write:
    • PutItem / UpdateItem / DeleteItem: single-item operations.
    • BatchWriteItem: puts or deletes up to 25 items in one call.
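For instance, a DocumentClient sketch fetching two items from the hypothetical Posts table in one BatchGetItem call:

var AWS = require('aws-sdk');
var db = new AWS.DynamoDB.DocumentClient();

db.batchGet({
  RequestItems: {
    Posts: {
      Keys: [
        { userId: 'u1', createdAt: '2020-01-01' },  // made-up keys
        { userId: 'u2', createdAt: '2020-02-01' }
      ]
    }
  }
}, function (err, data) {
  if (err) console.log(err, err.stack);
  else     console.log(data.Responses.Posts); // array of returned items
});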
In addition, with provisioned capacity you can:
  1. Purchase Reserved Capacity.
  2. Use Auto Scaling.
  3. Switch to On-Demand (recommended).

On-Demand Mode

On-Demand Mode was introduced at AWS re:Invent 2018; it is pay-per-request. The underlying RCU / WCU concepts are the same as in the previous section.
On-Demand Mode suits the following scenarios:
  • A new table whose required read/write capacity is unknown.
  • Unpredictable request traffic.
  • Cost considerations: pay only for what you use (no machines to keep fed).
That said, this model hands the usage decisions back to the user: without a solid grasp of RCU / WCU and a good design, the consequences will show up in the bill, not just in easier operations.
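On-Demand is chosen per table: you set BillingMode when creating (or updating) the table instead of supplying ProvisionedThroughput. A minimal sketch with a hypothetical table name:

var AWS = require('aws-sdk');
var dynamodb = new AWS.DynamoDB();

dynamodb.createTable({
  TableName: 'EventsOnDemand',   // hypothetical
  AttributeDefinitions: [{ AttributeName: 'id', AttributeType: 'S' }],
  KeySchema: [{ AttributeName: 'id', KeyType: 'HASH' }],
  BillingMode: 'PAY_PER_REQUEST' // on-demand; no ProvisionedThroughput block
}, function (err, data) {
  if (err) console.log(err, err.stack);
  else     console.log(data);
});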

Guidelines for Working with Tables

Partition Behavior of a Table

A single partition provides at most 3,000 RCU / 1,000 WCU. When creating a table, if you specify 1,000 RCU / 500 WCU, the number of partitions required is computed as:
Total partitions for desired performance = (Desired RCU / 3000 RCU) + (Desired WCU / 1000 WCU)
For example: how many partitions do 1,000 RCU and 500 WCU need?
( 1,000 / 3,000 ) + ( 500 / 1,000 ) = 0.8333 --> 1
So one partition satisfies the requirement. With RCU = WCU = 1,000, the partitions needed are:
( 1,000 / 3,000 ) + ( 1,000 / 1,000 ) = 1.333 --> 2
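The same arithmetic as a tiny helper (rounding up, as in the examples above):

function partitionsFor(rcu, wcu) {
  return Math.ceil(rcu / 3000 + wcu / 1000);
}
console.log(partitionsFor(1000, 500));  // 1
console.log(partitionsFor(1000, 1000)); // 2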

Partition Split

A partition split spreads stored data across separate partitions, each with its own baseline read/write capacity and storage. One partition can store 10 GB of data; combined with the RCU / WCU math above, a partition split occurs under either of these conditions:
  • Increased capacity throughput.
  • Increased storage requirements.

Increased Provisioned Throughput Settings

Create a table with 5,000 RCU and 2,000 WCU and it starts out with 4 partitions, computed as:
( 5,000 / 3,000 ) + ( 2,000 / 1,000 ) = 3.6667 --> 4
The 4 partitions are each allotted 1,250 RCU (5,000/4) and 500 WCU (2,000/4).
If you then raise the RCU to 8,000, the existing four partitions can no longer satisfy the demand, so DynamoDB automatically doubles them to 8 partitions:
(figure: Increased Provisioned Throughput Settings)
The data is then evenly redistributed across the new partitions, and each partition's RCU / WCU becomes:
  • RCU: 8,000 / 8 = 1,000
  • WCU: 2,000 / 8 = 250

Increased Storage Requirements

When the data exceeds a partition's 10 GB limit, a new partition is grown automatically.
The previous example ended with 8 partitions; if one of them exceeds 10 GB, that partition is split and its data redistributed:
(figure: Increased Storage Requirements)

Use Burst Capacity Sparingly

Because every partition comes with a fixed amount of RCU / WCU, each table effectively carries a buffer beyond what you asked for, so a sudden burst of traffic can be absorbed for a short while.
DynamoDB retains up to five minutes of unused RCU / WCU as burst capacity. Reads and writes during such a window are consumed very quickly, generally faster than the provisioned rate.
Do not make burst RCU / WCU part of your design, however, because DynamoDB may consume that capacity first for background maintenance tasks.
Burst capacity may become user-configurable in the future.
AWS officially recommends serving frequently accessed data from memory instead, e.g. ElastiCache or DAX.

Limitations

Capacity unit sizes are fixed: reads (RCU) and writes (WCU) each have set values, and every AWS account / Region has upper limits to keep in mind. The following is summarized from Limits in DynamoDB:
  • Capacity unit sizes:
    • RCU: 4 KB per second for strongly consistent reads, 8 KB per second for eventually consistent reads.
    • WCU: 1 KB written per second.
  • Limits by table and account, in most Regions:
    • Per table – 40,000 RCU, 40,000 WCU
    • Per account – 80,000 RCU and 80,000 WCU
So 40,000 RCU corresponds to 160 MB/s of strongly consistent reads, or 320 MB/s of eventually consistent reads.

Development with DynamoDB

Local development using Docker

  • DynamoDB is accessed entirely through a web service, so there is no RDBMS-style connection concept and therefore no connection pool problem.
  • Since 2018 a Docker image has been available for developers:
    • docker run -p 8000:8000 amazon/dynamodb-local
  • AWS also provides DynamoDB Local as a download; it requires JRE 6 or later and is used as follows:
wget http://dynamodb-local.s3-website-us-west-2.amazonaws.com/dynamodb_local_latest.tar.gz
tar -xzf dynamodb_local_latest.tar.gz
java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb
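Once a local instance is listening on port 8000, point the SDK at it; the region and credentials below are dummies, since DynamoDB Local accepts any values:

var AWS = require('aws-sdk');
var dynamodb = new AWS.DynamoDB({
  endpoint: 'http://localhost:8000', // DynamoDB Local instead of the AWS endpoint
  region: 'us-west-2',               // any region string works locally
  accessKeyId: 'fake',
  secretAccessKey: 'fake'
});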

NoSQL Workbench Preview (updated: 2019/09/17)

AWS has finally released NoSQL Workbench, which mainly provides:
  • Data Modeling
  • Data Visualization
  • Operation Building
It is currently still in preview.

When to Use DynamoDB

AWS offers many ways to store data: S3 / RDS / DynamoDB / Glacier / ElastiCache / HDFS, and so on. The AWS whitepaper Storage Options in the AWS Cloud explains them in detail.
For a quick overview, the decision chart from AWS Summit Series 2016 - Big Data Architectural Patterns and Best Practices on AWS is a good reference.

Design Patterns and Best Practice

AWS has published many DynamoDB design patterns that are well worth studying.


from : https://rickhw.github.io/2016/08/17/AWS/Study-Notes-DynamoDB/