Private Connection

By default, services are connected to each other via public IPs. In most cases, it’s advised to use private IPs instead. Private IP addresses let devices connect within the same network, without the need to connect to the public internet. This offers an additional layer of protection, making it more difficult for an external host or user to establish a connection.

The problem

The private IP access pattern has been built with Infrastructure as a Service (IaaS) in mind (i.e. virtual machines, VPCs, etc.). This means that it isn’t so straightforward to implement if you’re using Serverless services.

Examples of Serverless compute services within Google Cloud are:

  • App Engine Standard Environment
  • Cloud Functions
  • Cloud Run

The solution

To solve this problem, Google released a network component that is called Serverless VPC Access. This connector makes it possible for you to connect directly to your VPC network from Serverless environments.

We are using the Serverless VPC Access Connector to create a connection to a Cloud SQL database via private IP. Please note that this is not limited to Cloud SQL. Once you have set up the Serverless VPC Access Connector, it is possible to connect to any resource that is available within your VPC.

In this blog post, I’ll guide you in setting this up for Google App Engine Standard Environment.

Prerequisites:

  • Authenticated Google Cloud SDK, alternatively Cloud Shell.
  • Enough GCP permissions to create networks and perform deployments.

Step 1: Create a VPC with a subnet

For the purpose of this blog post, I’m going to create a new VPC with a subnet in europe-west1. Please note that this is not required. You can also reuse your own VPC or the Google-provided default VPC.

gcloud compute networks create private-cloud-sql \
--subnet-mode custom

In Google Cloud, a VPC is global, but we still need to create subnets to deploy resources in the different Cloud regions.

This command creates a subnet in the VPC we created earlier, for europe-west1:

gcloud compute networks subnets create private-europe-west1 \
--description="europe-west1 subnet" \
--range=192.168.1.0/24 \
--network=private-cloud-sql \
--region=europe-west1

Step 2: Create a Serverless VPC Access Connector

After we’ve created a VPC with a subnet, we can continue by creating a Serverless VPC Access Connector. We can use the following gcloud command to do this.

gcloud compute networks vpc-access connectors create connector-europe-west1 \
  --network=private-cloud-sql \
  --region=europe-west1 \
  --range=10.8.0.0/28

Note: This command uses some defaults for the number of instances as well as the instance type. Since this could limit your network throughput between your VPC and the Serverless products, it’s recommended to override these properties.
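
For example, a connector with explicit sizing could look like this (a sketch; pick the machine type and instance counts that match your expected throughput):

gcloud compute networks vpc-access connectors create connector-europe-west1 \
  --network=private-cloud-sql \
  --region=europe-west1 \
  --range=10.8.0.0/28 \
  --min-instances=2 \
  --max-instances=10 \
  --machine-type=e2-standard-4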

Step 3: Setup Private Access Connection

Cloud SQL instances are deployed in a VPC that Google manages. To use Cloud SQL instances via private IP from your VPC, we need to set up a VPC peering connection with the ‘servicenetworking’ service.

If this is the first time you interact with service networking, you have to enable the API via:

gcloud services enable servicenetworking.googleapis.com

To set up a VPC peering connection, we first need to reserve an IP range that can be used. We can do this by executing the following command:

gcloud compute addresses create google-managed-services-private-cloud-sql \
--global \
--purpose=VPC_PEERING \
--prefix-length=16 \
--network=private-cloud-sql

This reserves an auto-generated IP address range based on the given prefix length; it’s also possible to specify the range yourself.

Now that we have an IP range, we can use it to set up a VPC peering connection with the ‘service networking’ service:

gcloud services vpc-peerings connect \
--service=servicenetworking.googleapis.com \
--ranges=google-managed-services-private-cloud-sql \
--network=private-cloud-sql

Note: This step is specifically for connecting to Cloud SQL over private IP. It can be skipped if you are not connecting to Cloud SQL.

Step 4: Create a Cloud SQL Instance

Now that we’ve created the necessary network infrastructure, we are ready to create a Cloud SQL database instance.

Using this command, we create a new PostgreSQL instance with a private IP in the VPC network we provisioned earlier. In this example, I’ve used ‘secretpassword’ as the root password. For production workloads, I’d recommend using Google Secret Manager or IAM authentication.

gcloud beta sql instances create private-postgres \
--region=europe-west1 \
--root-password=secretpassword \
--database-version=POSTGRES_13 \
--no-assign-ip \
--network=private-cloud-sql \
--cpu=2 \
--memory=4GB \
--async

This operation takes a few minutes. After that, a database can be created in the newly provisioned CloudSQL instance.

gcloud sql databases create test --instance private-postgres

Step 5: Configure your App Engine Application

Now that the networking configuration is in place and the (PostgreSQL) database has been created, it’s time to update the configuration of our App Engine application.

A hello world application can be cloned here.

To update the configuration of an App Engine Application, we need to change the app.yaml file. This file is located in the root of the project.

The result could look as follows (don’t forget to replace the values for the VPC connector and the database):

runtime: nodejs14

vpc_access_connector:
 name: projects/martijnsgcpproject/locations/europe-west1/connectors/connector-europe-west1
 egress_setting: all-traffic

env_variables:
  PGHOST: 10.39.16.5
  PGUSER: postgres
  PGDATABASE: test
  PGPASSWORD: secretpassword
  PGPORT: 5432

In this example, the database connection information is provided via environment variables. Depending on the library/runtime that you’re using, this may need to be changed. You provide the username, database, and password settings used in step 4. To get the private IP address reserved for your Cloud SQL instance, you can navigate to the Google Cloud Console and go to the Cloud SQL page.
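
For reference, a minimal sketch of how such an application could run the query, assuming the express and node-postgres (pg) packages; pg picks up the PG* environment variables shown above by default:

const express = require('express');
const { Pool } = require('pg');

const app = express();
// node-postgres reads PGHOST, PGUSER, PGDATABASE, PGPASSWORD and PGPORT
// from the environment, so no explicit configuration is needed here.
const pool = new Pool();

app.get('/', async (req, res) => {
  const result = await pool.query('SELECT NOW()');
  res.send(`Database time: ${result.rows[0].now}`);
});

app.listen(process.env.PORT || 8080);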

Step 6: Deploy your App Engine Application

Now that we’ve configured our App Engine specification, we can deploy it. Navigate to the root directory and execute:

gcloud app deploy

This will package your app, upload it to GCP and deploy it to the App Engine Service.

After the app has been deployed, you can open the application in your browser by executing:

gcloud app browse

The hello world application only has one endpoint. Once triggered, it sets up a connection with the CloudSQL database and executes a SELECT NOW() query.

Once you navigate to the generated link via the gcloud app browse command, you’ll see:

Congrats! Your connection from Google App Engine to a private CloudSQL instance is successful!

Bonus: Cloud Run

While this blog post is about Google App Engine, it’s also possible to use the Serverless VPC Connector with Cloud Run, via:

gcloud run deploy SERVICE_NAME \
--image IMAGE_URL \
--vpc-connector CONNECTOR_NAME

The service account that Cloud Run is using must have the roles/vpcaccess.serviceAgent role.

Bonus 2: Cloud Functions

Similar to Cloud Run, you can set up a Cloud Function with a Serverless VPC Connector via:

gcloud functions deploy FUNCTION_NAME \
--vpc-connector CONNECTOR_NAME 

The service account that the Cloud Function is using must have the roles/vpcaccess.serviceAgent role.

Conclusion

We’ve just learned how to combine Google Cloud’s Serverless products with private IP instances via the Serverless VPC Access Connector.

The diagram below illustrates everything we have created in this blog post.

This blog shows you how to deploy PrivateBin on Google Cloud Run and Google Cloud Storage, to achieve a cheap and highly available deployment for low-intensity usage.


To use Google Cloud resources you have to enable the corresponding service API. This is easily forgotten. Therefore, you’ve likely seen the ‘[service] API has not been used in project [project_number]..’ or ‘Consumer [project_number] should enable ..’ errors, and wondered how to find your project using a project number. Luckily, you can use the gcloud CLI for that.

Gcloud Projects Describe [Project Number]

To find your project, don’t use the cloud console (!), use the CLI:

gcloud projects describe [project_number]

And immediately, you receive all attributes required to update your infrastructure code: project ID and name.

createTime: '2019-10-21T12:43:08.071Z'
lifecycleState: ACTIVE
name: your-project-name
parent:
  id: 'your-organization-id'
  type: organization
projectId: your-project-id
projectNumber: '000000000000'

Cloud Console Alternative

The project number is quite hidden in the Cloud Console. You might expect it in the Resource manager view. Sadly, at the time of writing, it’s only available in the Cloud Console Dashboard.

Please, however, don’t browse through all your projects. Save time and trouble by using the following url: https://console.cloud.google.com/home/dashboard?project=[project_number]. This will open the right project for you as well.

Conclusion

Finding project numbers is fast and easy with the gcloud CLI. Use the saved time and energy to resolve actual errors.

Photo by Thijs van der Weide from Pexels

Hilversum, the Netherlands, 1 September 2021 – Xebia acquires g-company and strengthens its position as a leading Google Cloud Partner.

In recent years, Binx.io has established itself within the Xebia organization as the cloud boutique with a Google Cloud Premier Partnership and an Authorized Training Partnership.

With the acquisition of g-company, Xebia increases its GCP capacity and expands its services with solutions such as Google Workspace, allowing it to facilitate digital transformations even better.

In 2007, when Google introduced its first business solution, David Saris founded g-company. As Google's first partner in Northern Europe, g-company has grown alongside the IT multinational into a company with more than 70 employees in Malaysia, Belgium, and the Netherlands. Working with Google, Salesforce, Freshworks, monday.com, and other web-based solutions, g-company offers application development, data, machine learning, modern infrastructure, and digital workplace services to companies all over the world.

"To accelerate the growth of our GCP practice, there was one organization that came to mind: g-company. Some time ago, I approached David to discuss the possibilities. I am delighted to be able to announce the acquisition today and that we are joining forces as GCP partners!" – Bart Verlaat, CEO Binx.io.

Xebia is a fast-growing IT consultancy firm; a collection of brands that each cover a digital domain, such as cloud, data, AI, Agile, DevOps, and software consulting. Xebia aims to be a leading global player in training and consultancy, including Google Cloud and Workspace. With g-company, Xebia can grow by adding extra services to its one-stop shop for companies around the world.

"I am very proud of what we have achieved over the past fourteen years, but as far as I am concerned, the biggest growth is yet to come. There are still so many workloads that need to move to the cloud. Google is a major player in that, but I believe in freedom of choice. With Xebia, we can truly offer our customers that." – David Saris, Founder & CEO g-company.

"Like Xebia, we believe in focus. With so much expertise, we can now offer our customers a one-stop-shop experience. From cloud engineering and managed services to machine learning, multi-cloud, data and AI, cloud-native software development, and security." – Lennart Benoot, Managing Partner g-company.

"Xebia wants to make it as easy as possible for organizations to get the most out of Google's business products. As a Premier Google Cloud partner, we work closely with various Google product specialists. g-company brings a team of experienced specialists and adds in-depth knowledge of and experience with Google Workspace, Salesforce, Freshworks, monday.com, and Lumapps to our portfolio." – Andrew de la Haije, CEO Xebia.

After the acquisition, g-company and Xebia retain their status as Google Cloud Premier Partner and Authorized Google Cloud Training Partner, plus a shared Infrastructure and Workspace Competency. g-company will continue to operate under its own label and with its existing management, led by David Saris and Lennart Benoot. The acquisition will be marked by adding "proudly part of Xebia" to the company logo.

It is not possible to remember all methods and properties from all classes that you write and use.

That is the reason why I like to have type hinting in my IDE. In fact I will spend time to make sure it works. I even wrote a blog post about how to set it up when I started experimenting with a new language.

In this post I will provide you with a couple of tips that help you develop faster by leveraging type hinting with the AWS Lambda Powertools.

LambdaContext

When you are developing AWS Lambda functions you receive an event and a context property. If you need metadata from the context you could read the documentation or use type hinting.

LambdaContext type hinting example
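
A minimal sketch of what that example shows (the handler body is just illustrative):

from typing import Any, Dict

from aws_lambda_powertools.utilities.typing import LambdaContext

def lambda_handler(event: Dict[str, Any], context: LambdaContext) -> dict:
    # With the LambdaContext type hint, the IDE can autocomplete properties
    # such as function_name and methods such as get_remaining_time_in_millis().
    return {
        "function_name": context.function_name,
        "remaining_ms": context.get_remaining_time_in_millis(),
    }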

As you can see in the example, the IDE will help you find the correct method or property. But what about the event? The example tells you that we expect a Dict[str, Any] but this depends on how you invoke the Lambda function.

Event Source Data Classes

Most of the time you know how you invoke your Lambda function. For example an API Gateway proxy integration could trigger the function:

API Gateway proxy integration type hinting example
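
A minimal sketch of using the data class directly (the handler body is just illustrative):

from aws_lambda_powertools.utilities.data_classes import APIGatewayProxyEvent
from aws_lambda_powertools.utilities.typing import LambdaContext

def lambda_handler(event: dict, context: LambdaContext) -> dict:
    # Wrap the raw dict so the IDE knows about path, http_method, body, etc.
    api_event = APIGatewayProxyEvent(event)
    if api_event.http_method == "GET":
        return {"statusCode": 200, "body": api_event.path}
    return {"statusCode": 405}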

Next to instantiating the data class yourself, you can also use a decorator and then use the type hint in the method signature.

from aws_lambda_powertools.utilities.data_classes import event_source, APIGatewayProxyEvent

@event_source(data_class=APIGatewayProxyEvent)
def lambda_handler(event: APIGatewayProxyEvent, context):
    if 'helloworld' in event.path and event.http_method == 'GET':
        do_something_with(event.body)  # your own business logic

For a full list see the supported event sources page.

What about my custom events?

You could also invoke a Lambda function with a custom payload. This event is not supported as a data class.

Let’s say we receive an order with one or more items. By using the parser you can define the event in a model, and you can use that model for type hinting.

from typing import List, Optional
from aws_lambda_powertools.utilities.parser import event_parser, BaseModel, ValidationError
from aws_lambda_powertools.utilities.typing import LambdaContext

import json

class OrderItem(BaseModel):
    id: int
    quantity: int
    description: str

class Order(BaseModel):
    id: int
    description: str
    items: List[OrderItem] # use nesting models
    optional_field: Optional[str] # this field may or may not be available when parsing

@event_parser(model=Order)
def handler(event: Order, context: LambdaContext):
    print(event.id)
    print(event.description)
    print(event.items)

    order_items = [item for item in event.items]
    ...

But my custom event is in an existing event source

Well in that case you need to supply the event source as an envelope.

from aws_lambda_powertools.utilities.parser import event_parser, parse, BaseModel, envelopes
from aws_lambda_powertools.utilities.typing import LambdaContext

class UserModel(BaseModel):
    username: str
    password1: str
    password2: str

# Same behavior but using our decorator
@event_parser(model=UserModel, envelope=envelopes.EventBridgeEnvelope)
def handler(event: UserModel, context: LambdaContext):
    assert event.password1 == event.password2

Conclusion

If you found this blog useful I recommend reading the official documentation. It’s a small investment that will pay off in nice and clean code.

No need to remember all the methods and properties and what types they expect and return. An IDE can do that for you so that you can focus on the actual business logic.

We have read and selected a few articles and other resources about the latest developments around cloud technology so you don’t have to. Read further and keep yourself up-to-date in five minutes!

Why is there no June 2021 edition?

You probably noticed that we skipped the June 2021 edition. We have been quite busy working on other awesome stuff, for example the Google Cloud Platform User Group Benelux and the “Designing Serverless Applications on AWS” talk for the End2End Live Conference. Watch the recording below!

Where should I run my stuff? Choosing a Google Cloud compute option

Where should you run your workload? It depends…Choosing the right infrastructure options to run your application is critical, both for the success of your application and for the team that is managing and developing it. This post breaks down some of the most important factors that you need to consider when deciding where you should run your stuff!

Read more:
https://cloud.google.com/blog/topics/developers-practitioners/where-should-i-run-my-stuff-choosing-google-cloud-compute-option

Lambda Extensions

The execution environment of a Lambda gets reused when it is executed multiple times. Usually code ignores this fact but a very common optimization is to load settings only the first time your code runs in the execution environment. The extension API, that is available for extension writers, now gives standardized access to life-cycle information of the execution environment. There are many ready-to-use extensions from AWS partners and open source projects, but you can even write your own extensions. An extension is usually just a code “layer” you can add to your Lambda.

Read more:
https://aws.amazon.com/about-aws/whats-new/2021/05/aws-lambda-extensions-now-generally-available/

The ultimate App Engine cheat sheet

App Engine is a fully managed serverless compute option in Google Cloud that you can use to build and deploy low-latency, highly scalable applications. App Engine makes it easy to host and run your applications. It scales them from zero to planet scale without you having to manage infrastructure. App Engine is recommended for a wide variety of applications including web traffic that requires low-latency responses, web frameworks that support routes, HTTP methods, and APIs.

Read more:
https://cloud.google.com/blog/topics/developers-practitioners/ultimate-app-engine-cheat-sheet

StepFunctions now supports EventBridge

Before, you had to go through something like Lambda to generate EventBridge events from a StepFunction. Now support has been added to the StepFunctions implementation to do this more efficiently, without the need for extra code or infrastructure. This simplifies these kinds of serverless solutions greatly!

Read more:
https://aws.amazon.com/about-aws/whats-new/2021/05/aws-step-functions-now-supports-amazon-custom-events-eventbridge/

Maximize your Cloud Run investments with new committed use discounts

One of the key benefits of Cloud Run is that it lets you pay only for what you use, down to 100 millisecond granularity. This is ideal for elastic workloads, notably workloads that can scale to zero or that need instant scaling. However, the traffic on your website or API doesn’t always need this kind of elasticity. Often, customers have a steady stream of requests, or the same daily traffic pattern, resulting in a predictable spend for your Cloud Run resources. Google Cloud is now introducing self-service spend-based committed use discounts for Cloud Run, which let you commit for a year to spending a certain amount on Cloud Run and benefiting from a 17% discount on the amount you committed.

Read more:
https://cloud.google.com/blog/products/serverless/introducing-committed-use-discounts-for-cloud-run

EventBridge supports sharing events between event buses

Using EventBridge is a good way to create a decoupled event-driven architecture. Using routing rules you can now route events to other buses in the same account and region. Using this feature you can either fan out events to different event buses or aggregate events to a single bus.

Read more:
https://aws.amazon.com/about-aws/whats-new/2021/05/amazon-eventbridge-supports-sharing-events-between-event-buses-same-account-region/

Credits: Header image by Brianna Santellan on Unsplash.com

Set up Visual Studio Code as your IDE to build and deploy an AWS SAM stack using Go.

I recently started to experiment with Go and the best way to learn is to play with it and build something. So I decided to build a simple application. During this process I ran into the problem that my IDE did not like having many Go modules within one project.

Project layout

With AWS SAM you can build and deploy your stack locally or via a CI/CD pipeline; the process is the same. First you perform a sam build, which compiles the source code for each AWS::Serverless::Function in the template.yaml. This is followed by sam deploy, which uploads the artifacts and deploys the template using AWS CloudFormation.

The problem that I faced was that each function needs a go.mod in the source folder. Visual Studio Code expects a single go.mod file in the project.

Project Structure
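
A typical layout looks roughly like this (a sketch; each function folder contains its own go.mod, and the function names match the workspace file shown later):

.
├── template.yaml
└── functions
    ├── api
    │   ├── go.mod
    │   ├── main.go
    │   └── main_test.go
    ├── confirm
    ├── ship
    └── stock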

As a result imports in the unit test do not get recognized:

Sample code of the unit test without workspace
  • The RequestEvent struct.
  • The handler method.
  • The ResponseEvent struct.

The reason for this is that the IDE has set the GOPATH to the root of the project. Therefore the imports are not correct. By using Workspace Folders you can change this behavior so that each folder has its own scope.

As of Go 1.16, the default build mode has switched from GOPATH to Go modules.

To do this you need to create a workspace.code-workspace file. Place it in the root of your project with the following content:

{
    "folders": [
        {
            "path": "functions/api"
        },
        {
            "path": "functions/confirm"
        },
        {
            "path": "functions/ship"
        },
        {
            "path": "functions/stock"
        },
        {
            "path": "."
        }
    ]
}

Now, because each Go module lives within its own scope, the imports are correct. As a result you are able to navigate through the code using the ⌘-click action.

Conclusion

When you deal with many Go modules within one project, it is important to configure your IDE. This improves readability, reduces errors, and increases your development speed!

So you want to know "How to Read Firestore Events with Cloud Functions and Golang" ? You’re in the right place! I recently worked on a side project called "Syn" (https://github.com/lucavallin/syn – Old Norse for vision!) which aims at visually monitoring the environment using the Raspberry Pi, Google Cloud and React Native. The Raspberry Pi uses a tool called motion, which takes pictures (with the Pi camera) when movement is detected. The pictures are then uploaded to Cloud Storage, and a Cloud Function listens for events that are triggered when a new object is uploaded to the bucket. When a new picture is uploaded, the function tries to label the image using Vision API and stores the result in Firestore. Firestore triggers an event itself, which I am listening for in a different Cloud Function, which uses IFTTT to notify me when movement has been detected and what Vision API has found in the image. The goal of this blog post is to explain how to parse Firestore events, which are delivered to the function in a rather confusing format!

Shape and colour of the Event struct

The Cloud Function processing new events (= new uploads from the Raspberry Pi, when movement is detected), after labelling the picture with Vision API, creates a new Event object and stores it in the Events collection in Firestore.
The Event struct looks like this:

type Event struct {
    URI     string    `json:"uri" firestore:"uri"`
    Created time.Time `json:"created" firestore:"created"`
    Labels  []Label   `json:"labels" firestore:"labels"`
}

The struct also references an array of Labels. Label is a struct defined as:

type Label struct {
    Description string  `json:"description" firestore:"description"`
    Score       float32 `json:"score" firestore:"score"`
}

This is the result once the information has been persisted to Firestore:

Syn - Firestore

Create a function to listen to Firestore events

Another function, called Notify, listens for events from Firestore (and then notifies the user via IFTTT), which are triggered when new data is added to the database. I have used Terraform to set up the function:

resource "google_cloudfunctions_function" "notify" {
  project               = data.google_project.this.project_id
  region                = "europe-west1"
  name                  = "Notify"
  description           = "Notifies of newly labeled uploads"
  service_account_email = google_service_account.functions.email
  runtime               = "go113"
  ingress_settings      = "ALLOW_INTERNAL_ONLY"
  available_memory_mb   = 128

  entry_point = "Notify"

  source_repository {
    url = "https://source.developers.google.com/projects/${data.google_project.this.project_id}/repos/syn/moveable-aliases/master/paths/functions"
  }

  event_trigger {
    event_type = "providers/cloud.firestore/eventTypes/document.create"
    resource   = "Events/{ids}"
  }

  environment_variables = {
    "IFTTT_WEBHOOK_URL" : var.ifttt_webhook_url
  }
}

The event_trigger block defines the event that the function should listen for. In this case, I am listening for providers/cloud.firestore/eventTypes/document.create events in the Events collection.

What does a "raw" Firestore event look like?

Using fmt.Printf("%+v", event), we can see that the Firestore event object looks like this:

{OldValue:{CreateTime:0001-01-01 00:00:00 +0000 UTC Fields:{Created:{TimestampValue:0001-01-01 00:00:00 +0000 UTC} File:{MapValue:{Fields:{Bucket:{StringValue:} Name:{StringValue:}}}} Labels:{ArrayValue:{Values:[]}}} Name: UpdateTime:0001-01-01 00:00:00 +0000 UTC} Value:{CreateTime:2021-07-27 09:22:03.654255 +0000 UTC Fields:{Created:{TimestampValue:2021-07-27 09:22:01.4 +0000 UTC} File:{MapValue:{Fields:{Bucket:{StringValue:} Name:{StringValue:}}}} Labels:{ArrayValue:{Values:[{MapValue:{Fields:{Description:{StringValue:cat} Score:{DoubleValue:0.8764283061027527}}}} {MapValue:{Fields:{Description:{StringValue:carnivore} Score:{DoubleValue:0.8687784671783447}}}} {MapValue:{Fields:{Description:{StringValue:asphalt} Score:{DoubleValue:0.8434737920761108}}}} {MapValue:{Fields:{Description:{StringValue:felidae} Score:{DoubleValue:0.8221824765205383}}}} {MapValue:{Fields:{Description:{StringValue:road surface} Score:{DoubleValue:0.807261049747467}}}}]}}} Name:projects/cvln-syn/databases/(default)/documents/Events/tVhYbIZBQypHtHzDUabq UpdateTime:2021-07-27 09:22:03.654255 +0000 UTC} UpdateMask:{FieldPaths:[]}}

…which is extremely confusing! I was expecting the event to look exactly like the data I previously stored in the database, but for some reason, this is what a Firebase event looks like. Luckily, the "JSON to Go Struct" IntelliJ IDEA plugin helps make sense of the above:

type FirestoreUpload struct {
    Created struct {
        TimestampValue time.Time `json:"timestampValue"`
    } `json:"created"`
    File struct {
        MapValue struct {
            Fields struct {
                Bucket struct {
                    StringValue string `json:"stringValue"`
                } `json:"bucket"`
                Name struct {
                    StringValue string `json:"stringValue"`
                } `json:"name"`
            } `json:"fields"`
        } `json:"mapValue"`
    } `json:"file"`
    Labels struct {
        ArrayValue struct {
            Values []struct {
                MapValue struct {
                    Fields struct {
                        Description struct {
                            StringValue string `json:"stringValue"`
                        } `json:"description"`
                        Score struct {
                            DoubleValue float64 `json:"doubleValue"`
                        } `json:"score"`
                    } `json:"fields"`
                } `json:"mapValue"`
            } `json:"values"`
        } `json:"arrayValue"`
    } `json:"labels"`
}

While still confusing, at least now I can split up the struct so I can reference the types correctly elsewhere in the application should I need to, for example, loop through the labels.

Cleaning up the event structure

The FirestoreUpload can be split up in order to have named fields rather than anonymous structs. This is useful to be able to reference the correct fields and types elsewhere in the application, for example when looping through the labels.

package events

import (
    "github.com/thoas/go-funk"
    "time"
)

//FirestoreEvent is the payload of a Firestore event
type FirestoreEvent struct {
    OldValue   FirestoreValue `json:"oldValue"`
    Value      FirestoreValue `json:"value"`
    UpdateMask struct {
        FieldPaths []string `json:"fieldPaths"`
    } `json:"updateMask"`
}

// FirestoreValue holds Firestore fields
type FirestoreValue struct {
    CreateTime time.Time       `json:"createTime"`
    Fields     FirestoreUpload `json:"fields"`
    Name       string          `json:"name"`
    UpdateTime time.Time       `json:"updateTime"`
}

// FirestoreUpload represents a Firebase event of a new record in the Upload collection
type FirestoreUpload struct {
    Created Created `json:"created"`
    File    File    `json:"file"`
    Labels  Labels  `json:"labels"`
}

type Created struct {
    TimestampValue time.Time `json:"timestampValue"`
}

type File struct {
    MapValue FileMapValue `json:"mapValue"`
}

type FileMapValue struct {
    Fields FileFields `json:"fields"`
}

type FileFields struct {
    Bucket StringValue `json:"bucket"`
    Name   StringValue `json:"name"`
}

type Labels struct {
    ArrayValue LabelArrayValue `json:"arrayValue"`
}

type LabelArrayValue struct {
    Values []LabelValues `json:"values"`
}

type LabelValues struct {
    MapValue LabelsMapValue `json:"mapValue"`
}

type LabelsMapValue struct {
    Fields LabelFields `json:"fields"`
}

type LabelFields struct {
    Description StringValue `json:"description"`
    Score       DoubleValue `json:"score"`
}

type StringValue struct {
    StringValue string `json:"stringValue"`
}

type DoubleValue struct {
    DoubleValue float64 `json:"doubleValue"`
}

// GetUploadLabels returns the labels of the image as an array of strings
func (e FirestoreEvent) GetUploadLabels() []string {
    return funk.Map(e.Value.Fields.Labels.ArrayValue.Values, func(l LabelValues) string {
        return l.MapValue.Fields.Description.StringValue
    }).([]string)
}

The GetUploadLabels() function is an example of how the FirestoreUpload event object should be accessed. Here I am also using the go-funk package, which adds some extra functional capabilities to Go (but the performance isn’t as good as a "native" loop).
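
If you would rather avoid the extra dependency, the same helper could be written with a plain loop, for example:

// GetUploadLabels without go-funk, using a plain loop.
func (e FirestoreEvent) GetUploadLabels() []string {
    values := e.Value.Fields.Labels.ArrayValue.Values
    labels := make([]string, 0, len(values))
    for _, v := range values {
        labels = append(labels, v.MapValue.Fields.Description.StringValue)
    }
    return labels
}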

Summary

In this article I explained how to read Firestore events from Cloud Functions listening for them. The examples are written in Golang, but other languages will need to parse the messages in a similar way. Although not convenient, this is the current format of Firestore events! Luckily, once you know how to read them, the rest is simple!

Credits: Header image by Luca Cavallin on unsplash.com

Introduction

This article presents a comparison of Cloud Pub/Sub and NATS as message brokers for distributed applications. We are going to focus on the differences, advantages, and disadvantages of both systems.

Cloud Pub/Sub

Cloud Pub/Sub provides messaging and ingestion features for event-driven systems and streaming analytics. The highlights of the tool can be summarized as follows:

  • Scalable, in-order message delivery with pull and push modes
  • Auto-scaling and auto-provisioning with support from zero to hundreds of GB/second
  • Independent quota and billing for publishers and subscribers
  • Global message routing to simplify multi-region systems

Furthermore, Cloud Pub/Sub provides the following benefits over non-Google-managed systems:

  • Synchronous, cross-zone message replication and per-message receipt tracking ensures reliable delivery at any scale
  • Auto-scaling and auto-provisioning with no partitions eliminates planning and ensures workloads are production ready from day one
  • Filtering, dead-letter delivery, and exponential backoff without sacrificing scale help simplify your applications
  • Native Dataflow integration enables reliable, expressive, exactly-once processing and integration of event streams in Java, Python, and SQL.
  • Optional per-key ordering simplifies stateful application logic without sacrificing horizontal scale—no partitions required.
  • Pub/Sub Lite aims to be the lowest-cost option for high-volume event ingestion; it offers zonal storage and puts you in control of capacity management.

Some use cases of Cloud Pub/Sub include:

  • Google’s stream analytics makes data more organized, useful, and accessible from the instant it’s generated. Built on Pub/Sub along with Dataflow and BigQuery, their streaming solution provisions the resources needed to ingest, process, and analyze fluctuating volumes of real-time data for real-time business insights. This abstracted provisioning reduces complexity and makes stream analytics accessible to both data analysts and data engineers.
  • Pub/Sub works as a messaging middleware for traditional service integration or a simple communication medium for modern microservices. Push subscriptions deliver events to serverless webhooks on Cloud Functions, App Engine, Cloud Run, or custom environments on Google Kubernetes Engine or Compute Engine. Low-latency pull delivery is available when exposing webhooks is not an option or for efficient handling of higher throughput streams.

Features

Cloud Pub/Sub offers the following features:

  • At-least-once delivery: Synchronous, cross-zone message replication and per-message receipt tracking ensures at-least-once delivery at any scale.
  • Open: Open APIs and client libraries in seven languages support cross-cloud and hybrid deployments.
  • Exactly-once processing: Dataflow supports reliable, expressive, exactly-once processing of Pub/Sub streams.
  • No provisioning, auto-everything: Pub/Sub does not have shards or partitions. Just set your quota, publish, and consume.
  • Compliance and security: Pub/Sub is a HIPAA-compliant service, offering fine-grained access controls and end-to-end encryption.
  • Google Cloud–native integrations: Take advantage of integrations with multiple services, such as Cloud Storage and Gmail update events and Cloud Functions for serverless event-driven computing.
  • Third-party and OSS integrations: Pub/Sub provides third-party integrations with Splunk and Datadog for logs along with Striim and Informatica for data integration. Additionally, OSS integrations are available through Confluent Cloud for Apache Kafka and Knative Eventing for Kubernetes-based serverless workloads.
  • Seek and replay: Rewind your backlog to any point in time or a snapshot, giving the ability to reprocess the messages. Fast forward to discard outdated data.
  • Dead letter topics: Dead letter topics allow for messages unable to be processed by subscriber applications to be put aside for offline examination and debugging so that other messages can be processed without delay.
  • Filtering: Pub/Sub can filter messages based upon attributes in order to reduce delivery volumes to subscribers.

Pricing

Cloud Pub/Sub is free up to 10GB/month of traffic, and above this threshold, a flat-rate of $40.00/TB/month applies.
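
For example, under this pricing a workload that pushes 1 TB of messages through Pub/Sub in a month would cost roughly (1 TB - 10 GB free) x $40/TB ≈ $39.60.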

Summary of Cloud Pub/Sub

Cloud Pub/Sub is the default choice for cloud-native applications running on Google Cloud. Overall, the pros and cons of the tool can be summarized with the following points:

Main advantages

  • Google-managed. There is no complex setup or configuration needed to use it.
  • Integrations. Cloud Pub/Sub integrates seamlessly with other Google Cloud services, for example Kubernetes Engine.
  • Secure. End-to-end encryption enabled by default and built-in HIPAA compliance.

Main disadvantages

Refer to https://cloud.google.com/pubsub/docs/overview for further information.

NATS

NATS is a message broker that enables applications to securely communicate across any combination of cloud vendors, on-premise, edge, web and mobile, and devices. NATS consists of a family of open source products that are tightly integrated but can be deployed easily and independently. NATS facilitates building distributed applications and provides client APIs in over 40 languages and frameworks, including Go, Java, JavaScript/TypeScript, Python, Ruby, Rust, C#, C, and NGINX. Furthermore, real-time data streaming, highly resilient data storage, and flexible data retrieval are supported through JetStream, which is built into the NATS server.

The highlights of the tool can be summarized as follows:

  • With flexible deployment models using clusters, superclusters, and leaf nodes, you can optimize communications for your unique deployment. The NATS Adaptive Edge Architecture allows for a perfect fit for unique needs to connect devices, edge, cloud, or hybrid deployments.
  • With true multi-tenancy, securely isolate and share your data to fully meet your business needs, mitigating risk and achieving faster time to value. Security is bifurcated from topology, so you can connect anywhere in a deployment and NATS will do the right thing.
  • With the ability to process millions of messages a second per server, you’ll find unparalleled efficiency with NATS. Save money by minimizing cloud costs with reduced compute and network usage for streams, services, and eventing.
  • NATS self-heals and can scale up, down, or handle topology changes anytime with zero downtime to your system. Clients require zero awareness of NATS topology, allowing you to future-proof your system to meet your needs of today and tomorrow.

Some use cases of NATS include:

  • Cloud Messaging
    • Services (microservices, service mesh)
    • Event/Data Streaming (observability, analytics, ML/AI)
  • Command and Control
    • IoT and Edge
    • Telemetry / Sensor Data / Command and Control
  • Augmenting or Replacing Legacy Messaging Systems

Features

NATS offers the following features:

  • Language and Platform Coverage: Core NATS: 48 known client types, 11 supported by maintainers, 18 contributed by the community. NATS Streaming: 7 client types supported by maintainers, 4 contributed by the community. NATS servers can be compiled on architectures supported by Golang. NATS provides binary distributions.
  • Built-in Patterns: Streams and Services through built-in publish/subscribe, request/reply, and load-balanced queue subscriber patterns. Dynamic request permissioning and request subject obfuscation is supported.
  • Delivery Guarantees: At most once, at least once, and exactly once is available in JetStream.
  • Multi-tenancy and Sharing: NATS supports true multi-tenancy and decentralized security through accounts and defining shared streams and services.
  • AuthN: NATS supports TLS, NATS credentials, NKEYS (NATS ED25519 keys), username and password, or simple token.
  • AuthZ: Account limits including number of connections, message size, number of imports and exports. User-level publish and subscribe permissions, connection restrictions, CIDR address restrictions, and time of day restrictions.
  • Message Retention and Persistence: Supports memory, file, and database persistence. Messages can be replayed by time, count, or sequence number, and durable subscriptions are supported. With NATS streaming, scripts can archive old log segments to cold storage.
  • High Availability and Fault Tolerance: Core NATS supports full mesh clustering with self-healing features to provide high availability to clients. NATS streaming has warm failover backup servers with two modes (FT and full clustering). JetStream supports horizontal scalability with built-in mirroring.
  • Deployment: The NATS network element (server) is a small static binary that can be deployed anywhere from large instances in the cloud to resource constrained devices like a Raspberry PI. NATS supports the Adaptive Edge architecture which allows for large, flexible deployments. Single servers, leaf nodes, clusters, and superclusters (cluster of clusters) can be combined in any fashion for an extremely flexible deployment amenable to cloud, on-premise, edge and IoT. Clients are unaware of topology and can connect to any NATS server in a deployment.
  • Monitoring: NATS supports exporting monitoring data to Prometheus and has Grafana dashboards to monitor and configure alerts. There are also development monitoring tools such as nats-top. Robust side car deployment or a simple connect-and-view model with NATS surveyor is supported.
  • Management: NATS separates operations from security. User and Account management in a deployment may be decentralized and managed through a CLI. Server (network element) configuration is separated from security with a command line and configuration file which can be reloaded with changes at runtime.
  • Integrations: NATS supports WebSockets, a Kafka bridge, an IBM MQ Bridge, a Redis Connector, Apache Spark, Apache Flink, CoreOS, Elastic, Elasticsearch, Prometheus, Telegraf, Logrus, Fluent Bit, Fluentd, OpenFAAS, HTTP, and MQTT, and more.

Pricing

There are no fees involved with deploying NATS, however, the costs of the instances running the system and related maintenance (and related time-cost) must be taken into account. The final cost depends on the number and type of instances chosen to run NATS.

Summary of NATS

NATS is a CNCF-recognized message broker. Overall, the pros and cons of the tool can be summarized with the following points:

Main advantages

  • It supports more patterns. Streams and Services through built-in publish/subscribe, request/reply, and load-balanced queue subscriber patterns. Dynamic request permissioning and request subject obfuscation is supported.

Main disadvantages

  • User-managed. While NATS can be deployed as a Google Cloud Marketplace solution, more complex scenarios like multi-regional clusters require an extensive amount of user-supplied configuration, both for NATS itself and related resources (for example, firewall rules). However, using the Helm charts provided by NATS to run it on Kubernetes facilitates many aspects of the process (see https://docs.nats.io/nats-on-kubernetes/nats-kubernetes).

Refer to https://docs.nats.io/ for further information.

Conclusion

Cloud Pub/Sub and NATS are both excellent, battle-tested message brokers. Whether you pick one or the other is often up to your requirements and preferences. Personally, I would always recommend Cloud Pub/Sub where the requirements allow for it, because of its high degree of integration with other Google Cloud products and because, being managed by Google, Cloud Pub/Sub frees engineers from the complex and time-consuming process of setting up and maintaining a third-party solution.

Credits: Header image by Luca Cavallin on unsplash.com

I’ve been using Terraform for over two years now, and from the start, I’ve found the need for proper input validation. Although Terraform has added variable types with HCL 2.0 and even input variable validation rules in Terraform CLI v0.13.0, it still appears challenging.
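
For reference, such a native validation rule looks roughly like this (a sketch for the purpose variable used later in this post); note that, at the time of writing, the condition can only reference the variable itself, which is one reason custom assertions on derived values are still useful:

# variables.tf
variable "purpose" {
  description = "Bucket purpose, will be used as part of the bucket name"
  type        = string

  validation {
    condition     = can(regex("^[a-z0-9-]+$", var.purpose))
    error_message = "The purpose may only contain lowercase letters, digits and hyphens."
  }
}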

Why input validation?

There are several reasons why input validation is a good idea, but the most important one is saving time!
It has often happened that a plan output looked good to me, and Terraform itself found no errors, but when I ran apply, the code would still break. Not all providers seem to tell you in advance that a specific combination of settings is not correct. It can also happen that the resource’s name does not validate, but this isn’t detected during the plan stage. Especially when deploying GKE clusters or services like Composer on Google Cloud, you could be waiting for more than 30 minutes before these kinds of errors pop up. And then you can wait another 30 minutes for the resources to be deleted again.
You see how being notified about these setting errors in advance can help you save a lot of time.

Using custom assertions

One of the first things we implemented in our custom modules is the use of assertions. Making sure Terraform would show a proper error message came with a challenge, as there is no predefined function for this. On top of that, adding custom checks to break out of the code is not supported by HCL.

example:

# variables.tf
variable "prefix" {
  description = "Company naming prefix, ensures uniqueness of bucket names"
  type        = string
  default     = "binx"
}
variable "project" {
  description = "Company project name."
  type        = string
  default     = "blog"
}
variable "environment" {
  description = "Company environment for which the resources are created (e.g. dev, tst, acc, prd, all)."
  type        = string
  default     = "dev"
}
variable "purpose" {
  description = "Bucket purpose, will be used as part of the bucket name"
  type        = string
}

# main.tf
locals {
  bucket_name = format("%s-%s-%s-%s", var.prefix, var.project, var.environment, var.purpose)
}

resource "google_storage_bucket" "demo" {
  provider = google-beta
  name     = local.bucket_name
}

When you run terraform plan on this example code and enter a purpose with invalid characters, you will get the following output:

var.purpose
  Bucket purpose, will be used as part of the bucket name
  Enter a value: test^&asd

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # google_storage_bucket.demo will be created
  + resource "google_storage_bucket" "demo" {
      + bucket_policy_only          = (known after apply)
      + force_destroy               = false
      + id                          = (known after apply)
      + location                    = "US"
      + name                        = "binx-blog-dev-test^&asd"
      + project                     = (known after apply)
      + self_link                   = (known after apply)
      + storage_class               = "STANDARD"
      + uniform_bucket_level_access = (known after apply)
      + url                         = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.

As you can see, Terraform does not check the characters in the bucket name and will let you think everything is fine. However, if you try to apply this code, you will be given an error during the bucket creation telling you the name is invalid:

google_storage_bucket.demo: Creating...
╷
│ Error: googleapi: Error 400: Invalid bucket name: 'binx-blog-dev-test^&asd', invalid
│
│   with google_storage_bucket.demo,
│   on test.tf line 30, in resource "google_storage_bucket" "demo":
│   30: resource "google_storage_bucket" "demo" {
│
╵

We really want to detect this during the plan phase to tell the developer to fix his code before merging it into master for rollout.
So how did we solve this problem?

It appears that we can abuse the file function to show a proper error message when our checks fail. The nice thing about this function is that it will show you an error message if the file you’re trying to load does not exist. We can use this to our advantage by providing an error description as the file name.

example:

# main.tf
locals {
  regex_bucket_name = "(([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])\\.)*([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])" # See https://cloud.google.com/storage/docs/naming-buckets

  bucket_name       = format("%s-%s-%s-%s", var.prefix, var.project, var.environment, var.purpose)
  bucket_name_check = length(regexall("^${local.regex_bucket_name}$", local.bucket_name)) == 0 ? file(format("Bucket [%s]'s generated name [%s] does not match regex ^%s$", var.purpose, local.bucket_name, local.regex_bucket_name)) : "ok"
}

resource "google_storage_bucket" "demo" {
  provider = google-beta
  name     = local.bucket_name
}

Using the example above, when we run Terraform plan with the same purpose input as before, we’ll get the following message:

│ Error: Invalid function argument
│
│   on main.tf line 5, in locals:
│    5:   bucket_name_check = length(regexall("^${local.regex_bucket_name}$", local.bucket_name)) == 0 ? file(format("Bucket [%s]'s generated name [%s] does not match regex ^%s$", var.purpose, local.bucket_name, local.regex_bucket_name)) : "ok"
│     ├────────────────
│     │ local.bucket_name is "binx-blog-dev-test^&asd"
│     │ local.regex_bucket_name is "(([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])\\.)*([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])"
│     │ var.purpose is "test^&asd"
│
│ Invalid value for "path" parameter: no file exists at Bucket [test^&asd]'s generated name [binx-blog-dev-test^&asd] does not match regex ^(([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])\.)*([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])$; this function works only with files that are distributed as part of the configuration source code, so if this file will be created by a resource in this configuration you must instead obtain this result
│ from an attribute of that resource.

As you can see, Terraform will try to load a file named:

Bucket [test^&asd]'s generated name [binx-blog-dev-test^&asd] does not match regex:
^(([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])\.)*([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])$

This filename is actually the error message we want to show. Obviously, this file does not exist, so it will fail and display this message on the console.

Nice! So we have a way to tell the developer in advance that the provided input is not valid. But how can we make the message even more apparent? We can use newlines!

Check out the following example:

# main.tf
locals {
  regex_bucket_name = "(([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])\\.)*([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])" # See https://cloud.google.com/storage/docs/naming-buckets

  assert_head = "\n\n-------------------------- /!\\ CUSTOM ASSERTION FAILED /!\\ --------------------------\n\n"
  assert_foot = "\n\n-------------------------- /!\\ ^^^^^^^^^^^^^^^^^^^^^^^ /!\\ --------------------------\n\n"

  bucket_name       = format("%s-%s-%s-%s", var.prefix, var.project, var.environment, var.purpose)
  bucket_name_check = length(regexall("^${local.regex_bucket_name}$", local.bucket_name)) == 0 ? file(format("%sBucket [%s]'s generated name [%s] does not match regex:\n^%s$%s", local.assert_head, var.purpose, local.bucket_name, local.regex_bucket_name, local.assert_foot)) : "ok"
}

resource "google_storage_bucket" "demo" {
  provider = google-beta
  name     = local.bucket_name
}

We will now get the following error output:

│ Error: Invalid function argument
│
│   on main.tf line 8, in locals:
│    8:   bucket_name_check = length(regexall("^${local.regex_bucket_name}$", local.bucket_name)) == 0 ? file(format("%sBucket [%s]'s generated name [%s] does not match regex:\n^%s$%s", local.assert_head, var.purpose, local.bucket_name, local.regex_bucket_name, local.assert_foot)) : "ok"
│     ├────────────────
│     │ local.assert_foot is "\n\n-------------------------- /!\\ ^^^^^^^^^^^^^^^^^^^^^^^ /!\\ --------------------------\n\n"
│     │ local.assert_head is "\n\n-------------------------- /!\\ CUSTOM ASSERTION FAILED /!\\ --------------------------\n\n"
│     │ local.bucket_name is "binx-blog-dev-test^&asd"
│     │ local.regex_bucket_name is "(([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])\\.)*([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])"
│     │ var.purpose is "test^&asd"
│
│ Invalid value for "path" parameter: no file exists at
│
│ -------------------------- /!\ CUSTOM ASSERTION FAILED /!\ --------------------------
│
│ Bucket [test^&asd]'s generated name [binx-blog-dev-test^&asd] does not match regex:
│ ^(([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])\.)*([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])$
│
│ -------------------------- /!\ ^^^^^^^^^^^^^^^^^^^^^^^ /!\ --------------------------
│
│ ; this function works only with files that are distributed as part of the configuration source code, so if this file will be created by a resource in this configuration you must instead obtain this result from an attribute of that resource.

Conclusion

As you can see, using the file function to generate assertions opens up many possibilities when it comes to input validation. You can see even more examples of how to use this in our own Terraform modules, found on the Terraform Registry. Part of each of these modules is an asserts.tf file, which defines assertions using the file method as described above.

I still think it would be nice if Terraform would create an actual function for us that we can use to generate custom errors. The workaround I showed in this blog post can easily confuse you, as Terraform tells you no file exists while you’re not trying to actually read any files.

You can search across log groups by setting up a correlation id on your log lines. You can then use CloudWatch Logs Insights to query all related log messages.

If you are dealing with different Lambda functions in a project, you might not have a centralized logging system yet. Finding and troubleshooting issues using CloudWatch Logs can become cumbersome, especially if you try to track a flow through the different log groups.

In CloudWatch Logs Insights you can execute queries across log groups. In our case we want to collect related log lines across log groups. To achieve this we need to annotate the log lines with a correlation_id that ties them together.

Sample application

We will use the following sample application to show how to set up a correlation id on your log lines. The application looks as follows:

  1. An API Gateway that invokes an AWS Lambda function
  2. An api Lambda function places the body in an SQS queue
  3. A callback Lambda function receives messages from the SQS queue
Architecture of the Sample Application

Every time an API call executes, the api function is invoked. We can annotate the log lines by using the Lambda Powertools for Python.

from aws_lambda_powertools import Logger
from aws_lambda_powertools.logging import correlation_paths
from aws_lambda_powertools.utilities.typing import LambdaContext
from aws_lambda_powertools.utilities.data_classes import (
    event_source,
    APIGatewayProxyEvent,
)
logger = Logger(service="api")

@event_source(data_class=APIGatewayProxyEvent)
@logger.inject_lambda_context(correlation_id_path=correlation_paths.API_GATEWAY_REST)
def lambda_handler(event: APIGatewayProxyEvent, context: LambdaContext) -> dict:
    logger.info("This message now has a correlation_id set")

We take the requestContext.requestId from the incoming event and use that as a correlation_id. (This is also returned to the caller as an x-amzn-requestid header by default by API Gateway.)
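
In CloudWatch, the resulting structured log line then looks roughly like this (trimmed for brevity; the exact fields depend on your Powertools logger configuration):

{
    "level": "INFO",
    "message": "This message now has a correlation_id set",
    "service": "api",
    "correlation_id": "263021ca-fb82-41ff-96ad-2e933aadbe0a",
    "timestamp": "2021-08-12 10:00:00,000"
}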

To be able to search across log groups using the same correlation id, we need to pass it along with the message as an attribute when we send it to SQS.

    # client is an SQS client, e.g. client = boto3.client("sqs"), created outside the handler
    response = client.send_message(
        QueueUrl=os.environ.get("QUEUE_URL"),
        MessageBody=json.dumps(event.body),
        MessageAttributes={
            "correlation_id": {
                "DataType": "String",
                "StringValue": event.request_context.request_id,
            }
        },
    )

I used a correlation_id attribute to pass the correlation id with the SQS message.

The queue will trigger the callback function with a batch of messages. You need to loop through the messages in the batch, process them, and remove them from the queue.
Or you can use the sqs_batch_processor from the Powertools.

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.typing import LambdaContext
from aws_lambda_powertools.utilities.batch import sqs_batch_processor
from aws_lambda_powertools.utilities.data_classes.sqs_event import SQSRecord
logger = Logger(service="callback")

def record_handler(record: dict) -> None:
    record = SQSRecord(record)
    # Get the correlation id from the message attributes
    correlation_id = record.message_attributes["correlation_id"].string_value
    logger.set_correlation_id(correlation_id)
    logger.info(f"Processing message with {correlation_id} as correlation_id")

@sqs_batch_processor(record_handler=record_handler)
def lambda_handler(event: dict, context: LambdaContext) -> None:
    logger.set_correlation_id(None)
    logger.info(f"Received a SQSEvent with {len(list(event.records))} records")

The sqs_batch_processor decorator will call the record_handler method for each record received. No need for you to handle the deletion of messages.

Finally, we set the correlation id back to None, because the remaining log lines are not related to a single received message.

Show me!

If you call the API using a simple curl command, you can see the correlation_id in the log lines (note: it is also returned in the response headers).

initial request using curl
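For reference, a request like the one in the screenshot could look as follows (a sketch; the endpoint, stage and payload are hypothetical). The -i flag prints the response headers, including x-amzn-requestid, which is the value used as correlation_id:

curl -i -X POST \
  -H "Content-Type: application/json" \
  -d '{"message": "hello"}' \
  https://<api-id>.execute-api.<region>.amazonaws.com/<stage>/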

So imagine that something went wrong. We know the correlation_id so we will use CloudWatch Logs Insights:

fields level, service, message
| filter correlation_id="263021ca-fb82-41ff-96ad-2e933aadbe0a"
| sort @timestamp desc
Sample query of AWS CloudWatch Logs Insights

As you can see, there are 3 results: 2 lines originated from the api service and 1 from the callback service. If you are interested in how to build queries, you can read the CloudWatch Logs Insights query syntax.
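If you prefer the command line over the console, the same query can be started with the AWS CLI (a sketch; the log group names and time range are hypothetical):

# start the query across both log groups (start/end are epoch seconds)
aws logs start-query \
  --log-group-names /aws/lambda/api /aws/lambda/callback \
  --start-time 1620000000 --end-time 1620003600 \
  --query-string 'fields level, service, message
    | filter correlation_id="263021ca-fb82-41ff-96ad-2e933aadbe0a"
    | sort @timestamp desc'

# fetch the results using the queryId returned by the previous command
aws logs get-query-results --query-id <query-id>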

Conclusion

The Lambda Powertools for Python make it easy to add a correlation_id. Because the Powertools emit structured logs, you can query on each property in the log line. The library has a lot more utilities, so I encourage you to check it out!

Not a Python developer? There is a Java version, and there are some initiatives on the roadmap to add other languages as well.

Xebia, a globally operating IT consultancy, announces the acquisition of Oblivion. With this move, Xebia immediately expands its expertise in the field of Amazon Web Services (AWS).

Oblivion was one of the first consultancies in the Benelux to embrace Amazon Web Services and has delivered hundreds of projects for leading companies across a wide range of sectors, such as Aegon, ABN AMRO, Wehkamp, PostNL and DSM. Oblivion has exceptionally deep AWS knowledge and execution power. In 2018, AWS named the organization a Premier Consulting Partner and awarded it the AWS Migration, DevOps, Financial Services and IoT competencies.

Xebia helps companies around the world transform digitally by delivering high-end cloud, data, AI, Agile, DevOps and software consultancy. The fast-growing group has launched several successful brands, including cloud expert Binx.io and data and AI specialist GoDataDriven. With the acquisition of Oblivion, Xebia underlines its focus on cloud services and continues to expand, fully in line with its strategy.

"Xebia’s cloud ambitions are immense. As an Advanced AWS Consulting and Training Partner, we have built a solid relationship with AWS. Besides more focus, Oblivion brings the breadth and depth Xebia needs to bring even the most challenging cloud projects to a successful conclusion. Not only for our customers in the Benelux, but also far beyond," says Andrew de la Haije, CEO of Xebia.

"We are very excited about the opportunities ahead of us. By joining forces with Xebia, we come one step closer to our goal: securing a leading position as one of Europe’s foremost cloud service providers. Besides efficiency gains, this partnership also enables us to offer additional services within our ‘AWS-only’ strategy. All of this with the down-to-earth approach our customers know and appreciate us for," says Edwin van Nuil, CEO of Oblivion.

"Oblivion offers our customers a clear point of contact for everything related to AWS. To me, they are the most experienced AWS partner in the Benelux, with a thorough knowledge of every part of Amazon Web Services. In the short time we have been working together, our customers have already experienced the added value of their expertise," adds Bart Verlaat, CEO of Binx.io.

"Like Xebia, we believe in focus. With so much expertise within the group, we can now offer our customers a ‘one-stop-shop’ experience: from cloud engineering and managed services to machine learning, multicloud solutions, data and AI, cloud-native software development and security," concludes Eric Carbijn, co-founder of Oblivion.

● Following the acquisition, Oblivion and Xebia are a Premier AWS Partner and Authorized Training Partner, holding among others the Machine Learning, DevOps, Migration and Financial Services Competencies as well as Solution Provider status.

● Oblivion will continue to operate under its own label and under the current management of Edwin van Nuil and Eric Carbijn.

● The acquisition will be marked by adding "proudly part of Xebia" to the company logo.

I have recently worked on a project where I needed to configure a Helm release with secrets hard-coded in Terraform. With Cloud KMS, I could encrypt the secrets so that they could safely be committed to git. In this article, I am going to show you how the process works.

Setup Cloud KMS

Since my project is already up and running, all I had to do was to create a Cloud KMS keyring and crypto key that will be used for encrypting and decrypting secrets. This can be done via Terraform with a few resources:

data "google_project" "secrets" {
  project_id = "secrets"
}

resource "google_kms_key_ring" "this" {
  project  = data.google_project.secrets.project_id
  name     = "default"
  location = "europe-west4"
}

resource "google_kms_crypto_key" "this" {
  name     = "default"
  key_ring = google_kms_key_ring.this.self_link
}

resource "google_project_iam_member" "this" {
  project = data.google_project.secrets.project_id
  role    = "roles/cloudkms.cryptoKeyEncrypter"
  member  = "group:encrypter@example.com"
}

In this example, I am also assigning the cloudkms.cryptoKeyEncrypter role to an imaginary encrypter@example.com group so that members can encrypt new secrets.

Encrypt the secrets

I then used this command to encrypt a secret for use in Terraform via Cloud KMS:

echo -n <your-secret> | gcloud kms encrypt --project <project-name> --location <region> --keyring default --key default --plaintext-file - --ciphertext-file - | base64

To encrypt a whole file instead, you can run:

cat <path-to-your-secret> | gcloud kms encrypt --project <project-name> --location <region> --keyring default --key default --plaintext-file - --ciphertext-file - | base64

  • <your-secret> is the string or file you want to encrypt
  • <project-name> is the project whose KMS keys should be used to encrypt the secret.
  • <region> is the region in which the KMS keyring is configured.

If you are working on macOS, you can append | pbcopy to the command so the resulting output will be added automatically to the clipboard.

Finally, an encrypted string looks like this:

YmlueGlzYXdlc29tZWJpbnhpc2F3ZXNvbWViaW54aXNhd2Vzb21lYmlueGlzYXdlc29tZWJpbnhpc2F3ZXNvbWViaW54aXNhd2Vzb21lYmlueGlzYXdlc29tZQ
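To double-check a ciphertext, you can reverse the pipeline and decrypt it again (a sketch using the same keyring and key; note that base64 --decode may be -D on older macOS):

echo "<your-encrypted-string>" | base64 --decode | gcloud kms decrypt \
  --project <project-name> --location <region> \
  --keyring default --key default \
  --ciphertext-file - --plaintext-file -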

Configure the Cloud KMS data source

The plaintext secret is not stored anywhere. Instead, I used the Cloud KMS data source to decrypt the ciphertext on-the-fly.

data "google_kms_secret" "secret_key" {
  crypto_key = google_kms_crypto_key.this.self_link
  ciphertext = "YmlueGlzYXdlc29tZWJpbnhpc2F3ZXNvbWViaW54aXNhd2Vzb21lYmlueGlzYXdlc29tZWJpbnhpc2F3ZXNvbWViaW54aXNhd2Vzb21lYmlueGlzYXdlc29tZQ"
}

The decrypted value can be referenced with:

data.google_kms_secret.secret_key.plaintext

Terraform will decrypt the string automagically and replace it with the actual value where it is referenced.

Summary

In this short post I have explained how I used Cloud KMS to encrypt strings that can be safely committed to git and used in Terraform. If you have multiple environments to support, remember that KMS keys are different for each project, so you will need to encrypt the same string once for each project. If you need some input on how to organize a multi-project Terraform repository, have a look at my recent article How to use Terraform workspaces to manage environment-based configuration.

Credits: Header image by Luca Cavallin on nylavak.com

Anthos Service Mesh is a suite of tools that helps you monitor and manage a reliable service mesh on-premises or on Google Cloud. I recently tested it as an alternative to an unmanaged Istio installation and I was surprised at how much easier Anthos makes it to deploy a service mesh on Kubernetes clusters.
In this article, I am going to explain step-by-step how I deployed a multi-cluster, multi-region service mesh using Anthos Service Mesh. During my proof of concept I read the documentation at https://cloud.google.com/service-mesh/docs/install, but none of the guides covered exactly my requirements, which are:

  • Multi-cluster, multi-region service mesh
  • Google-managed Istio control plane (for added resiliency, and to minimize my effort)
  • Google-managed CA certificates for Istio mTLS

Deploy the GKE clusters

Deploy the two GKE clusters. I called them asm-a and asm-b (easier to remember) and deployed them in two different regions (using the zones us-west2-a and us-central1-a). Because Anthos Service Mesh requires nodes with at least 4 vCPUs (there are a few more requirements; see the complete list at https://cloud.google.com/service-mesh/docs/scripted-install/asm-onboarding), use at least the e2-standard-4 machine type.

As preparation work, store the Google Cloud Project ID in an environment variable so that the remaining commands can be copied and pasted directly.

export PROJECT_ID=$(gcloud info --format='value(config.project)')

Then, to deploy the clusters, run:

gcloud container clusters create asm-a --zone us-west2-a --machine-type "e2-standard-4" --disk-size "100" --num-nodes "2" --workload-pool=${PROJECT_ID}.svc.id.goog --async

gcloud container clusters create asm-b --zone us-central1-a --machine-type "e2-standard-4" --disk-size "100" --num-nodes "2" --workload-pool=${PROJECT_ID}.svc.id.goog --async

The commands are also enabling Workload Identity, which you can read more about at: https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity.

Fetch the credentials to the clusters

Once the clusters have been created, fetch the credentials needed to connect to them via kubectl. Use the following commands:

gcloud container clusters get-credentials asm-a --zone us-west2-a --project ${PROJECT_ID}
gcloud container clusters get-credentials asm-b --zone us-central1-a --project ${PROJECT_ID}

Easily switch kubectl context with kubectx

kubectx makes it easy to switch between clusters and namespaces in kubectl (also known as context) by creating a memorable alias for them (in this case, asma and asmb). Learn more about the tool at: https://github.com/ahmetb/kubectx.

kubectx asma=gke_${PROJECT_ID}_us-west2-a_asm-a
kubectx asmb=gke_${PROJECT_ID}_us-central1-a_asm-b

Set the Mesh ID label for the clusters

Set the mesh_id label on the clusters before installing Anthos Service Mesh; Anthos needs it to identify which clusters belong to which mesh. The mesh_id is always in the format proj-<your-project-number>, and the project number for the project can be found by running:

gcloud projects list
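If you only need the number for the current project, gcloud can also print it directly (using the --format flag):

gcloud projects describe ${PROJECT_ID} --format='value(projectNumber)'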

Use these commands to create the mesh_id label on both clusters (replace <your-project-number> with the project number found with the previous command):

export MESH_ID="proj-<your-project-number>"
gcloud container clusters update asm-a --zone us-west2-a --project=${PROJECT_ID} --update-labels=mesh_id=${MESH_ID}

gcloud container clusters update asm-b --zone us-central1-a --project=${PROJECT_ID} --update-labels=mesh_id=${MESH_ID}

Enable StackDriver

Enable StackDriver on the clusters to be able to see logs, should anything go wrong during the setup!

gcloud container clusters update asm-a --zone us-west2-a --project=${PROJECT_ID} --enable-stackdriver-kubernetes

gcloud container clusters update asm-b --zone us-central1-a --project=${PROJECT_ID} --enable-stackdriver-kubernetes

Create firewall rules for cross-region communication

The clusters live in different regions, therefore a new firewall rule must be created to allow communication between them and their pods. Bash frenzy incoming!

ASMA_POD_CIDR=$(gcloud container clusters describe asm-a --zone us-west2-a --format=json | jq -r '.clusterIpv4Cidr')
ASMB_POD_CIDR=$(gcloud container clusters describe asm-b --zone us-central1-a --format=json | jq -r '.clusterIpv4Cidr')
ASMA_PRIMARY_CIDR=$(gcloud compute networks subnets describe default --region=us-west2 --format=json | jq -r '.ipCidrRange')
ASMB_PRIMARY_CIDR=$(gcloud compute networks subnets describe default --region=us-central1 --format=json | jq -r '.ipCidrRange')
ALL_CLUSTER_CIDRS=$ASMA_POD_CIDR,$ASMB_POD_CIDR,$ASMA_PRIMARY_CIDR,$ASMB_PRIMARY_CIDR
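The firewall rule below also references ALL_CLUSTER_NETTAGS, which has not been set yet. A minimal sketch to collect the network tags of the GKE nodes and join them with commas (assuming the auto-generated gke-asm-* node names; the filter and format expressions are my own):

ALL_CLUSTER_NETTAGS=$(gcloud compute instances list \
  --filter="name ~ ^gke-asm" \
  --format='value(tags.items.[0])' | sort -u | paste -s -d, -)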

gcloud compute firewall-rules create asm-multicluster-rule \
    --allow=tcp,udp,icmp,esp,ah,sctp \
    --direction=INGRESS \
    --priority=900 \
    --source-ranges="${ALL_CLUSTER_CIDRS}" \
    --target-tags="${ALL_CLUSTER_NETTAGS}" --quiet

Install Anthos Service Mesh

First, install the required local tools as explained here: https://cloud.google.com/service-mesh/docs/scripted-install/asm-onboarding#installing_required_tools.

The install_asm tool will install Anthos Service Mesh on the clusters. Pass these options to fulfill the initial requirements:

  • --managed: Google-managed Istio control plane
  • --ca mesh_ca: Google-managed CA certificates for Istio mTLS
  • --enable_registration: automatically registers the clusters with Anthos (it can also be done manually later)
  • --enable_all: all Google APIs required by the installation will be enabled automatically by the script

./install_asm --project_id ${PROJECT_ID} --cluster_name asm-a --cluster_location us-west2-a --mode install --managed --ca mesh_ca --output_dir asma --enable_registration --enable_all

./install_asm --project_id ${PROJECT_ID} --cluster_name asm-b --cluster_location us-central1-a --mode install --managed --ca mesh_ca --output_dir asmb --enable_registration --enable_all

Configure endpoint discovery between clusters

Endpoint discovery allows the clusters to communicate with each other; for example, it enables discovery of service endpoints across the clusters.

Install the required local tools as explained here: https://cloud.google.com/service-mesh/docs/downloading-istioctl, then run the following commands:

istioctl x create-remote-secret --context=asma --name=asm-a| kubectl apply -f - --context=asmb

istioctl x create-remote-secret --context=asmb --name=asm-b| kubectl apply -f - --context=asma

Testing the service mesh

Anthos Service Mesh is now ready! Let’s deploy a sample application to verify cross-cluster traffic and fail-overs.

Create the namespace for the Hello World app

Create a new namespace on both clusters and enable automatic Istio sidecar injection for both of them. Since the Istio control plane is managed by Google, the istio-injection label is removed and the istio.io/rev label is set to asm-managed.

kubectl create --context=asma namespace sample

kubectl label --context=asma namespace sample istio-injection- istio.io/rev=asm-managed --overwrite

kubectl create --context=asmb namespace sample

kubectl label --context=asmb namespace sample istio-injection- istio.io/rev=asm-managed --overwrite

Create the Hello World service

Deploy the services for the Hello World app on both clusters with:

kubectl create --context=asma -f https://raw.githubusercontent.com/istio/istio/1.9.5/samples/helloworld/helloworld.yaml -l service=helloworld -n sample

kubectl create --context=asmb -f https://raw.githubusercontent.com/istio/istio/1.9.5/samples/helloworld/helloworld.yaml -l service=helloworld -n sample

Create the Hello World deployment

Deploy the Hello World sample app, which provides an endpoint that returns the version number of the application (the version number is different in the two clusters) and a Hello World message to go with it.

kubectl create --context=asma -f https://raw.githubusercontent.com/istio/istio/1.9.5/samples/helloworld/helloworld.yaml -l version=v1 -n sample

kubectl create --context=asmb -f https://raw.githubusercontent.com/istio/istio/1.9.5/samples/helloworld/helloworld.yaml -l version=v2 -n sample

Deploy the Sleep pod

The Sleep pod is a simple client we can use to send requests to other services in the mesh. Let’s use it to test the resilience of the service mesh! To deploy the Sleep application, use:

kubectl apply --context=asma -f https://raw.githubusercontent.com/istio/istio/1.9.5/samples/sleep/sleep.yaml -n sample

kubectl apply --context=asmb -f https://raw.githubusercontent.com/istio/istio/1.9.5/samples/sleep/sleep.yaml -n sample

Verify cross-cluster traffic

To verify that cross-cluster load balancing works as expected (read as: can the service mesh actually survive regional failures?), call the HelloWorld service several times using the Sleep pod. To ensure load balancing is working properly, call the HelloWorld service from all clusters in your deployment.

kubectl exec --context=asma -n sample -c sleep "$(kubectl get pod --context=asma -n sample -l app=sleep -o jsonpath='{.items[0].metadata.name}')" -- curl -sS helloworld.sample:5000/hello

kubectl exec --context=asmb -n sample -c sleep "$(kubectl get pod --context=asmb -n sample -l app=sleep -o jsonpath='{.items[0].metadata.name}')" -- curl -sS helloworld.sample:5000/hello

Repeat this request several times and verify that the HelloWorld version toggles between v1 and v2: traffic is load balanced across both clusters, and requests are relayed to the healthy cluster when the other one is not responding!
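To actually see a fail-over, you can temporarily scale the Hello World deployment down to zero in one cluster and repeat the calls; all responses should then come from the other cluster (a sketch, assuming the sample's default deployment name helloworld-v1):

kubectl --context=asma -n sample scale deployment helloworld-v1 --replicas=0

# repeat the curl calls above, then restore the deployment
kubectl --context=asma -n sample scale deployment helloworld-v1 --replicas=1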

Summary

In this article I have explained how I deployed Anthos Service Mesh on two GKE clusters in different regions with Google-managed Istio control plane and CA certificates. Anthos Service Mesh makes it simple to deploy a multi-cluster service mesh, because most of the complexity of Istio is now managed by Google.

Credits: Header image by Luca Cavallin on nylavak.com

To calculate the elapsed times of Terraform Cloud run stages, I needed to create a pivot table in Google BigQuery. As the classic SQL query was very expensive, I changed it to use the BigQuery PIVOT operator which was 750 times cheaper.

(more…)

After I added the Google Cloud Storage backend to PrivateBin, I was
interested in its performance characteristics. To fake a real PrivateBin client, I needed to generate an encrypted message.
I searched for a client library and found PBinCLI. As it is written in Python,
I could write a simple load test in Locust. Nice and easy. How wrong was I.

create the test

Within a few minutes I had a test script to create, retrieve and delete a paste.
To validate the functional correctness, I ran it against https://privatebin.net:

$ locust -f locustfile.py \
   --users 1 \
   --spawn-rate 1 \
   --run-time 1m \
   --headless \
   --host https://privatebin.net

The following table shows measured response times:

Name         | Avg (ms) | Min (ms) | Max (ms) | Median (ms)
create-paste | 237      | 140      | 884      | 200
get-paste    | 28       | 27       | 30       | 29
delete-paste | 27       | 27       | 29       | 27

Now, I had a working load test script and a baseline to compare the performance with.

baseline on Google Cloud Storage

After deploying the service to Google Cloud Run, I ran the single user test. And it was promising.

$ locust -f locustfile.py \
  --users 1 \
  --spawn-rate 1 \
  --run-time 1m \
  --headless \
  --host https://privatebin-deadbeef-ez.a.run.app

Sure, it is not lightning fast, but I did not expect it to be. The response times looked acceptable to me. After all, PrivateBin is not used often nor heavily.

Name         | Avg (ms) | Min (ms) | Max (ms) | Median (ms)
create-paste | 506      | 410      | 664      | 500
get-paste    | 394      | 335      | 514      | 380
delete-paste | 587      | 443      | 974      | 550

multi-user run on Google Cloud Storage

Next, I wanted to know the performance of PrivateBin with the Google Cloud Storage backend under moderate load. So, I scaled the load test to 5 concurrent users.

$ locust -f locustfile.py \
  --users 5 \
  --spawn-rate 1 \
  --run-time 1m \
  --headless

The results were shocking!

Name         | Avg (ms) | Min (ms) | Max (ms) | Median (ms)
create-paste | 4130     | 662      | 10666    | 4500
delete-paste | 3449     | 768      | 6283     | 3800
get-paste    | 2909     | 520      | 5569     | 2400

How? Why? Was there a bottleneck at the storage level? I checked the logs and saw steady response times reported by Cloud Run:

POST 200 1.46 KB 142 ms python-requests/2.25.1  https://privatebin-37ckwey3cq-ez.a.run.app/
POST 200 1.25 KB 382 ms python-requests/2.25.1  https://privatebin-37ckwey3cq-ez.a.run.app/
GET  200 1.46 KB 348 ms python-requests/2.25.1  https://privatebin-37ckwey3cq-ez.a.run.app/?d7e4c494ce4f613f

It took me a while to discover that Locust was thrashing my little M1. It was running at 100% CPU, without even blowing a fan, just to create the encrypted messages! So I needed something more efficient.

k6 to the rescue!

When my brain thinks fast, it thinks golang. So I downloaded k6. The user scripts
are written in JavaScript, but the engine is pure go. Unfortunately, the interpreter
is custom built and has limited compatibility with nodejs and browser JavaScript engines.
This meant that I could not use any existing JavaScript libraries to create an encrypted message.

Fortunately with xk6 you can call a golang
function from your JavaScript code. This is what I needed! So I created the k6 privatebin extension and wrote
an equivalent load test script.

build a customized k6

To use this new extension, I built k6 locally with the following commands:

go install github.com/k6io/xk6/cmd/xk6@latest
xk6 build --with github.com/binxio/xk6-privatebin@v0.1.2

k6 baseline run on Google Cloud Storage

Now I was ready to run the baseline for the single user, using k6:

$ ./k6 --log-output stderr run -u 1 -i 100  test.js

The baseline result looked pretty much like the Locust run:

Name         | Avg (ms) | Min (ms) | Max (ms) | Median (ms)
create-paste | 440      | 396      | 590      | 429
get-paste    | 355      | 320      | 407      | 353
delete-paste | 382      | 322      | 599      | 357

k6 multi-user run on Google Cloud Storage

What would happen if I scaled up to 5 concurrent users?

$ ./k6 --log-output stderr run -u 5 -i 100  test.js

Woohoo! The response times stayed pretty flat.

Name         | Avg (ms) | Min (ms) | Max (ms) | Median (ms)
create-paste | 584      | 350      | 2612     | 555
get-paste    | 484      | 316      | 808      | 490
delete-paste | 460      | 295      | 843      | 436

The chart below shows the sum of the medians for the single- and multi-user load test on Locust and k6:

It is clear that Locust was skewing the results way too much. With k6, I could even simulate 20 concurrent users, and still only use 25% of my M1.

Name         | Avg (ms) | Min (ms) | Max (ms) | Median (ms)
create-paste | 713      | 414      | 1204     | 671
get-paste    | 562      | 352      | 894      | 540
delete-paste | 515      | 351      | 818      | 495

These are way more users than I expect on my PrivateBin installation, and these response times are very acceptable to me. Mission accomplished!

conclusion

In the future, my go-to tool for load and performance tests will be k6. It is a single binary which you can run anywhere. It is fast, and the golang extensions make it easy to include any compute-intensive task in a very user-friendly manner.

I have recently worked on a 100%-Terraform based project where I made extensive use of Workspaces and modules to easily manage the infrastructure for different environments on Google Cloud. This blog post explains the structure I have found to work best for the purpose.

What are Terraform workspaces?

Workspaces are separate instances of state data that can be used from the same working directory. You can use workspaces to manage multiple non-overlapping groups of resources with the same configuration.
To create and switch to a new workspace, after running terraform init, run:

terraform workspace new <name>

To switch to other workspaces, run instead:

terraform workspace select <name>
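You can list the available workspaces and print the currently selected one with:

terraform workspace list
terraform workspace show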

Use the selected workspace in the Terraform files

The selected workspace is made available in your .tf files via the terraform.workspace variable (it’s a string). I like to assign the value to a local variable called environment (since the name of the workspaces and that of the environments match).

locals {
    environment = terraform.workspace
}

Add the environment-based configuration to a new module

Now that you have a variable containing the environment (it’s just a string) that you are operating on, you can create a module containing the environment-based configuration. I created a vars module inside the modules directory of my repository, which contains at least the following files:

  • main.tf
    This file never changes, as it’s only needed to aggregate the variables that will be exported.

    locals {
      environments = {
        "development" : local.development,
        "acceptance" : local.acceptance,
        "production" : local.production
      }
    }
  • outputs.tf
    This file, too, never changes. Here I am defining the output of the vars module so that it can be used from anywhere else in the Terraform repository.
    The exported values are based on the selected workspace.

    output "env" {
    value = local.environments[var.environment]
    }
  • variables.tf
    This file defines the variables required to initialize the module. The outputs of the module are based on the selected workspace (environment), which it needs to be aware of.

    variable "environment" {
    description = "The environment which to fetch the configuration for."
    type = string
    }
  • development.tf & acceptance.tf & production.tf
    These files contain the actual values that differ by environment. For example, when setting up a GKE cluster, you might want to use cheap machines for your development node pool, and more performant ones in production. This can be done by defining a node_pool_machine_type value in each environment, like so:

    // in development.tf
    locals {
      development = {
        node_pool_machine_type = "n2-standard-2"
      }
    }

    // in acceptance.tf
    locals {
      acceptance = {
        node_pool_machine_type = "n2-standard-4"
      }
    }

    // in production.tf
    locals {
      production = {
        node_pool_machine_type = "n2-standard-8"
      }
    }

The vars module is now ready to be used from anywhere in the repository, for example in main.tf file. To access the configuration values, initialize the module like so:

#
# Fetch variables based on Environment
#
module "vars" {
    source      = "./modules/vars"
    environment = local.environment
}

The correct configuration will be returned based on the Terraform workspace (environment name) passed to it, and values can be accessed via module.vars.env.<variable-name>. For example:

node_pools = [
    {
        ...
        machine_type = module.vars.env.node_pool_machine_type
        ...
    }
]

Summary

In this blog post I have shown you how you can use Terraform Workspaces to switch between different configurations based on the environment you are working on, while keeping the setup as clean and simple as possible. Are you interested in more articles about Terraform? Check out How to Deploy ElasticSearch on GKE using Terraform and Helm!

Credits: Header image by Luca Cavallin on nylavak.com

We have read and selected a few articles and other resources about the latest developments around cloud technology so you don’t have to. Read further and keep yourself up-to-date in five minutes!

6 database trends to watch

In a data-driven, global, always-on world, databases are the engines that let businesses innovate and transform. As databases get more sophisticated and more organizations look for managed database services to handle infrastructure needs, there are a few key trends we’re seeing.

Read more:
https://cloud.google.com/blog/products/databases/6-database-trends-to-watch

How does Anthos simplify hybrid & multicloud deployments?

Most enterprises have applications in disparate locations—in their own data centers, in multiple public clouds, and at the edge. These apps run on different proprietary technology stacks, which reduces developer velocity, wastes computing resources, and hinders scalability. How can you consistently secure and operate existing apps, while developing and deploying new apps across hybrid and multicloud environments? How can you get centralized visibility and management of the resources? Well, that is why Anthos exists!

Read more:
https://cloud.google.com/blog/topics/developers-practitioners/how-does-anthos-simplify-hybrid-multicloud-deployments

Automate your budgeting with the Billing Budgets API

Budgets are ideal for visibility into your costs but they can become tedious to manually update. Using the Billing Budgets API you can automate updates and changes with your custom business logic.

Read more:
https://cloud.google.com/blog/topics/developers-practitioners/automate-your-budgeting-billing-budgets-api

Amazon RDS for PostgreSQL Integrates with AWS Lambda

You can now invoke Lambda functions from stored procedures and user-defined functions inside your PostgreSQL database, using the aws_lambda PostgreSQL extension provided with RDS for PostgreSQL. You can choose to execute the Lambda synchronously or asynchronously, and you get access to the execution log, the error, and the returned result for further processing. This probably also means you can call Lambda functions from triggers on tables to respond to data changes!

Read more:
https://aws.amazon.com/about-aws/whats-new/2021/04/amazon-rds-postgresql-integrates-aws-lambda/

AWS SAM CLI now supports AWS CDK applications (public preview)

Some features of SAM are now available to CDK developers: local testing of Lambda functions and API endpoints before synthesizing, and building your Lambda resources into deployable zip files. All this while still being able to use the CDK CLI for creating, modifying, and deploying CDK applications.

Read more:
https://aws.amazon.com/about-aws/whats-new/2021/04/aws-sam-cli-supports-aws-sdk-applications-public-preview/

AWS CloudFront Functions

It is now possible to manipulate requests (viewer request and viewer response) with the new CloudFront Functions. These are JavaScript functions that execute within 1 millisecond on the outer edge locations (218 instead of 13!). You can’t do really crazy stuff; they are most suitable for header manipulations, redirects, URL rewrites and token validations. The cost of using functions is 1/6 of the Lambda@Edge price!

Read more:
https://aws.amazon.com/blogs/aws/introducing-cloudfront-functions-run-your-code-at-the-edge-with-low-latency-at-any-scale/

Amazon EC2 Auto Scaling introduces Warm Pools

You can now keep some pre-initialized stopped instances handy to quickly start them when autoscaling needs them. This is much cheaper than keeping live instances ready. Starting these pre-initialized instances then takes tens of seconds instead of minutes.

Read more:
https://aws.amazon.com/about-aws/whats-new/2021/04/amazon-ec2-auto-scaling-introduces-warm-pools-accelerate-scale-out-while-saving-money/

Credits: Header image by Luca Cavallin on nylavak.com

Did you ever wonder how to connect to CloudSQL with IAM authentication? Since this year, CloudSQL for PostgreSQL allows IAM users and IAM service accounts to log in as a database user. In this short blog I will show you how to do this using Terraform.

(more…)

In this blog I introduce the Renovate bot, a great tool to automate updating the application dependencies in your source code.

(more…)

I was recently tasked with deploying ElasticSearch on GKE using Terraform and Helm, and doing so in the most readable way possible. I wasn’t very familiar with Helm before, so I did some research to find an approach that would fulfill the requirements. In this post I will share with you the Terraform configuration I used to achieve a successful deployment.

What is Helm?

Helm is, at its most basic, a templating engine to help you define, install, and upgrade applications running on Kubernetes. Using Helm, you can leverage its Charts feature, which are simply Kubernetes YAML configuration files (that can be further configured and extended) combined into a single package that can be used to deploy applications on a Kubernetes cluster. To be able to use Helm via Terraform, we need to define the corresponding provider and pass the credentials needed to connect to the GKE cluster.

provider "helm" {
  kubernetes {
    token                  = data.google_client_config.client.access_token
    host                   = data.google_container_cluster.gke.endpoint
    cluster_ca_certificate = base64decode(data.google_container_cluster.gke.master_auth[0].cluster_ca_certificate)
  }
}

Terraform configuration

I am defining a helm_release resource with Terraform, which will deploy the ElasticSearch cluster when applied. Since I am using a Helm chart for the cluster, doing so is incredibly easy: all I had to do was tell Helm the name of the chart to use and where it is located (repository), along with the version of ElasticSearch that I would like to use.
With the set blocks, I can override the default values from the template. This makes it easy to select an appropriate storage class, the amount of storage and, in general, any other piece of configuration that can be changed (refer to the documentation of the chart itself to see which values can be overridden), directly from Terraform.

resource "helm_release" "elasticsearch" {
  name       = "elasticsearch"
  repository = "https://helm.elastic.co"
  chart      = "elasticsearch"
  version    = "6.8.14"
  timeout    = 900

  set {
    name  = "volumeClaimTemplate.storageClassName"
    value = "elasticsearch-ssd"
  }

  set {
    name  = "volumeClaimTemplate.resources.requests.storage"
    value = "5Gi"
  }

  set {
    name  = "imageTag"
    value = "6.8.14"
  }
}

Creating a new storage class

I then had to provision a new storage class, which will be used by the ElasticSearch cluster to store data. The configuration below sets up the SSD-backed persistent disk storage class (SSD is recommended for this purpose, since it’s faster than a regular HDD) that I referenced in the main configuration above.

resource "kubernetes_storage_class" "elasticsearch_ssd" {
  metadata {
    name = "elasticsearch-ssd"
  }
  storage_provisioner = "kubernetes.io/gce-pd"
  reclaim_policy      = "Retain"
  parameters = {
    type = "pd-ssd"
  }
  allow_volume_expansion = true
}
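After terraform apply finishes, a quick way to verify the result is with kubectl (the app=elasticsearch-master label is what the Elastic chart applies by default; it may differ if you changed the release name):

kubectl get storageclass elasticsearch-ssd
kubectl get pods -l app=elasticsearch-master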

Summary

In this blog post I have shown you how to deploy ElasticSearch on GKE using Terraform and Helm. The required configuration is simple and very readable, lowering the barrier to handling all of your infrastructure via Terraform, rather than, for example, using Cloud Marketplace, managed services, or other custom solutions.

Credits: Header image by Luca Cavallin on nylavak.com

When I test new ideas and features, I often rack up accidental cloud cost on Google Cloud Platform. I forget to delete things I created, and end up paying for resources that I kept running as a result.

(more…)

When you want to create a Python command line utility for Google Cloud Platform, it would be awesome if you could use the active gcloud credentials in Python. Unfortunately, the Google Cloud Client libraries do not support using the gcloud credentials. In this blog, I will present a small Python library which you can use to do just that.

How to use the gcloud credentials in Python

It is really simple. You install the package gcloud-config-helper and call the default function, as shown below:

import gcloud_config_helper
credentials, project = gcloud_config_helper.default()

You pass the credentials to a service client as follows:

from google.cloud import compute_v1

c = compute_v1.InstancesClient(credentials=credentials)
for zone, instances in c.aggregated_list(request={"project": project}):
    for instance in instances.instances:
        print(f'found {instance.name} in zone {zone}')

That is all there is to it! Check out the complete example of using the gcloud configured credentials.
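For completeness, the package is installed with a single pip command:

pip install gcloud-config-helper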

How does it work?

The library executes the command gcloud config config-helper. This command provides authentication and configuration data to external tools. It returns an access token, an id token, the name of the active configuration and all associated configuration properties, as shown below:

    configuration:
      active_configuration: playground
      properties:
        core:
          account: markvanholsteijn@binx.io
          project: playground
        ...
    credential:
      access_token: ya12.YHYeGSG8flksArMeVRXsQB4HFQ8aodXiGdBgfEdznaVuAymcBGHS6pZSp7RqBMjSzHgET08BmH3TntQDOteVPIQWZNJmiXZDr1i99ELRqDxDAP8Jk1RFu1xew7XKeQTOTnm22AGDh28pUEHXVaXtRN8GZ4xHbOoxrTt7yBG3R7ff9ajGVYHYeGSG8flksArMeVRXsQB4HFQ8aodXiGdBgfEdznaVuAymcBGHS6pZSp7RqBMjSzHgET08BmH3TntQDOteVPIQWZNJmiXZDr1i99ELRqDxDAP8Jk1RFu1xew7XKeQTOTnm22AGDh28pUEHXVaXtRN8GZ4xHbOoxrTt7yBG3R7ff9ajGV
      id_token: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJodHRwczovL2FjY291bnRzLmdvb2dsZS5jb20iLCJhenAiOiI5OTk5OTk5OTk5OS5hcHBzLmdvb2dsZXVzZXJjb250ZW50LmNvbSIsImF1ZCI6Ijk5OTk5OTk5OTk5LmFwcHMuZ29vZ2xldXNlcmNvbnRlbnQuY29tIiwic3ViIjoiMTExMTExMTExMTEyMjIyMjIyMjIyMjIiLCJoZCI6ImJpbnguaW8iLCJlbWFpbCI6Im1hcmt2YW5ob2xzdGVpam5AYmlueC5pbyIsImVtYWlsX3ZlcmlmaWVkIjp0cnVlLCJhdF9oYXNoIjoic2Rmc2prZGZqc2tkZnNrZGZqcyIsImlhdCI6MTU4ODcwMTY4MywiZXhwIjoxNTg4NzA1MjgzfQ.-iRKFf69ImE93bvUGBxn3Fa5aPBjhyzeWfLzuaNdIGI
      token_expiry: '2020-05-05T19:01:22Z'

When the token expires, the library will call the helper again to refresh it. Note that Google is unsure whether the config-helper is a good thing. If you read gcloud config config-helper --help, you will notice the following sentence:

This command is an internal implementation detail and may change or disappear without notice.

For the development of command line utilities which integrate into the Google Cloud SDK ecosystem, it would be really handy if Google would provide an official way to obtain the active gcloud configuration and credentials.

conclusion

With the help of this library, it is possible to create a command line utility in Python for the Google Cloud Platform using the gcloud credentials in Python. It is unfortunate that Google marks the config-helper as a volatile interface. Given the simplicity of the interface, I trust this library will be able to deal with any future changes. It would be even better, if Google would provide official support.

We also created a library for authenticating with gcloud credentials in Go.

Image by Kerstin Riemer from Pixabay

What is the next big thing? That’s Cloud. But what is the value of cloud training? And why should you train with an authorized training partner?

In this video and blog post, Max Driessen, chief of cloud training, and Martijn van de Grift, cloud consultant and authorized instructor for AWS and GCP, discuss the importance of cloud training and reasons to train with an ATP.

How important is cloud training?

“That depends on the phase your organization is in, and on the people within your organization”, Max starts.

“Generally speaking, proper cloud training is a great enabler to speed up development and go-to-market, and to implement best-practices such as cost optimization, performance efficiency, and security".

“Organizations usually start their cloud journey by implementing virtual machines or migrating small databases. That isn’t really different from what their engineers have been doing all along”, Martijn adds.

“What is essential in this phase, is that the management defines a vision on cloud and a roadmap. Managers need to become familiar with operating models in the cloud, different cost models and cost allocation".

“As organizations advance towards using the cloud-native building blocks, new challenges arise. Organizations in this phase can really benefit from fundamental training”, says Martijn.

“If a company is already really progressing well with the cloud, they move past the need for the fundamentals towards a potential to dive deep with specific technical cloud training to speed up their development".

What kind of training needs do you see?

Without a blink, Max sums up: “I see three different general training needs: Cloud Awareness, Cloud Transformation, Cloud Skills".

“Cloud Awareness training is for companies that are setting their first steps with the cloud. This training is ideal for managers, project leads, product owners, basically anyone within an organization who needs to get a better grip on the cloud. To develop an understanding of the capabilities, the capacity, the return on investment of cloud even," Max explains.

“Cloud Transformation is for companies that are already in a hybrid or a migration process, who need support and skills within that process of transformation". Max continues. “They will learn about cloud target operating models, defining a business case around cloud, but also, they will learn about the organizational impact of a cloud transformation".

Martijn says: ”Cloud Skills are off-the-shelf training from the curriculum of AWS, GCP, or Azure. These learning paths focus on roles like architects, developers, security or data specialists, and even help you to get prepared to take a certification exam. Each program in the curriculum of AWS, GCP, or Azure, is great preparation for an exam".

“If you want to take the exam, that is something you have to decide on your own, but the training will definitely help you to develop the right competencies", Max adds.

Martijn concludes: “Obviously, you also need to gain practical experience. That is why it’s so useful to have an instructor who has a couple of years of practical experience under his belt".

Why train with Binx?

“Three reasons stand out: First of all, we make sure that each individual receives the best training experience possible by meeting you where you are. Secondly, we combine our training activities with high-end consultancy, so we have first-hand experience with cloud projects. Thirdly, but certainly not the least: we are an authorized training partner for AWS, and GCP," Max says with a smile.

But how do you define the training need and the right training? “If you identify a training need, but you are not really sure on which topic you need training, we have an assessment for you. We can help to identify the knowledge gaps, on which domains. We can help you to fill in the blanks. An assessment could also be a questionnaire, where we have a conversation with a company, to check where they are and to assess where we can be of any help," Max explains.

Martijn has been working as a cloud engineer for some years now. Does that provide added value when delivering training?

“Being not only a consultancy firm but also a training partner, is definitely a big plus. We are working with clients four or five days a week, helping the customers to solve their problems. And on top of this, we are also delivering training".

"We know first-hand what the challenges are that customers are typically facing”, explains Martijn. “And we are also up to date in all the courses and materials. During the training courses, you work together with the instructor. There is plenty of time to ask the instructor. The instructor will tap into his experience and knowledge to explain all the topics. This is definitely a big benefit over an online course".

“Being an authorized training partner with AWS and GCP is very special, especially if you are authorized for multiple cloud service providers. That is something that rarely happens. The bar is set high; for example, scoring a 9 out of 10 in feedback for a training", Max says.

Martijn adds: “To become an authorized trainer, you also have to meet a high standard. You have to know everything about the cloud providers and their services. About Amazon, Google, and Azure. Instructors are tested every year by the cloud service provider itself".

Can you share some of your personal training experiences?

Max starts to smile as he recalls his first cloud training a while ago: “In the meantime, I joined three courses, and it has been quite a journey. It was quite challenging at the start, but having a good trainer really made the difference. In my case, I was lucky to have Martijn as the trainer."

Martijn says “My favorite training is the Cloud Architect Professional training. This training really gives a global overview of everything a cloud provider has to offer you. It’s actually more than the cloud from a developer point-of-view, or only from a security point-of-view. It really gives you an overview of the services and how you can use them properly."

Interested to learn more about the benefits of training for you or your organization? Get in touch with Max Driessen to discuss the possibilities.

We have read and selected a few articles and other resources about the latest developments around cloud technology so you don’t have to. Read further and keep yourself up-to-date in five minutes!

Inventory management with BigQuery and Cloud Run

Cloud Run is often used just as a way of hosting websites, but there’s so much more you can do with it. In this blog, Aja Hammerly (Developer Advocate) will show us how to use Cloud Run and BigQuery together to create an inventory management system. The example uses a subset of the Iowa Liquor Control Board data set to create a smaller inventory file for a fictional store. 

Read more:
https://cloud.google.com/blog/topics/developers-practitioners/inventory-management-bigquery-and-cloud-run

3 common serverless patterns to build with Workflows

The Workflows orchestration and automation service is a serverless product designed to orchestrate work across Google Cloud APIs as well as any HTTP-based API available on the internet, and it has recently reached General Availability. At the same time, Workflows has been updated with a preview of Connectors, which provide seamless integration with other Google Cloud products to design common architecture patterns that can help you build advanced serverless applications. 

Read more:
https://cloud.google.com/blog/products/application-development/building-serverless-apps-with-workflows-and-connectors

Google Cloud products in 4 words or less (2021 edition)

Google Cloud offers lots of products to support a wide variety of use cases. To make it easier to orient yourself, Google has created a set of resources that makes it easy to familiarize yourself with the Google Cloud ecosystem. You can use these resources to quickly get up to speed on different products and choose those that you’re most interested in for a deeper dive into documentation and other available resources.

Read more:
https://cloud.google.com/blog/topics/developers-practitioners/back-popular-demand-google-cloud-products-4-words-or-less-2021-edition

Node.js 14.x runtime now available in AWS Lambda

While it is still OK to use Node.js 12.x until the end of 2021, version 14.x brings a few things that make the life of a developer a lot nicer. Most of the changes come from the newly upgraded JavaScript engine (V8 8.1) on which Node.js runs! New features include the nullish coalescing operator, nullish assignment and optional chaining.

Read more:
https://aws.amazon.com/blogs/compute/node-js-14-x-runtime-now-available-in-aws-lambda/

Cost savings on AWS

AWS recently introduced the “Compute Savings Plan” for Lambda. It allows for a 12% to 17% discount on the expenses. So if you spend, say, $730 per month ($1 per hour) on Lambda, you can commit to $640 per month ($0.88 per hour), leading to yearly savings of $1080. ($1 per hour is a LOT of Lambda usage though, so your actual situation will probably not benefit this much.)

It is also now possible to save costs on CloudFront outbound traffic. In a use case where the traffic amounts to $50 per day, the savings are around $800 per month!

Read more:
https://aws.amazon.com/blogs/aws/savings-plan-update-save-up-to-17-on-your-lambda-workloads/ and https://aws.amazon.com/about-aws/whats-new/2021/02/introducing-amazon-cloudfront-security-savings-bundle/

AWS DynamoDB audit logging and monitoring with CloudTrail

DynamoDB change logging was usually done through streams. Now it is possible to do logging with CloudTrail data events. This also allows for access logging, including the queries used to access the data. These CloudTrail data events also support access logging for S3, Lambda (invokes) and Managed Blockchain.

Read more:
https://aws.amazon.com/blogs/database/amazon-dynamodb-now-supports-audit-logging-and-monitoring-using-aws-cloudtrail/

S3 Object Lambda

A new type of Lambda can be attached to an S3 bucket that allows you to change the response for GET requests. The Lambda is attached to an S3 Object Lambda Access Point, which routes the requests to the Lambda. Inside the Lambda you can change the headers, body and status code based on your requirements. Use cases could be: resizing images, partially hiding some data based on requester identity, or adding headers from another data source.

Read more:
https://aws.amazon.com/blogs/aws/introducing-amazon-s3-object-lambda-use-your-code-to-process-data-as-it-is-being-retrieved-from-s3/

Credits: Header image by Luca Cavallin on nylavak.com

When building custom images for Google Cloud Platform using HashiCorp Packer, you either specify a fixed version or latest for the Google source image. When specifying a fixed version, you run the risk of building with a deprecated image. If you use latest, your build may introduce changes unknowingly. As we adhere to the principle of keeping everything under version control, we created a utility which ensures that your Packer template always refers to the latest, explicit version of an image.

(more…)

Since version 0.13, Terraform supports the notion of a provider registry. You can build and publish your own provider in the public registry. At the moment, there is no private registry implementation available. In this blog, I will show you how to create a private Terraform provider registry on Google Cloud Storage. A utility will generate all the required documents to create a static registry.

(more…)

At a current customer, we’re using GCP’s IAP tunnel feature to connect to the VMs.
When deploying fluentbit packages on existing servers for this customer, I decided it would save some time if I made an Ansible playbook for this job.
As I only have access to the servers through IAP, I ran into a problem: Ansible does not have an option to use gcloud compute ssh as a connection type.

It turns out, you do have the option to override the actual ssh executable used by Ansible. With some help from this post, and some custom changes, I was able to run my playbook on the GCP servers.

Below you will find the configurations used.

ansible.cfg

contents:

[inventory]
enable_plugins = gcp_compute

[defaults]
inventory = misc/inventory.gcp.yml
interpreter_python = /usr/bin/python

[ssh_connection]
# Enabling pipelining reduces the number of SSH operations required
# to execute a module on the remote server.
# This can result in a significant performance improvement
# when enabled.
pipelining = True
ssh_executable = misc/gcp-ssh-wrapper.sh
ssh_args = None
# Tell ansible to use SCP for file transfers when connection is set to SSH
scp_if_ssh = True
scp_executable = misc/gcp-scp-wrapper.sh

First we tell Ansible that we want to use the gcp_compute plugin for our inventory.
Then we point Ansible to our inventory configuration file, the contents of which can be found below.

The ssh_connection configuration allows us to use gcloud compute ssh/scp commands for our remote connections.

misc/inventory.gcp.yml

contents:

plugin: gcp_compute
projects:
  - my-project
auth_kind: application
keyed_groups:
  - key: labels
    prefix: label
  - key: zone
    prefix: zone
  - key: (tags.items|list)
    prefix: tag
groups:
  gke          : "'gke' in name"
compose:
  # set ansible_host to the instance name so the gcloud ssh/scp wrappers can connect to it
  ansible_host: name

This will enable automatic inventory of all the compute instances running in the my-project GCP project.
Groups will be automatically generated based on the given keyed_groups configuration and, in addition, I’ve added a gke group based on the VM’s name.
Setting ansible_host to the instance name makes sure our gcloud compute ssh command works; otherwise Ansible would pass the instance IP address.

misc/gcp-ssh-wrapper.sh

contents:

#!/bin/bash
# This is a wrapper script allowing to use GCP's IAP SSH option to connect
# to our servers.

# Ansible passes a large number of SSH parameters along with the hostname as the
# second to last argument and the command as the last. We will pop the last two
# arguments off of the list and then pass all of the other SSH flags through
# without modification:
host="${@: -2: 1}"
cmd="${@: -1: 1}"

# Unfortunately ansible has hardcoded ssh options, so we need to filter these out
# It's an ugly hack, but for now we'll only accept the options starting with '--'
declare -a opts
for ssh_arg in "${@: 1: $# -3}" ; do
        if [[ "${ssh_arg}" == --* ]] ; then
                opts+="${ssh_arg} "
        fi
done

exec gcloud compute ssh $opts "${host}" -- -C "${cmd}"

Ansible will call this script for all remote commands when the connection is set to ssh

misc/gcp-scp-wrapper.sh

contents:

#!/bin/bash
# This is a wrapper script allowing to use GCP's IAP option to connect
# to our servers.

# Ansible passes a large number of SSH parameters along with the hostname as the
# second to last argument and the command as the last. We will pop the last two
# arguments off of the list and then pass all of the other SSH flags through
# without modification:
host="${@: -2: 1}"
cmd="${@: -1: 1}"

# Unfortunately ansible has hardcoded scp options, so we need to filter these out
# It's an ugly hack, but for now we'll only accept the options starting with '--'
declare -a opts
for scp_arg in "${@: 1: $# -3}" ; do
        if [[ "${scp_arg}" == --* ]] ; then
                opts+="${scp_arg} "
        fi
done

# Remove [] around our host, as gcloud scp doesn't understand this syntax
cmd=`echo "${cmd}" | tr -d []`

exec gcloud compute scp $opts "${host}" "${cmd}"

This script will be called by Ansible for all the copy tasks when the connection is set to ssh

group_vars/all.yml

contents:

---
ansible_ssh_args: --tunnel-through-iap --zone={{ zone }} --no-user-output-enabled --quiet
ansible_scp_extra_args: --tunnel-through-iap --zone={{ zone }} --quiet

Passing the ssh and scp args through the group vars makes it possible to set the zone to the VM’s zone, which is already known through Ansible’s inventory.
Without specifying the zone, gcloud will throw the following error:

ERROR: (gcloud.compute.ssh) Underspecified resource [streaming-portal]. Specify the [--zone] flag.
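With this configuration in place, running a playbook over the IAP tunnel works like any other Ansible run (the playbook name here is hypothetical):

# the inventory plugin and ssh/scp wrappers are picked up from ansible.cfg
ansible-playbook fluentbit.yml

# or limit the run to one of the generated groups, for example the gke group
ansible-playbook fluentbit.yml --limit gke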

Hopefully this post will help anyone that ran into the same issue!

When you create a Google service account key file for an external system, the private key has to be transported. In addition, the generated key is valid until January 1st in the year 10000. In this blog I will show you how an external system can identify itself as the service account without exposing the private key.

(more…)

I recently had to optimize the performance of a PHP-based API on Cloud Run. After a performance test, we discovered that the API became very slow when we put some serious load on it (with response times exceeding 10 seconds). In this post you’ll learn what changes I made to get that down to a stable 100ms.

The API uses PHP 7.4, Laravel 8.0 and MySQL on Cloud SQL (the managed database on Google Cloud). The API needs to handle at least 10,000 concurrent users. The container image we deploy to Cloud Run has nginx and PHP-FPM.

This is what I did to improve the response times. (Don’t worry if not everything makes sense here, I’ll explain everything).

  • Matching the number of PHP-FPM workers to the maximum concurrency setting on Cloud Run.
  • Configuring OPcache (it compiles and caches PHP scripts)
  • Improving composer auto-loading settings
  • Laravel-specific optimizations including caching routes, views, events and using API resources

Matching Concurrency and Workers

The application uses nginx and PHP-FPM, which is a process manager for PHP. PHP is single threaded, which means that one process can handle one (exactly one) request at the same time. PHP-FPM keeps a pool of PHP workers (a worker is a process) ready to serve requests and adds more if the demand increases.
It’s a good practice to limit the maximum size of the PHP-FPM worker pool, to make sure your resource usage (CPU and memory) is predictable.

To configure PHP-FPM for maximum performance, I first set the process manager type for PHP-FPM to static, so that the specified number of workers are running at all times and waiting to handle requests. I did this by copying a custom configuration file to the application’s container and configuring the environment so that these options will be picked up by PHP-FPM (you must copy the configuration where it is expected, in my case, into /usr/local/etc/php-fpm.d/). The settings I needed are:

pm = static
pm.max_children = 10

However, if you set a limit, and more requests come to a server than the pool can handle, requests start to queue, which increases the response time of those requests:

Nginx and php-fpm request model

Limiting Concurrent Requests on Cloud Run

To avoid request queuing in nginx, you’ll need to limit the number of requests Cloud Run sends to your container at the same time.

Cloud Run uses request-based autoscaling. This means it limits the number of concurrent requests it sends to a container, and adds more containers if all containers are at their limit. You can change that limit with the concurrency setting. I set it to 10, which I determined is the maximum number of concurrent requests a container with 1GB of memory and 1 vCPU can handle for this application.

Cloud Run concurrent requests with Nginx and php-fpm

You really want to make sure Cloud Run’s concurrency setting matches the maximum number of PHP-FPM workers! For example, if Cloud Run sends 100 concurrent requests to a container before adding more containers, and you configured your PHP-FPM to start only 10 workers, you will see a lot of requests queuing.

If these tweaks aren’t enough to reach the desired performance, check the Cloud Run metrics to see what the actual utilization percentages are. You might have to change the amount of memory and vCPUs available to the container. The downside of this optimization is that more containers will be running due to lower concurrency, resulting in higher costs. I also noticed temporary delays when new instances are starting up, but this normalizes over time.

Configuring OPCache

OPCache is a default PHP extension that caches the compiled scripts in memory, improving response times dramatically. I enabled and tweaked OPCache settings by adding the extension’s options to a custom php.ini file (in my case, I put it in the /usr/local/etc/php/conf.d/ directory). The following is a generic configuration that you can easily reuse, and you can refer to the documentation for the details about every option.

opcache.enable=1
opcache.memory_consumption=128
opcache.interned_strings_buffer=64
opcache.max_accelerated_files=32531
opcache.validate_timestamps=0
opcache.save_comments=1
opcache.fast_shutdown=0
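
If you want to double-check that the extension and your settings are actually picked up inside the container, a quick sanity check could look like this:

# Confirm OPcache is loaded and inspect its runtime configuration
php -m | grep -i opcache
php --ri opcache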

Optimizing Composer

Composer is a dependency manager for PHP. It lets you specify the libraries your app needs, and downloads them for you to a directory. It also generates an autoload configuration file, which maps import paths to files.
By default, Composer's autoloader resolves classes dynamically, scanning the filesystem based on the autoload rules. While that is convenient in development (your new classes show up immediately), in production it can make your code really slow. With the --optimize-autoloader flag, Composer generates a class map once, so classes are looked up directly instead of being searched for on every request.

You can optimize Composer's autoloader by passing the --optimize-autoloader flag like this:

composer install --optimize-autoloader --no-dev
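
To confirm that the optimized class map was actually generated, you can inspect the file Composer writes to its default vendor directory:

# A large class map here means the optimized autoloader took effect
wc -l vendor/composer/autoload_classmap.php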

Laravel-specific optimizations

The application I optimized is built with Laravel, which provides a number of tools that can help improve the performance of the API. Here's what I did on top of the other tweaks to get the response times below 100ms.

  • I have leveraged Laravel’s built-in caching features during builds to reduce start-up times. There are no downsides to these tweaks, except that you won’t be able to use closure-defined routes (they can’t be cached). You can cache views, events and routes with these commands:

    php artisan view:cache
    php artisan event:cache
    php artisan route:cache

    Avoid running php artisan config:cache since Laravel ignores environment variables if you cache the configuration.

  • Using Laravel API Resources further improves the response times of your application. This has proven to be much faster than having the framework automatically convert single objects and collections to JSON.
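
As a quick illustration of that last point, a resource class is generated with a single artisan command (the resource name here is hypothetical):

# Creates app/Http/Resources/UserResource.php, which you then return from your controller
php artisan make:resource UserResource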

Summary

In this blog, I shared with you what I learned from optimizing the performance of a PHP-based API on Cloud Run. All of the tweaks together helped cut response times to one tenth of the original result, and I think the most impact was made by matching concurrency and PHP-FPM workers (if you're in a hurry, do only this). Watching the application metrics was fundamental throughout the performance testing phase, as was inspecting the Cloud Run logs after each change.

If your application still shows poor performance after these changes, there are other tweaks you can make to improve response times, which I haven’t discussed here.

  • Increase PHP memory limits if needed
  • Check MySQL for slow queries (often due to missing indexes)
  • Cache the responses with a CDN
  • Migrate to PHP 8.0 (up to 3x faster)

Would you like to learn more about Google Cloud Run? Check out this book by our own Wietse Venema.

Credits: Header image by Serey Kim on Unsplash

We have read and selected a few articles and other resources about the latest developments around cloud technology so you don’t have to. Read further and keep yourself up-to-date in five minutes!

Introducing GKE Autopilot: a revolution in managed Kubernetes

GKE Autopilot manages the infrastructure, letting you focus on your software. Autopilot mode automatically applies industry best practices and can eliminate all node management operations, maximizing your cluster efficiency (you will not be charged for idle nodes), helping to provide a stronger security posture and ultimately simplifying Kubernetes.

Read more:
https://cloud.google.com/blog/products/containers-kubernetes/introducing-gke-autopilot

What is Amazon Detective?

Amazon Detective makes it easy to analyze, investigate, and identify the root cause of potential security issues or suspicious activities. It automatically collects log data from your AWS resources and uses machine learning to build a linked set of data that enables you to easily conduct faster and more efficient security investigations.

Watch video:
https://www.youtube.com/watch?v=84BhFGIxIqg

Introduction to Amazon Timestream –Time Series Database

Amazon Timestream is a serverless time series database service for IoT and operational applications that makes it easy to store and analyze trillions of events per day. Amazon Timestream saves you time and cost by keeping recent data in memory and moving historical data to a cost-optimized storage tier.

Watch video:
https://www.youtube.com/watch?v=IsmhOkimHyI

What are my hybrid and multicloud deployment options with Anthos?

Anthos is a managed application platform that extends Google Cloud services and engineering practices to your environments so you can modernize apps faster. With Anthos, you can build enterprise-grade containerized applications with managed Kubernetes on Google Cloud, on-premises, and other cloud providers.

Read more:
https://cloud.google.com/blog/topics/developers-practitioners/what-are-my-hybrid-and-multicloud-deployment-options-anthos

Accelerate data science workflows with Looker

Looker is a modern business intelligence (BI) and analytics platform that is now a part of Google Cloud. It’s a full-fledged data application and visualization platform and it allows users to curate and publish data, integrating with a wide range of endpoints in many different formats ranging from CSV, JSON, Excel, files to SaaS and in-house built custom applications.

Read more:
https://cloud.google.com/blog/topics/training-certifications/how-to-use-looker-on-google-cloud-for-data-governance

While doing Cloud Native migrations, we depend on container images a lot. In this mini-blog I will present you with the
simplest Google Cloud Build config file that will build both snapshot and release versions at the same time.


When you work with AWS ECS and you apply the least-privileged security principle, you create an IAM role for each task. While developing the software, you may want to test the role too. However, this is not easily done. In this blog we will show you how it can be done the official way, and in a slightly more alternative way with the iam-sudo utility for development purposes.


In the Xebia First Candidates restaurant, Xebia business unit managers share what excites them about working for their respective units.

Love is like a cloud migration: Thrilling at the start, but once you get going, you never want to go back.

Smooth Operations

Besides being an avid gamer, Elaine Versloot is the Chief of Operations at cloud boutique Binx.io. Here, she focuses on turning the dynamics of cloud migration and modernization into a smooth operation.

We are looking for people with a passion for cloud. Developing cloud-native applications should be in their blood. We find it just as important that individuals are fun to be around; the social aspect matters a lot to us.

When someone has found his or her expertise and can talk about it with passion, I love that! Nothing gets me going more than that excitement in someone's eyes.

Go On a Blind Date With Elaine

Interested to learn more about the way Binx approaches cloud?

Go on a blind business date with Elaine

I recently passed the Google Cloud Associate Cloud Engineer certification exam. This post describes what the exam is about, how I prepared (with links to useful resources), and the strategies I used to answer the questions in under two minutes per question (skip down to Answering Strategies if that’s just what you want to know).

If you’re reading this before February 9, you can still join my webinar! I’ll help you put together an actionable roadmap to pass the certification exam and share more insights.

What’s an Associate Cloud Engineer?

First, some background if you don’t know what this is about, and are trying to figure out if you should take the certification. The Associate Cloud Engineer certification is one of the most technical Google Cloud certifications: I think it is even harder than the Professional Cloud Architect exam I’m currently preparing for. Nevertheless, this is the first stepping stone into Google Cloud, making it an obvious choice to get started if you are a software engineer, like I am.

Google Cloud defines no prerequisites for this exam, but recommends a minimum of six months hands-on experience on the platform. The topics covered in the certification are:

  • Projects, Billing, IAM
  • Compute Engine, App Engine and Kubernetes Engine
  • VPC and Networking
  • Cloud Storage
  • Databases (included are for example Cloud SQL, Memorystore, and Firestore)
  • Operations Suite (Cloud Logging and other services). You might know this as Stackdriver.

The next step in the learning journey after the Associate Cloud Engineer exam is the Professional Cloud Architect certification. From there, you can choose to specialize in Data and Machine Learning Engineering, Cloud Development and DevOps, or Security and Networking. (Check out the certification paths here).

Why Did I Choose Google Cloud?

Google Cloud strikes me as the most developer-friendly cloud provider. At the same time, it is also the smallest of the three main players but it is growing fast, meaning lots of room for new experts. I like the certification program because it is role-based (instead of product-centered), preparing not just for this provider, but the position itself.

About Me

I want to tell you a little bit about me, so you know how to interpret my advice. I have been a Software Engineer for the past 8+ years, doing mostly full-stack development. The ability to work on what “powers” the internet and an interest in what's under the hood of software systems brought me to the cloud. Furthermore, the managed services provided by cloud providers make it easy to add all sorts of features into applications, from sending emails to advanced machine learning – these days nothing is stopping you from starting your next company from the comfort of your bedroom.

I grew up in Italy and moved to the Netherlands a few years ago, where I now live with my wife and cat. When I am not working on clouds and other atmospheric phenomena, I enjoy photography and cycling.

Feel free to connect with me on LinkedIn or send me an email at lucacavallin@binx.io!

What Does the Exam Look Like?

This is how a certification exam works: you have two hours to answer 50 multiple choice and multiple selection questions. That means you’ll have roughly two minutes per question, with 20 minutes left at the end to review your answers.

The exam can be taken remotely or in person, costs 125 USD (roughly 100 EUR), and can be retaken after a cool-down period if you fail. The certification is valid for two years (you'll have to take the full exam again to recertify).

Exam Guide

Google publishes an exam guide for every certification. It details the topics of the certification, providing details of what you need to know for each of the areas.

I used the exam guide to track my progress while studying. You can make a copy of it and use color coding to highlight the topics you feel confident about and those that need more of your attention.

Learning resources

There are many books and online courses you can take to prepare for the exam, some more effective than others. For most people, a combination of written and audio-visual material works best. These are the resources that helped me most (and why):

  • Official Google Cloud Certified Associate Cloud Engineer Study Guide (book)
    • Most comprehensive resource
    • Questions after each chapter
    • Includes practice tests and flashcards
  • ACloudGuru (online course)
    • This is a good introduction (start with this course if you have no Google Cloud experience)
    • You also need the “Kubernetes Deep Dive” course
    • Hands-on labs
    • This course does not cover everything you need to know

Practice Tests

Practice tests are a key part of your preparation because they let you test your knowledge in a setting similar to the actual exam. If you finish the test, they provide you with a detailed explanation for each question, documenting the correct and wrong answers.

Take note of the topics that require attention and review the documentation accordingly. Once you consistently score at least 90% on practice tests, you are ready for the exam.

The Official Study Guide book provides review questions at the end of each chapter and an online portal with two practice tests. The ACloudGuru course also has a practice exam you can take, and you can find similar resources on Udemy.

Answering Strategies

If you do the math, you’ll see that you only have two minutes to answer every question, which is not much, given that each of them is quite lengthy. Here are the strategies I used when answering the questions.

  • Identify the core question
  • Review it carefully since a single word can make the difference
  • Eliminate answers that are wrong or in conflict with the question
  • Choose the cheapest and most secure option
  • Read the question again, keeping in mind the answer you chose

You can also mark answers during the test to come back to them later. While you don't have much time left for review at the end, this practice will save you from overthinking and losing valuable time.

Taking the Remote Proctored Exam

If you decide to take the exam remotely, once you have arranged it, you will have to install a Sentinel tool provided by the testing authority, verify your identity and pass a number of checks.

TIP: The operator taking you through the process doesn’t talk, so you need to scroll the chat for their questions. For me, the chat didn’t auto-scroll, so it was a bit awkward at first.

Join My Webinar

If you’re reading this before February 9, you can still register for my webinar! I’ll share all my learnings. I would love to meet you and answer any questions! Make sure to register here.

Summary

In this post, I shared my experience preparing for and taking the Google Cloud Associate Cloud Engineer certification exam. Here are the four most important things to take away from this:

  • You can do this! If you focus your attention and take as many practice tests as possible, and carefully review your correct and incorrect answers, you’ll pass the exam.
  • I think the Official Study Guide is hands down the best resource to use for preparation.
  • It is very useful to have real-world experience and practice with gcloud CLI and Kubernetes.
  • During the exam, attention to detail is important. Read every question carefully and read the question again after choosing your answer.

Credits: Header image by KAL VISUALS on Unsplash

We have read and selected a few articles and other resources about the latest and most significant developments around cloud technology so you don’t have to. Read further and keep yourself up-to-date in five minutes!

Introducing WebSockets, HTTP/2 and gRPC bidirectional streams for Cloud Run

Support for streaming is an important part of building responsive, high-performance applications. With these capabilities, you can deploy new kinds of applications to Cloud Run that were not previously supported, while taking advantage of serverless infrastructure.

Read more:
https://cloud.google.com/blog/products/serverless/cloud-run-gets-websockets-http-2-and-grpc-bidirectional-streams

Amazon Location – Add Maps and Location Awareness to Your Applications

Amazon Location Service gives you access to maps and location-based services from multiple providers on an economical, pay-as-you-go basis. You can use it to display maps, validate addresses, turn an address into a location, track the movement of packages and devices, and more.

Read more:
https://aws.amazon.com/blogs/aws/amazon-location-add-maps-and-location-awareness-to-your-applications/

Eventarc brings eventing to Cloud Run and is now GA

Eventarc is a new eventing functionality that lets developers route events to Cloud Run services. Developers can focus on writing code to handle events, while Eventarc takes care of the details of event ingestion, delivery, security, observability, and error handling.

Read more:
https://cloud.google.com/blog/products/serverless/eventarc-is-ga

Lifecycle of a container on Cloud Run

Serverless platform Cloud Run runs and autoscales your container-based application. You can make the most of this platform when you understand the full container lifecycle and the possible state transitions within it. In this article on the Google Cloud blog, our very own Wietse Venema takes you through the states, from starting to stopped.

Read more:
https://cloud.google.com/blog/topics/developers-practitioners/lifecycle-container-cloud-run

Amazon Aurora supports PostgreSQL 12

Amazon Aurora with PostgreSQL compatibility now supports major version 12. PostgreSQL 12 includes better index management, improved partitioning capabilities, and the ability to execute JSON path queries.

Read more:
https://aws.amazon.com/about-aws/whats-new/2021/01/amazon-aurora-supports-postgresql-12/

Keeping your Docker container image references up-to-date

Binx.io released an open source utility which allows you to list, detect and repair outdated references to container images, so you always build the most up-to-date images.

Read more:
https://binx.io/blog/2021/01/30/how-to-keep-your-dockerfile-container-image-references-up-to-date/

Over the past few months, our Cloud Engineers have proven to be so much more than cloud technology specialists. Out of necessity, they have also taken on side jobs as teacher, entertainer and nanny, all from the comfort of their own homes.

Helping Binx Parents Through The Lockdown

We are lucky to have a very creative department that looks after the wellbeing of all people within the company. Overcoming the impossible, they have made it a habit to look at what is possible in every given circumstance. The Nanny-on-Tour initiative is the latest rabbit to come out of their hats. The objective is to help all the parents at Binx get through the lockdown as comfortably as possible by setting up a dedicated pool of qualified nannies!

Turning A Pool of Nannies Into Reality

We sat down with HR Advisors Roosmarijn van Zessen and Véronique Reijinga to discuss this wonderful initiative.

"We received several phone calls from colleagues who are really struggling with working from home and taking care of their children at the same time. The power of Binx, as part of Xebia, is that we help each other out", Roosmarijn van Zessen shared.

Véronique added: "In the past, we had a big pool of students, often children of colleagues, working for our hospitality crew or assisting in other departments. We’ve asked them if they would be open to work for Xebia as a nanny, and many of them responded enthusiastically! A couple of hospitality colleagues have also indicated that they would like to participate in this project. Our Xebia Nanny pool was born!"

It’s a Match!

Almost immediately after announcing this initiative, the HR team received many positive responses. The project was launched last week, and already they’ve been able to match four colleagues with a nanny from the nanny network!

It is great to see how happily surprised they were that we had already found a nanny for them. We will get through the homeschooling and lockdown together!

Whenever you build a container image, chances are that you are using public images as a base. But how do you keep your image up-to-date with the latest releases? In this blog I will introduce a utility which will allow you to keep your Dockerfile container image references up-to-date.


Did you know that, once you have authenticated using the Google Cloud Platform SDK, the credential is valid for all eternity? With the Google Cloud session control tool you can limit the validity to as little as an hour.


Hi, my name is Max Driessen. On November 1st, I joined the Binx team as Chief Cloud Training. In this role, it is my ambition to empower professionals with the right cloud skills by offering the best training programs. I have a background in information technology but lack hands-on programming experience.

In my second week at Binx, my colleague Martijn van de Grift, cloud consultant and authorized trainer for both Google Cloud Platform and Amazon Web Services, organized a Google Cloud Fundamentals training.

“This is really something for you,” Martijn told me. “This way you not only learn more about Google Cloud, but you also experience what it is like to attend a training.”

I decided to jump right in to experience what it’s like to attend a Binx training first hand.

Before I realized it, there I was: attending an online GCP Fundamentals training, with Martijn as the trainer. Obviously, this is a fundamentals training; most of our training courses go way deeper into specific topics, but for me it felt like the gateway to a new world. Although I roughly knew what to expect, that did not make me feel any less nervous. But I had one goal: to learn more about GCP.

Stretching the Comfort Zone

At the start of the training, I felt a bit tense about what was to come. That feeling vanished quickly due to the informal start where every attendee introduced him or herself briefly and shared personal learning goals with the rest of the group.

Then it was time to get going. The training was laid out in a logical order. After an initial introduction to the GCP interface, we went under the hood with various functionalities of the platform. Martijn alternated between theory and Q&A. In addition, a substantial part of the training was reserved for breakout sessions where every participant, including myself, could try out the various services with Qwiklabs exercises. These Qwiklabs offer step-by-step exercises that teach you how to use a specific service. What resonated with me was the way the trainer went out of his way to support every individual attendee in successfully completing the labs.

It was obvious from the way Martijn shared additional information on top of the materials that he knows his way around Google Cloud like no other. From his experience, Martijn also handed out dozens of practical tips about the right way to implement the services. This resulted in a positive vibe, which encouraged me to veer further and further outside of my comfort zone.

What Ground Was Covered?

This GCP Fundamentals training introduced me to a broad range of cloud concepts and GCP services. Besides the general introduction, I also learned about Virtual Machines and different types of storage. Think of topics like Cloud Storage, Cloud SQL, Datastore, Bigtable, and Spanner. Of course, containers and Kubernetes were not overlooked. We also took a look at App Engine, Stackdriver and Infrastructure as Code. Finally, there was even some time left to cover the basics of BigQuery and the managed Machine Learning APIs.

At the end of the day, it felt like I had just made it through one full week of training. 😉

Applicability

As mentioned, I had no previous experience of working with cloud platforms. For the first topics this was no problem, but the hands-on labs will be easier to apply in practice for someone who has some software engineering experience. For me personally, that was not the objective of attending this course; I wanted to obtain a general understanding of GCP. The content and experience exceeded my initial goals.

The Most Effective Format

For me it was very valuable that all attendees introduced themselves at the beginning of the training. This made it easier later on to work together and to ask questions. By sharing their individual learning goals beforehand, the trainer was able to personalize the learning experience for everyone. To me, this was a big plus compared to individual online training without a live instructor.

I must admit that a full day of online training is very intense. Especially at the end of the day, it becomes more difficult to digest all the information. That is why we have chosen to offer our online courses in half days by default. This way, it is easier to stay attentive, while it also makes it easier to combine training with other obligations.

Max Recommends

I am looking forward to navigating the world of cloud together with you. One of my goals is to regularly recommend specific training topics. As this Google Cloud Fundamentals training was my first, let me start by sharing who I would recommend it to: everyone who would like to make a start with GCP. The training offers a broad introduction to the various GCP services and lays the groundwork for the official Google Cloud certifications.

If you have any questions, feel free to get in touch with me directly by sending me an email via maxdriessen@binx.io

Two years ago, I created a utility to copy AWS SSM parameters from one account to another. I published the utility to pypi.org, without writing a blog about it. As I found out that quite a number of people are using the utility, I decided to unveil it in this blog.


Building Serverless Applications with Google Cloud Run

Wietse Venema (software engineer and trainer at Binx.io) published an O’Reilly book about Google Cloud Run: the fastest growing compute platform on Google Cloud. The book is ranked as the #1 new release in Software Development on Amazon.com! We’re super proud. Here’s what you need to know about the book:

Praise

I’ve been fortunate enough to be a part of the Google team that helped create Knative and bring Cloud Run to market. I’ve watched Cloud Run mature as a product over the years. I’ve onboarded thousands of customers and I wrote a framework to help Go developers build Cloud Run applications faster–and even I learned a thing or two from this book. What took me three years to learn, Wietse delivers in less than a dozen chapters.
— Kelsey Hightower, Principal Engineer at Google Cloud

About the Book

If you have experience building web applications on traditional infrastructure, this hands-on guide shows you how to get started with Cloud Run, a container-based serverless product on Google Cloud. Through the course of this book, you’ll learn how to deploy several example applications that highlight different parts of the serverless stack on Google Cloud. Combining practical examples with fundamentals, this book will appeal to developers who are early in their learning journey as well as experienced practitioners (learn what others say about the book).

Who this Book is For

If you build, maintain or deploy web applications, this book is for you. You might go by the title of a software engineer, a developer, system administrator, solution architect, or a cloud engineer. Wietse carefully balances hands-on demonstrations with deep dives into the fundamentals so that you’ll get value out of it whether you’re an aspiring, junior, or experienced developer.

What’s Next

You can do one or all of the following things:

How to Automate the Kritis Signer on Google Cloud

Google Binary Authorization allows you to control which images are allowed on your Kubernetes cluster. You can allow images by name, or allow only images that have been signed off by trusted parties. Google documents a manual process for creating Kritis signer attestations. In this blog, I show you how to automate the Kritis signer using Terraform and Google Cloud Run.

Cloud-based IT infrastructures and software solutions are not only disrupting IT, they disrupt the way organizations, and even industries, operate. The full extent of the cloud's scope of influence stretches far beyond the traditional IT landscape. The cloud impacts IT management and the ways teams work together, and, last but not least, it requires a new skill set for individuals to operate in this sphere. These are just a few areas where the cloud has had significant impact. The cloud can be seen as a framework. In this article, we explain how cloud technology, if it's applied with the right mindset, helps teams accelerate development.

The Cloud Framework

The cloud computing model has destroyed the boundary between the world of hardware and the world of software: the data center has become software too. The cloud provider offers you a programming framework as a foundation on which you can build anything. You program a datacenter with all of the components: computers, networks, disks, load balancers. When you start it up, a virtual instance of your data center is started. With this model it becomes very easy to change your data center. Just change the code and restart the program!

What Is Cloud-Native

So, what does it mean to be cloud-native? Do you have to use all of the cloud provider’s specific tools and resources? No, not at all. Cloud-native means you build and run applications that exploit the cloud benefits—such as scalability, flexibility, high availability and ease of maintenance. You focus on how to create these applications and how to run them: not on where they run.

Five Advantages of Cloud-Native

Organizations that adopt the Cloud-native style of software development and delivery have significant advantages over the traditional way of software delivery. If you’re responsible for IT projects, these five cloud-native advantages will sound like music to your ears:

1. High speed of Delivery

The cloud offers loads of off-the-shelf features so engineers can focus on developing unique ones—in other words, innovation. Leveraging these features, coupled with an ability to test every change with ease, allows development teams to deliver functional software faster.

2. Predictable Processes

As everything is programmed and automated, the software delivery process becomes very predictable.

3. Strong Reliability

Cloud providers offer robust services with consistent performance and multi-regional availability, which reduces the number of potential errors.

4. Pay for what you use

Instead of investing in hardware for your own data center, you only pay for what you use in the cloud. However, don’t forget to shut off the service when you don’t need it!

5. Disaster Recovery

In the cloud, everything is code. So, in the unlikely event of a disaster, it is super simple to reboot the infrastructure and applications. Just add the data, and you’re operational again.

The Impact of Cloud-Native on Teams

A cloud-native infrastructure offers developers a stable, flexible environment with high availability — a place where engineers can release new features faster and easier than ever before. Cloud platforms provide automated services to manage infrastructure and increase its availability and reliability. All of this requires different skills from the people who are responsible for the infrastructure and applications. So, how will cloud technology impact your organizational culture?

Say Goodbye to Ops As You Know It

System administrators have traditionally been responsible for installing, supporting, and maintaining servers and systems, as well as developing management scripts and troubleshooting environments. In the cloud, this is replaced by managed services, immutable infrastructure and self-healing architectures. In the cloud, everything is automated and coded: from the applications and monitoring to the infrastructure and delivery processes.

Bring True Developer Skills

Cloud-native requires people who can design systems, write code, automate deployment processes and automate the monitoring of systems.

They also need to know how to adopt the best way of working. In the cloud, DevOps has become the standard framework for the software delivery life cycle. This framework oversees the entire cycle of planning, developing, using, and managing applications. Since the cloud removes most of the “Ops” work, it’s essential for organizations to amp up their internal development skills.

Embrace Changes to Your Environment

In the classical IT world, changes to an existing system were avoided as much as possible, as they might break things. In the cloud-native world, you embrace changes to your environment, as they may expose errors in your design or application that you need to fix, so that the error does not occur when the change is released to production.

Follow the Cloud-Native Roadmap

To exploit its advantages, it’s helpful for developers to follow the cloud-native roadmap. This roadmap consists of six steps to build and run cloud-native environments.

An Example of a Cloud-Native Set-Up

Vereniging Coin - Binx Customer
COIN is an association founded by telecom providers that takes care of the transition of customers between telecom providers. To reduce management costs and increase development speed, COIN turned to Binx. As a cloud partner, Binx could help them take full control of their software development by migrating their infrastructure and applications to the cloud.

Initially, the development team, led by Binx, containerized all applications. This made it possible to release and deploy a new version of features as soon as there was a commit—a dream come true for the configuration manager. Everything deployed was visible, including all the resources associated with it.

The team decided to deploy Amazon RDS for Oracle, a highly available managed Oracle service, since they could not easily migrate the existing Oracle database to an open-source relational database management system.

The organization was provided with the self-service reporting tool (Amazon QuickSight) to access the data. This allowed end-users to create their own reports, and the development team to stay focused on developing features instead.

Because all applications were containerized, the application deployment process was automated and standardized which improved both reliability and speed of deployment.

COIN adopted a service called CloudFormation to code the infrastructure, to make the environment 100% reproducible. Binx developed a large number of custom resources for features like automated password generation and infrastructure setup. Managed services automatically deploy via a SaaS-based Git service, so there's no in-house installation at COIN.

Last but not least, Binx implemented business service level checks and monitoring to ensure that the team is alerted of any disturbance in the service offered to their end users. These service-level indicators (SLIs) measure how well the services are performing, and objectives determine what acceptable levels of performance are. The SLIs are also used on a daily basis to improve the system. Even for small aberrations, the team executes a root cause analysis to see if the problem can be designed out of the system.

The service level objectives are continuously monitored and every breach automatically alerts the team and the on-call engineer.

Becoming Cloud-Native: Create Your Starting Point

Now that you can see the cloud as a framework, its benefits over more traditional approaches should be clear. The next step is to adopt cloud-native ways of working. Can your organization view infrastructure as code and migrate workloads to the cloud? What skills are lacking? Can your workforce adopt a cloud-native mindset? By answering these questions, you create a starting point for your journey towards becoming cloud-native. Safe travels, we look forward to seeing you out there!

Urgent Future: Digital Culture

This article was featured in Urgent Future: Digital Culture. A trend report featuring various view points on the culture that makes digital organizations thrive in today’s economy. Download the trend report here

Before I can use a particular service, I have to enable the API in my Google project. Sometimes when I do this, more services are enabled than the one I specified. In this blog I will show you how to find service dependencies like this.

To see this in effect, I am going to enable the Cloud Functions service. First I will show you that enabling Cloud Functions, will actually enable six services in total. Then I will show you how you can list the dependencies. Finally, I will present you with a small utility and graph with all dependencies for you to play around with.

enabling Cloud Functions

So I am going to show you that enabling Cloud Functions is actually enabling multiple services. First, I am going to check the enabled services in my project:

gcloud services list

The output looks roughly like this:

NAME                              TITLE
bigquery.googleapis.com           BigQuery API
...
servicemanagement.googleapis.com  Service Management API
serviceusage.googleapis.com       Service Usage API

In this case, the Cloud Functions service is not listed.

enabling the service

To enable Cloud Functions in my project, I type:

$ gcloud services enable cloudfunctions
Operation "operations/acf.3170fc7d-dc07-476f-851b-ff0cc2b9d79f" finished successfully.

Now, when I check the list of enabled services again, the following six services have been added!

NAME                              TITLE
cloudfunctions.googleapis.com     Cloud Functions API
logging.googleapis.com            Cloud Logging API
pubsub.googleapis.com             Cloud Pub/Sub API
source.googleapis.com             Legacy Cloud Source Repositories API
storage-api.googleapis.com        Google Cloud Storage JSON API
storage-component.googleapis.com  Cloud Storage
...

Could I have predicted this before I enabled it? Yes, I could have.

listing all available services

To list all the available services, I use the following command:

gcloud services list --available --format json

The result is a list of objects with meta information about the service:

{
  "config": {
    "name": "cloudfunctions.googleapis.com",
    "title": "Cloud Functions API",
    "documentation": {},
    "features": [],
    "monitoredResources": [],
    "monitoring": {},
    "quota": {},
    "authentication": {},
    "usage": {}
  },
  "dependencyConfig": {
    "dependsOn": [],
    "directlyDependsOn": [],
    "directlyRequiredBy": [],
    "requiredBy": []
  },
  "serviceAccounts": [],
  "state": "DISABLED"
}

I see four main attributes for each service: config, dependencyConfig, serviceAccounts and state. The field dependencyConfig lists the service dependencies, while serviceAccounts lists the service accounts that are created in the project for this service. Note that these fields are not part of the documented Service Usage API.

listing specific dependencies

So, this Service Usage API provides all the dependencies of a specific service. To list all services that Cloud Functions depends on, I use the following command:

gcloud services list \
   --available --format json | \
jq --arg service cloudfunctions.googleapis.com \
    'map(select(.config.name == $service)| 
        { 
          name:      .config.name, 
          dependsOn: .dependencyConfig.dependsOn
        }
    )'

and the result is:

{
  "name": "cloudfunctions.googleapis.com",
  "dependsOn": [
    "cloudfunctions.googleapis.com",
    "logging.googleapis.com",
    "pubsub.googleapis.com",
    "source.googleapis.com",
    "storage-api.googleapis.com",
    "storage-component.googleapis.com"
  ]
}

These are precisely the six services that were previously enabled \o/.
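
The same undocumented fields also work in the other direction. For example, to see which services list Cloud Pub/Sub as a dependency, select on requiredBy instead:

gcloud services list \
   --available --format json | \
jq --arg service pubsub.googleapis.com \
    'map(select(.config.name == $service)|
        {
          name:       .config.name,
          requiredBy: .dependencyConfig.requiredBy
        }
    )'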

If you want to explore dependencies yourself, you can use this bash script. If you do not want to type, you can browse through the entire graph of Google Cloud Platform service dependencies.

I know it is a bit tiny, but luckily it is a scalable vector graphic. So open it in a separate window and you can pan and zoom.

conclusion

Thanks to some undocumented properties of the Google Service Usage API, I can find all dependencies between Google Cloud Platform services.

For a number of Google Cloud Platform services I need to perform a Google site verification in order to prove that I actually own a domain. Unfortunately, the Google Terraform provider does not provide support for this. In this blog I will show you how to automate this using a custom Terraform provider.


The AWS Cloud Development Kit (AWS CDK) makes it easy to build cloud infrastructure. And CDK Pipelines makes it "painless" to deploy cloud infrastructure. So let’s create an AWS CDK CI/CD pipeline, and build and run our application on AWS.


Recently I worked on a PHP application that used cron for background processing. Since it took some time to get it right, I’m sharing the solution.


Many applications log to files. In a container environment this doesn't work well. The file system is not persisted and logs are lost. In GKE you can persist log files to Cloud Logging with the Cloud Logging agent. The agent, however, doesn't specify the required resource type, causing the logs to appear as non-Kubernetes container logs. This blog shows how to resolve that.

Cloud Logging Agent in GKE

The Cloud Logging agent is a Fluentd service configured with the Google Cloud output plugin. Events sent to the output plugin must include one of the Cloud Operations for GKE resource types. Without a resource type, the event is registered as a VM instance log record in Cloud Logging.

Set the Kubernetes Container resource type

Kubernetes Container events use the resource type k8s_container.[namespace].[pod].[container], which is set via the attribute logging.googleapis.com/local_resource_id using a Fluentd filter.

<filter **>
  @type record_transformer
  enable_ruby true
  <record>
    "logging.googleapis.com/local_resource_id" ${"k8s_container.#{ENV['K8S_NAMESPACE']}.#{ENV['K8S_POD']}.#{ENV['K8S_CONTAINER']}"}
  </record>
</filter>

The entire configuration file is available on GitHub.

The filter uses Kubernetes’ Downward API to read resource type data from environment variables.

    spec:
      containers:
        - name: application
        - name: logging-agent
          env:
          - name: K8S_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: K8S_POD
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: K8S_CONTAINER
            value: application

Deploy the GKE logging agent

The logging agent is deployed as a sidecar container so it can access the application's log files.

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: app
          volumeMounts:
            - name: logs
              mountPath: /app/log
        - name: gke-fluentd
          env:
          - name: K8S_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: K8S_POD
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: K8S_CONTAINER
            value: app # Use the application container name
          volumeMounts:
            - name: logs
              mountPath: /app/log
      volumes:
        - name: logs
          emptyDir: {}

Try it yourself with the example application provided at GitHub.
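
To verify the setup, you can apply the manifest and query Cloud Logging for Kubernetes Container entries (the manifest file name is just an example):

# Deploy the sidecar setup and check that entries arrive with the k8s_container resource type
kubectl apply -f deployment.yaml
gcloud logging read 'resource.type="k8s_container"' --limit=5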

Discussion

The logging agent allows us to persist logs. However, it doesn't actually solve the problem of using log files in a container environment.

In container environments you should use cloud-native loggers, or stream all logs to the console. Changing the logger should be a dependency configuration change, and changing the logger's behavior should be a logger configuration change.

Conclusion

The Cloud Logging agent allows you to keep using log files in container environments. Since the default configuration doesn't specify the appropriate GKE resource types, you will have to maintain the agent configuration yourself. Therefore I recommend fixing your application logging instead: allow configuration of a cloud-native logger and use structured logging to opt in to all major log analysis services.

Image by Free-Photos from Pixabay

When you want to configure a SAML identity provider to enable SSO for AWS, you will find that CloudFormation does not provide support for this. In this blog we will present you with a custom provider which will allow you to configure the SAML identity provider in just a few lines!

How to use

To add a SAML identity provider using your AWS CloudFormation template, use a Custom::SAMLProvider resource with a reference to the metadata URL:

  SAMLProvider:
    Type: Custom::SAMLProvider
    Properties:
      Name: auth0
      URL: https://auth0.com/mytenant/providerurl
      ServiceToken: !Sub 'arn:${AWS::Partition}:lambda:${AWS::Region}:${AWS::AccountId}:function:cfn-saml-provider'

When the provider is created, it will load the metadata of the identity provider from the specified URL. If you want a static approach, you can also specify the metadata itself:

  SAMLProvider:
    Type: Custom::SAMLProvider
    Properties:
      Name: auth0
      Metadata: |
        <EntityDescriptor entityID="urn:binxio.auth0.com" xmlns="urn:oasis:names:tc:SAML:2.0:metadata">
                ....
        </EntityDescriptor>
      ServiceToken: !Sub 'arn:${AWS::Partition}:lambda:${AWS::Region}:${AWS::AccountId}:function:cfn-saml-provider'

On completion, it will return the ARN of the SAML Provider. That is all there is to it. From there on, you can configure IAM roles based upon the established identities in this account.

Deploy the custom provider

To deploy the provider, type:

aws cloudformation deploy  \
        --capabilities CAPABILITY_IAM \
        --stack-name cfn-saml-provider \
        --template-file ./cloudformation/cfn-saml-provider.json

This CloudFormation template will use our pre-packaged provider from s3://binxio-public-${AWS_REGION}/lambdas/cfn-saml-provider-latest.zip.

Demo

To install the simple sample of the SAML provider, type:

aws cloudformation deploy --stack-name cfn-saml-provider-demo \
        --template-file ./cloudformation/demo-stack.json

To validate the result, type:

aws iam list-saml-providers

Conclusion

With just a few lines of code you can configure the SAML provider required to implement SSO for your AWS accounts, infrastructure as code style. And that is the only way you want it, right?

You may also like How to get AWS credentials and access keys using the Auth0 SAML identity provider and How to limit access to AWS Resources based on SAML Attributes using CloudFormation.

Image by jacqueline macou from Pixabay

Restricting access to IAM resources based on SAML Subject

Many larger organizations manage their own Active Directory servers. For AWS access, they typically create an identity provider to provide a single sign on (SSO) experience for logging onto AWS. The user is then often granted access to a particular role that grants particular rights. This approach, however, lacks practical fine-grained control. Managing these rights effectively means creating and maintaining many IAM roles and AD groups.

An alternative approach, outlined in this blog post, is to utilize the SAML assertions in your IAM policies. If you enable ABAC (which requires the sts:TagSession permission in the role's trust policy), you will be able to differentiate based on custom SAML attributes. Without ABAC you can still use a couple of attributes, such as the SAML subject. In this post we will first focus on the latter: how to use the SAML subject to restrict access to IAM roles.

The CloudFormation template below creates an IAM Role. Its only permissions are listing buckets, listing a particular S3 bucket and full S3 access on a particular subfolder. Finally, we limit the access to the subfolder based on the list of SAML subjects provided as a parameter to this stack.

In the AssumeRolePolicyDocument (Trust policy) we define the principal as our SAML provider. That means that anyone that has federated access to AWS can assume this role:

        Statement:
          - Action:
              - sts:AssumeRoleWithSAML
              - sts:TagSession
            Condition:
              StringEquals:
                SAML:aud: https://signin.aws.amazon.com/saml
            Effect: Allow
            Principal:
              Federated: !Ref 'SamlProviderArn'

To limit access to a supplied comma-separated list of SAML Subjects we use the following Condition in the S3 IAM Policy:

              Condition:
                ForAllValues:StringLike:
                  saml:sub: !Split
                    - ','
                    - !Ref 'SamlSubjects'

This condition basically says: Only allow this if your SAML subject is one of those in the list provided. Note that you could also limit access to the role by moving the condition section to the Trust Policy.

The template below takes 3 parameters: The bucket name you wish to secure, the ARN of the SAML provider, and finally the comma-separated list of SAML subjects.

Complete template:

Description: Bucket Restriction Stack
Parameters:
  BucketName:
    Description: Bucket name to grant access to
    Type: String
  SamlProviderArn:
    Description: ARN of SAML provider
    Type: String
  SamlSubjects:
    Description: Comma-separated list of SAML subjects
    Type: String
Resources:
  SamlBucketRole:
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action:
              - sts:AssumeRoleWithSAML
              - sts:TagSession
            Condition:
              StringEquals:
                SAML:aud: https://signin.aws.amazon.com/saml
            Effect: Allow
            Principal:
              Federated: !Ref 'SamlProviderArn'
        Version: '2012-10-17'
      Path: /
      Policies:
        - PolicyName: saml-bucket-access # an illustrative name; any name will do
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Action:
                  - s3:ListAllMyBuckets
                  - s3:GetBucketLocation
                Effect: Allow
                Resource: '*'
              - Action:
                  - s3:ListBucket
                Effect: Allow
                Resource:
                  - !Sub 'arn:aws:s3:::${BucketName}'
              - Action:
                  - s3:*
                Condition:
                  ForAllValues:StringLike:
                    saml:sub: !Split
                      - ','
                      - !Ref 'SamlSubjects'
                Effect: Allow
                Resource:
                  - !Sub 'arn:aws:s3:::${BucketName}'
    Type: AWS::IAM::Role
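
Deploying this stack could look like the following; the template file name is an assumption, and the parameter names come from the template above:

aws cloudformation deploy \
        --capabilities CAPABILITY_IAM \
        --stack-name saml-bucket-restriction \
        --template-file ./bucket-restriction.yaml \
        --parameter-overrides \
            'BucketName=my-bucket' \
            'SamlProviderArn=arn:aws:iam::123456789012:saml-provider/auth0' \
            'SamlSubjects=alice@example.com,bob@example.com'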

Now you may not want to maintain a list of SAML subjects. Imagine, for example, that you have an S3 bucket you want to secure, with a subfolder for each user, where the folder name is identical to the SAML subject. A policy for this access pattern could look something like this:

        {
            "Action": [
                "s3:ListBucket"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::some-s3-bucket"
            ],
            "Condition": {
                "StringLike": {
                    "s3:prefix": [
                        "",
                        "${saml:sub}/*"
                    ]
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::some-s3-bucket/${saml:sub}/*"
            ]
        }

This parameterized way of creating IAM policies for SAML subjects scales really well, but you lose some flexibility in terms of what you can provide access to.

Lastly, you may want to restrict access based on custom SAML attributes. The set of default SAML attributes that AWS lets you use is very limited. By enabling session tags, you can utilize custom SAML attributes and differentiate on them in your IAM policies.

Here is an example:

        {
            "Action": [
                "s3:ListBucket"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::my-s3-bucket"
            ],
            "Condition": {
                "StringLike": {
                    "s3:prefix": [
                        "",
                        "${aws:PrincipalTag/my-custom-attribute}/*"
                    ]
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::my-s3-bucket/${aws:PrincipalTag/my-custom-attribute}/*"
            ]
        }

Similarly to the first example, we can also maintain a list of explicit conditions using the custom SAML attribute if we use ABAC.

Using Infrastructure as Code to maintain and deploy these roles and policies reduces operational overhead and prevents human errors. However, there are hard limits on the size of policies and the number of conditions one can use, so it does not scale very well out of the box.

Conclusion

Federation requires you to think hard about your data access patterns. Using SAML attributes in your IAM policies is a powerful method of automating access. In order for this to scale for large organizations, you must take great care with your deployment strategy for these policies. One approach could be to use several (nested) stacks and individual roles to work around the policy size limitations.

You may also like How to get AWS credentials and access keys using the Auth0 SAML identity provider and How to configure a SAML identity provider to enable SSO in AWS CloudFormation.

Organizations have an increasing need for total control of their cloud consumption. In many cases, multiple cloud accounts combined with unspecified invoices have made it nearly impossible to get complete insight into organization-wide cloud consumption. And while new features are released, organizations often don't take the time to optimize their cloud workloads. With Cloud Control, full-service cloud consultancy boutique Binx offers organizations a complete solution to take control of their cloud consumption.


“It is our ambition to make organizations cloud native. With Cloud Control we provide organizations with cloud environments that perform as optimally as the entire administrative process around organisation-wide cloud consumption”, said Bart Verlaat, CEO at Binx.io.

First of all, Binx Cloud Control accumulates all cloud spend in one clear invoice. Combined with real-time dashboards, organizations get access to unprecedented insights into their cloud consumption, even across multiple cloud platforms. Cloud Control also regularly reviews the cloud environment for potential optimizations and provides support to prevent outages and underperformance.

With this complete service offering, organizations have immediate access to:

  • a single view of organization-wide cloud consumption;
  • one aggregated invoice with a clear overview of all cloud consumption, across cloud platforms;
  • regular reviews of cloud workloads;
  • 24/7 monitoring and support.

Cloud Billing

For many organizations, keeping track of cloud cost is a tedious task. With multiple departments and teams using their own credit cards to consume cloud services, it is easy for the IT department to lose control of the cloud consumption.

With Cloud Billing, organizations have the complete picture of the cloud consumption across the entire organization, even when this involves multiple cloud platforms. Clients receive specified invoices of all the cloud services used within the organization. On top of this, clients receive access to an interactive dashboard with real-time usage stats.

The Cloud Control team is an extension of your organization. This highly-specialized team is only a phone call or message away and has only one objective: to create the most effective cloud environment for you.

But the biggest bonus for clients is that all of these benefits come without any additional costs compared to a direct invoicing relationship with a cloud provider.

“Besides insights into the cloud consumption and the associated costs, we also actively report on potential points of improvement. Without spending an extra euro, organizations gain total control over their cloud investment”, explains Verlaat.

Cloud Control also offers solutions for organizations that are used to more traditional IT procurement (CapEx).

“Based on the expected cloud consumption, organizations can purchase bundles of cloud credits upfront instead of receiving invoices after every period”, said Verlaat.

The interactive dashboards of Cloud Billing provide real-time insight in the cloud consumption. The Finance department will appreciate the monthly aggregated invoice they can process seamlessly.

Cloud Reviews

As most cloud environments are not isolated, it is extremely difficult for cloud architects and cloud engineers to create and maintain environments that are completely optimized. In practice, added workloads and processes can make an environment perform less efficiently over time.

Binx Cloud Control offers regular reviews of workloads and infrastructure to evaluate architectures, and implement designs that will scale over time.

“A cloud environment review gives organizations an immediate insight in points of optimization”, emphasized Dennis Vink, CTO of Binx Cloud Control.

Cloud Reviews lead to cloud environments that are optimized to deliver minimum cost, maximum security, the most efficient performance, the highest reliability and operational excellence.

Cloud Support

For business-critical processes, downtime or underperformance are unacceptable. Cloud-native practices like extreme automation, standardization and event-driven architectures contribute to the eradication of performance issues.

Binx Cloud Control supports organizations to automate the operations of their cloud environment. Clients have direct access to a dedicated senior cloud consultant. This specialized engineer is familiar with the client’s specific situation and offers pragmatic support per phone, e-mail, Slack, or on-premises.

“Our support is both reactive and proactive”, Vink explains. “As soon as performance hits a predefined threshold, Binx Cloud Control receives a notification. The senior cloud consultant immediately takes appropriate action to take away the root cause.”

With Cloud Support, organizations have the peace of mind of cloud environments with optimal performance.

For more information, please get in touch with Bart Verlaat.

As of June 2020, you can enable the Firestore key-value database with Terraform. Two things are somewhat confusing:

  • You need to create an App Engine app to enable the Firestore database
  • You need to use Datastore IAM roles.

This is what you need to do:

Use the Google Beta Provider

Start by using the google-beta provider (the feature might have moved to the main google provider by the time you read this).

provider google-beta {
  project = var.project_id
  version = "~> 3.0"
}

Create an App Engine App

In order to use Firestore, you first need to create an App Engine app. As I understand it, there is work underway to remove this limitation, but this is how it is right now. Here’s what you need to know:

  • You can only enable App Engine once per project.
  • The region (location_id) choice is permanent per project and cannot be changed afterwards.
  • You will not be charged for enabling App Engine if you don’t use it.

variable "location_id" {
  type        = string
  description = "The default App Engine region. For instance 'europe-west'"
}

# Use firestore
resource google_app_engine_application "app" {
  provider      = google-beta
  location_id   = var.location_id
  database_type = "CLOUD_FIRESTORE"
}

Using Firestore From Your Application

Enable the Firestore API to make sure your applications can connect using the Firestore client libraries.

resource google_project_service "firestore" {
  service = "firestore.googleapis.com"
  disable_dependent_services = true
}

If you are not using default service accounts (or you disable the default grants), you will need to grant the Datastore User role. Yes, that’s datastore, not firestore.

resource google_project_iam_member "firestore_user" {
  role   = "roles/datastore.user"
  member = "serviceAccount:[YOUR SERVICE ACCOUNT]"
}
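
If you want to verify or apply the same binding outside of Terraform, the equivalent gcloud command looks roughly like this (the project id and service account are placeholders):

gcloud projects add-iam-policy-binding my-project-id \
  --member="serviceAccount:my-app@my-project-id.iam.gserviceaccount.com" \
  --role="roles/datastore.user"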

On Google Cloud Platform, we use the Google Secret Manager to keep our secrets safe. But accessing the secrets from an existing application is intrusive. You either have to call the API in the application or use the secrets CLI in the entry point script of the container. In this blog, I introduce you to the utility gcp-get-secret. This utility changes references to secrets into environment variable values.

Creating dynamic infrastructures with Terraform used to be a challenge. Start using the for_each meta-argument to safely and predictably create your infrastructure while limiting code duplication.

This post gives you a real-world example of how to effectively use the for_each meta-argument of Terraform 0.12.

Google Cloud Platform (GCP) uses service accounts to authorize applications or services. Azure Pipelines typically stores such credentials as a Service Connection. However, a GCP service connection is not available. Therefore we use Secure Files.

When you want to create a command line utility for Google Cloud Platform, it would be awesome if you could authenticate using the active gcloud configuration. Unfortunately, none of the Google Cloud Client libraries support using the gcloud credentials. In this blog, I will present a small go library which you can use to do just that.

How to use it?

It is really simple. You import the package github.com/binxio/gcloudconfig and call the GetCredentials function, as shown below:

package main

import "github.com/binxio/gcloudconfig"

func main() {
    name := ""
    credentials, err := gcloudconfig.GetCredentials(name)
    ...
}

The name specifies the configuration you want to use, or the current active one if unspecified. The credentials can be passed in when you create a service client, as shown below:

    computeService, err := compute.NewService(ctx,
                                 option.WithCredentials(credentials))

If the core/project property has been set, it is available in the credential too:

    project := credentials.ProjectId

That is all there is to it! Check out the complete example of using the gcloud configured credentials. If you want to access other settings in the configuration use GetConfig.

How does it work?

The function executes the command gcloud config config-helper, which is a gcloud helper for providing authentication and configuration data to external tools. It returns an access token, an id token, the name of the active configuration and all of the associated configuration properties:

configuration:
  active_configuration: playground
  properties:
    core:
      account: markvanholsteijn@binx.io
      project: playground
    ...
credential:
  access_token: ya12.YHYeGSG8flksArMeVRXsQB4HFQ8aodXiGdBgfEdznaVuAymcBGHS6pZSp7RqBMjSzHgET08BmH3TntQDOteVPIQWZNJmiXZDr1i99ELRqDxDAP8Jk1RFu1xew7XKeQTOTnm22AGDh28pUEHXVaXtRN8GZ4xHbOoxrTt7yBG3R7ff9ajGVYHYeGSG8flksArMeVRXsQB4HFQ8aodXiGdBgfEdznaVuAymcBGHS6pZSp7RqBMjSzHgET08BmH3TntQDOteVPIQWZNJmiXZDr1i99ELRqDxDAP8Jk1RFu1xew7XKeQTOTnm22AGDh28pUEHXVaXtRN8GZ4xHbOoxrTt7yBG3R7ff9ajGV
  id_token: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJodHRwczovL2FjY291bnRzLmdvb2dsZS5jb20iLCJhenAiOiI5OTk5OTk5OTk5OS5hcHBzLmdvb2dsZXVzZXJjb250ZW50LmNvbSIsImF1ZCI6Ijk5OTk5OTk5OTk5LmFwcHMuZ29vZ2xldXNlcmNvbnRlbnQuY29tIiwic3ViIjoiMTExMTExMTExMTEyMjIyMjIyMjIyMjIiLCJoZCI6InhlYmlhLmNvbSIsImVtYWlsIjoibWFya3ZhbmhvbHN0ZWlqbkBiaW54LmlvIiwiZW1haWxfdmVyaWZpZWQiOnRydWUsImF0X2hhc2giOiJScnhBVHRSaTE2TFNOSG1JdnZEWVdnIiwiaWF0IjoxNTg4NzAxNjgzLCJleHAiOjE1ODg3MDUyODN9.DWtAHSvWgXaW0zzoLf2MkiROS_DSS2Wf-k_HQj53g3I
  token_expiry: '2020-05-05T19:01:22Z'
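
You can inspect this data yourself by invoking the helper directly. A couple of example invocations (the --format projections are standard gcloud flags; the exact fields may vary between SDK versions):

gcloud config config-helper --format yaml
gcloud config config-helper --format 'value(credential.access_token)'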

When the token is expired, the library will call the helper again to refresh it. Note that Google is unsure whether the config-helper is a good thing. If you read gcloud config config-helper --help, you will notice the following sentence:

This command is an internal implementation detail and may change or disappear without notice.

Still, for the development of command line utilities that integrate into the Google Cloud SDK ecosystem, it would be really handy if Google provided an official way to obtain the active gcloud configuration and credentials.

Conclusion

With the help of this library, it is possible to create a command line utility in Go for the Google Cloud Platform that integrates into the gcloud ecosystem. It is unfortunate that the config-helper is documented to be a volatile interface. Given the simplicity of the interface, I trust this library will be able to deal with any future changes. It would be even better if Google provided official support.

We also created a library for authenticating with gcloud credentials in Python.

Image by Kerstin Riemer from Pixabay

For the third consecutive year, Binx, Xebia DevOps and Computable are conducting “The Cloud Survey”. The objective of this survey is to gain insight into how organizations use cloud technology and to identify the major trends.

Download the report

Which Platforms Are Leading?

The past surveys indicated that only a few organizations are not using any cloud technology at all. Microsoft Azure, Amazon Web Services and Google Cloud Platform are the most-adopted cloud platforms. Furthermore, The Cloud Survey showed that the development of skills is essential for organizations to realize actual cloud transformations.

What Are The Benefits of Cloud Technology?

Organizations stand united in their belief of the competitive advantage cloud technology brings them. The most important reasons for this conviction are the possibilities to operate more flexibly, to reduce cost, and to increase the innovation power.

What Are The Challenges of Cloud Technology?

Large organizations with more than 500 employees indicated that they find it difficult to properly apply the principles of cloud technology. Cloud transformations often fail because of resistance to change from within the organization and/or a lack of experienced IT-staff to set the right example.

What’s Your Cloud Experience?

Did organizations manage to realize cloud transformations in the past year? What are the success factors to get the most out of cloud technology? Has machine learning made the leap to mainstream adoption? These are just a few of the questions that Computable, Xebia DevOps and Binx.io would like to see answered with the Cloud Survey.

Download the report

Win Great Prizes

Participants are eligible to win some great prizes, including a box of Lego “International Space Station”, or a copy of the books “DevOps voor Managers” or “Building Serverless Applications on Cloud Run”.

At times, working from home can be challenging. At the same time, it can also be very rewarding to have the opportunity to spend more time around the house. In this booklet, the Binx consultants share 13 tips (and 1 bonus tip) on how to stay sane while working from home.

Look out for some hidden gems in the pictures:

  • Funny mugs
  • Hidden hand creams
  • Photobombing kids
  • Collectable stones
    and more.

We hope you enjoy this booklet.

Thirteen Tips For Working at Home

Tip 1: Turn commute time into gym time

Turn commute time into home gym time. Suddenly, this makes traffic jams feel not so bad at all.

Tip 2: Put the music on

Put the music on. Loud. And maybe, just maybe, pull off a little dance.

Tip 3: Adjust the seat and settings

Make sure your chair provides the right support and your monitor is at the right height.

Tip 4: Work from the couch

It is no crime to work from the couch every now and then.

Tip 5: Be rigorous with your schedule

Let go of the 9-to-5 but be rigorous with your schedule; your monkey brain will thank you for it.

Tip 6: Be less efficient with the coffee run

Be less efficient. For most, home is smaller than the office. Grab only one drink on your coffee run.

Tip 7: Smoke a cigar behind your desk

Coffee is so pre-WFH. Smoke a cigar. Behind your desk. Go on, indulge yourself.

Tip 8: Get into the state of flow

Use flow to increase your effectiveness. Lose all distractions and go deep on your backlog!

Tip 9: Go out and play

Cherish the extra family time. Once the labor is out of the way, go all out and play.

Tip 10: Make lunch work for you

Make lunch work for you! Boil 2 eggs, add Udon noodles and cooked veggies. Top off with some nori sushi. Yum!

Tip 11: Eat lunch outside

Keep your house tidy. Eat lunch outside.

Tip 12: Reward yourself

Work hard, eat healthy. On top of this, take a moment each day to reward yourself for accomplishments.

Tip 13: Reset the brain

Take regular breaks away from the screen. A little car ride can help to reset the brain.

Bonus tip: Laugh more often

Humor heals. Instead of using SnapCam, dress up for real in a meeting.

Google Cloud CI/CD pipelines require the Google Cloud SDK. Azure Pipelines, however, doesn’t provide a Tool Installer task for it. Therefore, we created one.

With Google’s Cloud Run it has become very easy to deploy your container to the cloud and get back a public HTTPS endpoint. But have you ever wondered how the Cloud Run environment looks from the inside? In this blog post we will show you how to log in to a running Google Cloud Run container.

In automation environments such as Azure DevOps you can’t use Terraform’s interactive approval. Therefore you create a deployment plan, wait for a plan approval and apply the deployment plan. This blog implements a plan approval in Azure Pipelines using the Manual Intervention task.

HashiCorp Packer is a great tool for building virtual machine images for a variety of platforms including Google Cloud. Normally Packer starts a GCE instance, builds the machine image on it and terminates the instance on completion. However, sometimes the process is aborted and the instance is left running, racking up useless cloud spend. In this blog I present a utility to delete lingering Packer instances.

This event will be hosted by Amazon Web Services and Binx.io and takes place on April 24th. We’ll cover the second pillar of the Well-Architected Framework: Performance Efficiency.

During this event, you can expect to hear from a senior solution architect from Amazon Web Services and a senior solution architect from Binx.io, who will go over performance efficiency in your cloud environment. Again, this event takes place on April 24th. We hope to see you there.

Click here to register

How do you update an EC2 instance with volume attachments using CloudFormation? When you have a stateful server with one or more volumes attached to it in your infrastructure, the AWS::EC2::VolumeAttachment resource makes it impossible to update the instance. In this blog I will show you how to configure volume attachments that allow the instance to be updated, using an auto scaling group and the EC2 volume manager.

When you try to update an AWS::EC2::VolumeAttachment, CloudFormation will give the following error:

ERROR: Update to resource type AWS::EC2::VolumeAttachment is not supported.

This prevents you from updating the AMI, or any other property that requires a replacement of the instance.

I solved this problem by changing the resource definition of the stateful machine to a single-instance auto scaling group and using the EC2 volume manager utility to dynamically attach volumes when instances start.

To implement this, you have to:

  1. deploy the ec2 volume manager
  2. change from an instance to an auto scaling group
  3. enable rolling updates on the auto scaling group
  4. signal successful startup to CloudFormation
  5. attach ec2 volume manager tags to volumes and instances

Deploy the EC2 volume manager

Deploy the ec2-volume-manager using the following commands:

git clone https://github.com/binxio/ec2-volume-manager.git
cd ec2-volume-manager
aws cloudformation deploy \
        --capabilities CAPABILITY_IAM \
        --stack-name ec2-volume-manager \
        --template ./cloudformation/ec2-volume-manager.yaml

Change instance to auto scaling group

Change the definition of your persistent instance from an AWS::EC2::Instance to a single-instance auto scaling group. From:

StatefulServer:
  Type: AWS::EC2::Instance
  Properties:
    SubnetId: !Select [0, !Ref 'Subnets']
    LaunchTemplate:
      LaunchTemplateId: !Ref 'LaunchTemplate'
      Version: !GetAtt 'LaunchTemplate.LatestVersionNumber'

to:

AutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      AutoScalingGroupName: !Ref AWS::StackName
      VPCZoneIdentifier:
        - !Select [0, !Ref 'Subnets']
      LaunchTemplate:
        LaunchTemplateId: !Ref 'LaunchTemplate'
        Version: !GetAtt 'LaunchTemplate.LatestVersionNumber'
      MinSize: '0'
      MaxSize: '1'
      DesiredCapacity: '1'

Enable rolling update

Instruct CloudFormation to perform a rolling update to replace the instances:

AutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    ...
  UpdatePolicy:
    AutoScalingRollingUpdate:
      MinInstancesInService: 0
      MaxBatchSize: 1
      WaitOnResourceSignals: true

When the instance needs to be replaced, CloudFormation will update the auto scaling group by destroying the old instance first, followed by the creation of a new instance.

Signal successful startup

CloudFormation will wait until the instance reports that it has successfully started. This is done by running cfn-signal at the end of the boot commands in the launch template.

LaunchTemplate:
  Type: AWS::EC2::LaunchTemplate
  Properties:
    LaunchTemplateData:
      ...
      UserData: !Base64
        Fn::Sub: |
          #cloud-config
          bootcmd:
            ...
            - /opt/aws/bin/cfn-signal --stack ${AWS::StackName} --resource AutoScalingGroup --region ${AWS::Region}

Without this signal, the update is rolled back.

Attach tags on volumes and instance

The EC2 volume manager utility automatically attaches volumes to instances with the same tag values. When an instance with the tag ec2-volume-manager-attachment reaches the state running, it will attach all volumes with the same tag value. When the instance is stopped or terminated, all volumes with a matching ec2-volume-manager-attachment tag will be detached from it.

To get the volume manager to work, tag the volumes of the instance as follows:

  Disk1:
    Type: AWS::EC2::Volume
    Properties:
      AvailabilityZone: !Sub '${AWS::Region}a'
      Size: 8
      Tags:
        - Key: ec2-volume-manager-attachment
          Value: stateful-instance-1
        - Key: device-name
          Value: xvdf

Note that the volume manager also requires the tag device-name, referring to the device name of the volume for the operating system. Next, add the ec2-volume-manager-attachment tag to the auto scaling group:

AutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    ...
    Tags:
      - Key: ec2-volume-manager-attachment
        Value: stateful-instance-1
        PropagateAtLaunch: true

That is all that is required to enable fully automated updates of a stateful server with attached volumes.

You can see all differences when you compare the CloudFormation template of the stateful server with the template for the ec2-volume-manager solution.

Deploy the demo

A demo is available. Deploy it with:

export VPC_ID=$(aws ec2 describe-vpcs \
                --output text \
                --query 'Vpcs[?IsDefault].VpcId')
export SUBNET_IDS=$(aws ec2 describe-subnets --output text \
                --filters Name=vpc-id,Values=$VPC_ID \
                          Name=default-for-az,Values=true \
                --query 'join(`,`,sort_by(Subnets[?MapPublicIpOnLaunch], &AvailabilityZone)[*].SubnetId)')

aws cloudformation deploy \
        --capabilities CAPABILITY_NAMED_IAM \
        --stack-name ec2-volume-manager-demo \
        --template ./cloudformation/demo-stack.yaml \
        --parameter-overrides VPC=$VPC_ID Subnets=$SUBNET_IDS

Conclusion

When you have a stateful server with one or more volumes attached to it in your infrastructure, the AWS::EC2::VolumeAttachment resource makes it impossible to update the instance. But when you use a single-instance auto scaling group in combination with the EC2 volume manager, you can!

If you want, you can also attach a static IP address to your stateful instance. Make sure to properly mount the EBS volumes during boot.

Alternative solutions

Although we always recommend keeping your EC2 instances stateless and using managed persistence services whenever possible, we have successfully used the EC2 volume manager for IBM MQ and Microsoft SQL Server instances.

If you do not like the magic of the EC2 volume manager, you can also attach the volumes in the boot script of the persistent instance.
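
For completeness, a minimal sketch of that alternative (the volume id, device name, region and mount point are placeholders, and the instance profile must allow ec2:AttachVolume):

# in the boot script / user data of the stateful instance
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
aws ec2 attach-volume \
    --region eu-central-1 \
    --volume-id vol-0123456789abcdef0 \
    --instance-id "$INSTANCE_ID" \
    --device /dev/xvdf
# wait for the device to appear, then mount it
while [ ! -b /dev/xvdf ]; do sleep 1; done
mount /dev/xvdf /data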

HashiCorp Packer is a great tool for building virtual machine images for a variety of platforms including AWS. Normally Packer starts an EC2 instance, builds the AMI on it and terminates the instance on completion. However, sometimes the process is aborted and the instance is left running, racking up useless cloud spend. In this blog I present a utility to get rid of old Packer instances.

Last month I began a new assignment. While waiting to get access to the customer’s cloud environment, I started the work in my own AWS playground account. I made good progress and after a few days I could switch over. As I forgot to destroy the infrastructure, I ended up with a $1700 bill. To prevent this from happening ever again, I automatically destroy all resources in my AWS account every day at 23:00 UTC.

Most cloud environments are not optimized for cost. Despite the best intentions, it is extremely difficult for cloud architects and cloud engineers to create and maintain cloud environments that are completely optimized for the lowest possible spend and the least operational overhead.

Often, cost reductions can be accomplished by optimizing elements such as:

  1. Traditional Applications and Infrastructures
  2. Unknown Values Beforehand
  3. Unexpected Peak Loads
  4. Inefficient Architecture
  5. Operational Overhead

Five Ways to Minimize Your Cloud Spend

In this article, we discuss five actions cloud engineers can take to reduce the cost of their cloud environment.

Interested in Exploring Potential Cost Reduction?

To support organizations in the current challenging times, I am offering a free intake to explore potential cost savings. Get in touch for a free intake to discuss cost optimization >>

Cost Reduction 1. Refactoring Traditional Applications and Infrastructures

Traditional applications and infrastructures are usually not designed according to the latest cloud principles and best practices. This makes them bulky, heavy, and inefficient.

Contemporary cloud infrastructures and applications scale when demand increases. This can be realized by designing processes that are horizontally scalable, are as compartmentalized as possible, and are preferably serverless so that you do not pay for idle but for actual use only. This means breaking up monoliths into a microservices architecture, statelessness of your application landscape, and minimizing the blast radius of any component to improve resiliency.

If not, a workload is likely to be inefficient, a tax on operations, and not cost-effective.

By refactoring these workloads, costs can be reduced significantly.

Cost Reduction 2. Filling in the Unknown Values

When architecting a cloud environment, it is usually difficult to calculate the actual operational cost of the workloads. Cloud services are invoiced on a pay-per-use basis. Despite calculation tools such as the AWS Simple Monthly Calculator or the Google Cloud Platform Pricing Calculator, certain values are simply not known upfront.

For this reason, most companies start building their platform first. Based on the actual load or traffic, organizations then learn the price associated with the process. Still, the return on investment (ROI) can differ per service for each use case of the platform.

As soon as a workload is live, it is wise to check the actual consumption and cost. A cloud expert usually is able to point out the room for optimization.

Cost Reduction 3. Engineering for the Right Load

In development, engineers need to make certain assumptions of the expected load on a service. In practice, the load on a workload or service can be much higher than originally anticipated. Services that are designed to be spun up only occasionally are not always well-designed to run continuously or to hit certain peak loads.

In addition, traditional architecture is not very well suited to deal with a sudden surge in traffic. In practice this means companies either provision resources for anticipated peak capacity and run underutilized, or under-provision and risk downtime. The first leads to an exponential increase in cost, the second to damage to reputation.

Cost reduction and being ready for peak loads do go hand in hand, however. For example, costs can be reduced by moving workloads to a container orchestration platform, and by migrating services to a pay-per-use model. Using the right data store for the right data can be a huge saving, and a CDN can cushion your instances from sudden spikes of traffic, allowing sufficient time to scale out.

Cost Reduction 4. Streamlining the Architecture

Most cloud environments are not isolated, but continuously evolve. In practice, added workloads and processes can lead to a more inefficient performance of an environment. It is smart to regularly review your workloads and infrastructure to get rid of inefficient use of services. It is even smarter to monitor these workloads and evolve the architecture through automation.

Cost Reduction 5. Operational Overhead

An often underestimated reduction in cost is operational overhead. Rarely is the human cost taken into consideration: after all, it isn’t visible on your cloud vendor’s bill. Operational overhead can be reduced by opting for services offered by a cloud vendor that require no management whatsoever. Do not reinvent the wheel. The time saved by reducing operational overhead is time that can be spent on interesting and productive work. As such, you need less staff to do the same work while at the same time increasing job satisfaction tremendously.

Example 1: Reducing Operational Overhead

Being in an operations engineering team of two is not fun, particularly when you have to maintain infrastructure that is built on legacy and good intentions. This was the case at Amsterdam-based Asellion, the developer of a global platform that makes the global chemical trade transparent and sustainable.

To reduce their cloud spend as well as the operational overhead, I set the operational overhead as THE metric for success. During weekly retrospectives we assessed if we were on the right path. By doing so I’ve been able to help Asellion by transitioning their architecture to a modern architecture based on best practices. While the team grew from 10 employees to over 50, employee satisfaction scores (eNPS) went through the roof. In addition, their developers are now deploying and maintaining their own applications and enjoy doing so. Asellion is definitely on my list as a cool cloud-based company now.

Example 2: Handle Peak Loads Gracefully

Some organizations experience the need for optimization as soon as they start scaling. At that moment, it becomes clear that processes are way too expensive. Stranded Flight Solutions (SFS) experienced this first hand.

SFS is a start-to-end global service recovery platform for airlines to improve the guest experience when guests are challenged the most. I was asked to assess their infrastructure with a particular focus on cost. I found that their applications were under-provisioned for peak capacity, while being over-provisioned for idleness. In other words: while most of the time the platform saw little traffic, as soon as passengers would find themselves stranded at an airport, the services could not handle the peak load. I helped Stranded Flight Solutions to transition to a mature architecture that is scalable by using serverless, CDNs and container orchestration. This has led to a substantial cost reduction.

Free Review of Potential Cost Savings

To support organizations in the current challenging times, we offer a free intake where we explore potential cost savings. This call takes between 30-60 minutes. During the call, we will try to uncover potential savings on your monthly cloud bill. If potential savings are indicated, we can support you with an optional extensive review and remediation.

This cost optimization review helps you to identify the potential measurements you can take to minimize the cost of every individual use case.

After remediation, organizations save up to thousands of euros per month on their cloud bill.

Request your free cost optimization intake >>

Just like any other organization over the past week, Binx had to adjust to the new normal. What a week it has been! We hope that you and your beloved ones are in good health and spirit, and that you are in control of your business. Here’s an update of how we at Binx are coping with this abrupt change to our business.

It is my belief that the current situation will have a lasting impact on the way that business is done. By sharing our experiences, I’d like to give you an insight in what we are going through and the changes that we’ve made. If you have any other ideas or experiences to share, please let me know!

Decentralizing the Organization

In just one week, we have transformed into a decentralized company. For a business that is focused entirely on consultancy and training, working from home is quite the change. What didn’t change, is our commitment to help our clients make the most effective use of cloud technology.

Over the last week, we’ve been in close contact with many of our clients to look for creative and flexible solutions, so they can ensure their business continuity. Many of our clients have felt the immediate impact of the COVID-19 measures and it is great to see the solidarity and a ‘we can do this together’-mentality that has blossomed over the past few days.

Our IT Infrastructure

As a cloud-born consultancy and training boutique, we were set up for a remote way of working from the start of our organization two years ago. Our cloud consultants work with the latest hardware and online tools. Luckily, most of them have already invested in a proper setup at their home office. If not, we support our consultants with this. Our business applications, mostly GSuite, are running in the cloud.
We even allowed a few that got fed up with working from their kitchen chairs to borrow our office chairs!

Keeping the Team Aligned

For the team, we have regular Q&A sessions to keep everyone aligned and informed. We also do regular check-ins in the evening for the team to blow off some steam.

Our bi-weekly Xebia knowledge exchanges have been changed into virtual events. Normally, these XKEs take place from 4PM to 9PM every other Tuesday. The virtual XKE took place from 7PM to 9PM with four half-hour slots. There was even a 30-minute pre-recorded bootcamp to warm-up. 😉

Supporting our Clients

Our clients also see the impact of the Coronavirus on their business. Think of online travel agent Booking.com, flower auction Royal FloraHolland, cinema company Pathé, and all those other organizations that are hit by the semi-lockdown in the Netherlands.

Over the past week, we have focused primarily on keeping our clients’ cloud operations steady. In addition, we have increased our efforts to review and optimize the cost of workloads, brought our training curriculum online, and implemented managed cloud control for clients to support their mission-critical teams.

Keeping clients steady

We continue to help our clients to keep steady in these turbulent times by providing extra flexibility and creativity in the way that we are working with them.

Many of our consultants, much like the teams at our clients, have young families; the extra flexibility to work when they are able to focus, while also appreciating family commitments ensures that the work gets done and both our clients and employees are kept happy!

Cost Optimization

In these times, cost optimization is on the top of everyone’s agenda. We see that in nearly every cloud infrastructure there are things to improve, often leading to thousands of euros in monthly cost savings!

At Binx we are asked to review cloud workloads a lot. For these reviews, we look at many aspects of the implementation, like automation and security, and advise on best practices and remediation.

To support organizations in these trying times, we are offering free consultations on potential cost reduction and optimization for cloud workloads on Azure, Amazon Web Services, Google Cloud, and Alibaba Cloud. If you are running workloads in the cloud, please feel free to ask us for a free review of potential cost optimizations.

Online Cloud Training

We understand that individuals and teams alike are still looking to develop their skills. For some, reskilling is more important today than before. With physical attendance of classroom training impossible, we have moved our entire curriculum online.

We are offering a complete curriculum of online cloud engineering training courses, offered through Xebia Academy.

Google Cloud Training

Wietse Venema, who is writing an O’Reilly book on Building Serverless applications with Google Cloud Run, is offering an online training for application developers to get going on Google Cloud. This training takes place on Friday, April 10th from 1PM CET until 5PM CET.

Amazon Web Services Training

Furthermore, we have complete and free hands-on learning journeys for AWS available on Instruqt.

Managed Cloud for Organizations

For some organizations, especially the ones with many external consultants, team size has been cut down to the bare minimum. We understand how stressful it can be if you’re the one responsible for mission-critical infrastructure. For those teams, being able to rely on managed cloud support can be more valuable than ever. We help organizations to implement cloud control, providing support and automating all the processes to the extreme.

We believe that especially in these times it is crucial to become as efficient and agile as possible. Adopting a cloud-native way of working contributes heavily to this. We are here to help you get there.

Let’s hope things take a turn for the better again soon. Take care.

With the Chinese economy starting up after Covid-19, let’s hope the economies in other parts of the world will follow soon. With a large part of 2020 still ahead of us, organizations can still cover a lot of ground to realize their ambitions of expanding globally.

Businesses looking to connect with China’s digital landscape often realize the need for locally compliant infrastructure. With a large segment of the Chinese cloud market, Alibaba Cloud might just be the partner you are looking for. And because the company also operates on an international level, with data centers in Europe and the US, working with them might even be beneficial if you do not intend on entering the Chinese market.

When it comes to doing business with China, and reaching the country’s 1.4 billion digital consumers, it is not hard for businesses to understand what a Chinese presence for their product could bring to the table. The promise of near endless market potential looms. But with complex local regulations and a cybersecurity law that is as strict as the GDPR, a lot of businesses are hesitant to step into these largely uncharted waters. This is where working with a Chinese cloud provider can greatly increase your chances of success. And businesses are turning to Alibaba Cloud.

“In the near future, Alibaba Cloud might become one of the bigger players in the cloud market because of their technically solid service portfolio, but also the price advantages they can offer to clients,” says Léon Rodenburg, a consulting software developer at Xebia.

A Chinese language and culture specialist, Rodenburg has led the way in establishing Xebia as Alibaba’s consultancy and training partner in the Benelux and is the first Alibaba Cloud Most Valuable Professional (MVP) in the region.

Why Alibaba Cloud?

“The Hangzhou-headquartered cloud computing company offers a reliable alternative to the dominant players in the sector,” Rodenburg says, “with a stable and extensive product offering, excellent service standards and market-compliant legal processes within Europe, China, and other markets—and all at extremely competitive prices.”

“Whether or not your business has something to do with China, it’s smart to work with—or at least, consider—Chinese cloud providers such as Alibaba. My view is that the big three will become the big four soon,” he says.

At present, the world’s leading cloud players are Google Cloud, Microsoft Azure and AWS from Amazon, but the sheer size of Alibaba Cloud’s operations offers an idea of its capabilities.

“An increasing number of businesses are beginning to look beyond the big three,” he says, “and it’s worth considering the Chinese option.”

A great example is the world’s leading showcase e-commerce event, known as Singles Day, which runs on Alibaba Cloud. Held on November 11 each year, it supports business transactions worth nearly US$31 billion and more than one billion deliveries in a single day. Those volumes are backed by robust cloud services with world-leading security capabilities and protection against common attacks while safeguarding sensitive data and online transactions.

Gateway to China

Alibaba Cloud serves as an essential strategic market bridge for companies that operate within China or trade with the world’s number two economy.

“We’re seeing more and more companies expanding into China, perhaps building a presence there, or looking to sell their services to businesses and consumers within the country,” Rodenburg says.

He explains how Alibaba Cloud serves as a bridge to achieve these goals, in what is referred to as the China Gateway: “That is a collection of services that will help companies enter the Chinese market. One service that Alibaba provides, for example, is ICP filing and registration, which is a type of certificate you need in China to be able to host content under a certain domain name in China. Within China, you’re not allowed to go online as you see fit, but you need to register with government authorities. By doing so, you agree to comply with local laws about online content, for example.”

Travel platforms that want to reach the Chinese market, for instance, benefit from a presence in China. However, with legal obligations to store Chinese user data within the country —similar to GDPR in Europe — companies must rent space or cloud resources within China to comply fully with those laws.

“Alibaba Cloud has a lot of experience with these regulations and certificates because, as a Chinese company, they have to deal with them as well. And they can help you navigate that jungle,” Rodenburg adds.

Multi-Market Solutions

Some companies prefer to use Alibaba Cloud within China while using American or European providers in other parts of the world. This sort of multi-cloud strategy presents a number of challenges in terms of security and governance as well as resiliency issues.

Small to mid-size companies don’t necessarily need that level of complexity, and because it operates both within China and Europe, Alibaba Cloud could serve as a single and better-integrated solution.

Rodenburg explains: “Alibaba Cloud offers the ability to interconnect European data centers with Chinese data centers. This has traditionally been difficult because the public internet in China can be really slow. However, Alibaba and a few other cloud providers in the country have dedicated connections from China to the rest of the world – so, if you need to get data out of China, or if you want to connect to certain backend servers in Europe when an order is placed in a Chinese city, for example, you can do that over these dedicated connections.”

He continues: “In such a case, you’re actually renting a piece of cable that runs from China to Europe and potentially the US. You can rely on Alibaba Cloud’s core infrastructure to get stable connections into and from China, which is not possible if you work with non-Chinese cloud providers because they are not allowed to have their own data centers in China.”

Alibaba and the Xebia Group are already jointly helping several clients and have initiated several tech community events, training sessions, and China-related business seminars, in order to help them understand how Alibaba Cloud can give them an edge in and outside China. If you are already active in China or thinking of moving a part of your business there, Alibaba Cloud and Xebia are the go-to partners for getting there.

Visual Studio Code is my preferred editor. Unfortunately, it doesn’t work with additional Python source folders out of the box. This blog shows how to add a Python source folder and regain the developer experience you’ve come to love.

Although it’s common to use top-level modules, Python allows you to organize your project any way you want. The src-based module layout uses a src-folder to store the top-level modules.

Basic module layout              Src-based module layout
/project/module.py               /project/src/module.py
/project/tests/test_module.py    /project/tests/test_module.py
/project/requirements.txt        /project/requirements.txt

To configure Python to search for modules in the src-folder we alter the default search path. In PyCharm this is done by selecting a source folder. In Visual Studio Code, this is done by setting the PYTHONPATH variable.

Add source folder to PYTHONPATH

Modify settings.json to include the source folder “src” in the integrated terminal:

{
    "terminal.integrated.env.osx": {
        "PYTHONPATH": "${workspaceFolder}/src",
    },
    "terminal.integrated.env.linux": {
        "PYTHONPATH": "${workspaceFolder}/src",
    },
    "terminal.integrated.env.windows": {
        "PYTHONPATH": "${workspaceFolder}/src",
    },
    "python.envFile": "${workspaceFolder}/.env"
}

And add or modify .env to include the source folder “src” in the editors’ Python environment:

PYTHONPATH=./src

Note that the PYTHONPATH must be set for both the editors’ Python environment and the integrated terminal. The editors’ Python environment is used by extensions and provides linting and testing functionality. The integrated terminal is used when debugging to activate a new python environment.

Remark This configuration overwrites the existing PYTHONPATH. To extend, use the following settings:

{
    "terminal.integrated.env.osx": {
        "PYTHONPATH": "${env:PYTHONPATH}:${workspaceFolder}/src",
    },
    "terminal.integrated.env.linux": {
        "PYTHONPATH": "${env:PYTHONPATH}:${workspaceFolder}/src",
    },
    "terminal.integrated.env.windows": {
        "PYTHONPATH": "${env:PYTHONPATH};${workspaceFolder}/src",
    }
}
PYTHONPATH=${PYTHONPATH}:./src # Use path separator ';' on Windows.

Resume development

There is no need to reload the workspace. Just open any Python file and enjoy the editors’ capabilities. Please note that it’s safe to include the settings.json file in source control.
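
A quick way to check the setup from the integrated terminal (module refers to the example module.py from the layout above):

python -c 'import module; print(module.__file__)'
# should print the path of src/module.py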

If you dislike this additional configuration, feel free to restructure your project by using the top-level module structure or by creating packages.

We have all been there, testing the Infrastructure-as-Code (IaC) a fellow engineer has written just last week only to discover that our local Terraform version is not compatible with their code. ARRRGGGHHH! It seems that there is an ever increasing stack of software tooling required to run and maintain infrastructure these days. We need cloud vendor CLIs, Ansible, Terraform, Kubectl, and much, much more as the ever expanding list just keeps on growing. Furthermore, version conflicts are found everywhere. Good luck managing two different cloud environments if they were deployed with different versions of the same tooling! That’s why last year I made the decision to stop installing tooling locally and I now run everything in containers; I call these tooling containers, toolboxes.

Every time I start a new project on a cloud provider, I piece together a tooling container that I will use to deploy and configure the infrastructure. I typically build it from a toolbox template stored on the Binx.io GitHub repository; this offers a good starting point as I can take an existing Dockerfile for installing the most common tools. To keep the containers as light as possible I only install the tools I need. So typically this is just the CLI tool of the cloud provider I am using and one or two other things. This Dockerfile can then be pushed to a git repository and distributed to all members of the team to ensure that we are all using the same tooling and versions. From here, the only issue that needs to be taken care of is how to authenticate to the cloud provider.

For authentication, I take a different approach with each cloud provider. Authentication is not a big issue for CI pipelines as you can just code the couple of commands needed and store the secrets as environment variables. However, having to type out the same commands every time you launch the toolbox can get old pretty quickly. In order to make it easier I do one of two things:

  1. Take advantage of the fact that CI tooling typically overwrites the --entrypoint flag for a container so we can use the ENTRYPOINT command in the Dockerfile to script the lines needed for authentication, then pass the secrets to the container as environment variables.
  2. Write a shell script that executes the lines required for logging into the tooling container, then alias the execution of this file so it can be quickly ran.

Of these two methods, I prefer the second as you keep the container generic for others to make use of. It also allows me to take care of the volume mounts without having to add -v $PWD:/home each time. I use the format of clientnameenvironment as the alias for my script so I know exactly which tooling container I am running, and for whom. Bye, bye, authentication issues!
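
To give you an idea, such a wrapper script could look roughly like this for a Google Cloud toolbox (the image name, key file location and version are examples, not a prescription):

#!/usr/bin/env bash
# clientname-prod.sh: start the toolbox, mount the working directory and authenticate
docker run -it --rm \
    -v "$PWD":/work -w /work \
    -v "$HOME/.keys/clientname-prod.json":/tmp/key.json:ro \
    -e GOOGLE_APPLICATION_CREDENTIALS=/tmp/key.json \
    registry.example.com/clientname/toolbox:0.2.3 \
    bash -c 'gcloud auth activate-service-account --key-file "$GOOGLE_APPLICATION_CREDENTIALS" && exec bash'

Aliased as, say, clientnameprod, it takes a single command to drop into an authenticated shell with exactly the right tool versions.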

Now that I can easily authenticate my tooling containers to their respective cloud environments, the only thing left to worry about is how to manage versions. It has become a cliché in the IT industry at this point, but you should not rely upon containers that use the tag latest. To make it easy for me to manage tooling versions and update the containers I use a GitOps strategy. The strategy is simple:

  1. A standard git flow with branches is used for features and merged to the master branch when approved.
  2. A CI pipeline file is used to build the image.
  3. Each time a push is made to the master branch, the resulting container image is pushed to the container registry with the tag latest.
  4. When a commit has been tagged using git tag, the resulting container image is pushed to the container registry using both the tag reference and the tag latest.
  5. The changes between tagged versions are documented in a CHANGELOG.md.

So now all I need to do is tell my team that we are using version 0.2.3 of the toolbox and we can all be certain that we, and our CI pipelines, are deploying with the same versions of each software. As the old adage goes, “consistency is key”!
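
Sketched as shell steps inside a CI job, the push logic of this strategy could look like this (the registry path is an example, and docker and git are assumed to be available on the build agent):

IMAGE=registry.example.com/clientname/toolbox
TAG=$(git describe --tags --exact-match 2>/dev/null || true)

docker build -t "$IMAGE:latest" .
docker push "$IMAGE:latest"

if [ -n "$TAG" ]; then
    docker tag "$IMAGE:latest" "$IMAGE:$TAG"
    docker push "$IMAGE:$TAG"
fi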

One of the core principles of IAM is the Principle of Least Privilege. The idea is simple: give every role the minimal amount of permissions required to get the job done. However, in practice, this simple task can quickly become a daunting nightmare. To get started quickly, AWS recommends making use of the AWS Managed Policies. However, these policies lack the granularity required to ensure that the Principle of Least Privilege is respected, and often grants additional accesses than those explicitly required for the task. To get a more comprehensive view of what exactly these policies grant, engineers turn towards Access Advisor and Cloud Trail.

Access Advisor is an AWS tool that shows the service permissions granted to a role, and when those services were last accessed. It is extremely useful for determining which services a role uses, and will certainly get you some of the way towards refining access policies to be more restrictive. However, Access Advisor does not show you the permissions granted at resource level. So, you may know that a role has made use of the S3 service; however, you cannot know which bucket or object it has accessed. To get this type of information you will need to go deeper, enter Cloud Trail.

Cloud Trail is a service that allows you to see how roles are interacting with resources at the lowest level. The granularity provided by this service allows you to not only know that the S3 service has been used by a role, but also which buckets and objects the role has interacted with. However, Cloud Trail is not all sunshine and roses. Unfortunately, Cloud Trail outputs all of its log data, which for an enterprise organization can be quite sizeable, as compressed json files stored in an S3 bucket. Although this storage type is cheap and convenient, it does not allow you to easily begin finding the needle in the haystack; what permissions does this role actually require in order to function properly?

To manipulate the output format of Cloud Trail into something you can begin to parse through, you can make use of Amazon Athena. Amazon Athena is a serverless, interactive query service that makes it easy to analyze big data in S3 using standard SQL. Athena is run as a pay-per-query service where the cost is based on the amount of data scanned, making it the perfect service to parse the logs provided by Cloud Trail on an as-needed basis. The alternative would be to write a custom service for parsing the JSON logs, which could be extremely time-consuming. With Athena you can make use of standard SQL queries to find out exactly which services, resources, and objects a role touches. Now you can finally get down to business!
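
As an illustration, assuming you have created the standard CloudTrail table in Athena (here called cloudtrail_logs) and are interested in a made-up role named my-app-role, a query like the following lists every API call the role has made, grouped by service and action:

aws athena start-query-execution \
    --query-execution-context Database=default \
    --result-configuration OutputLocation=s3://my-athena-query-results/ \
    --query-string "
        SELECT eventsource, eventname, count(*) AS calls
        FROM cloudtrail_logs
        WHERE useridentity.arn LIKE '%assumed-role/my-app-role/%'
        GROUP BY eventsource, eventname
        ORDER BY calls DESC"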

With the access logs available to be queried using a simple SQL syntax and honed down to this level of granularity, you can begin working towards specifying the minimal access that a role requires. This can be determined using one of the following techniques:

  • Work backwards removing permissions that are not used.
  • Work forwards granting permissions on denials.
  • Automate the process to calibrate the permissions as required.

So now you are in a position to discover the access you actually need, instead of just using the access you want.

Amazon S3 default encryption sets encryption settings for all object uploads, but these settings are not enforced. This may cause unencrypted objects to be uploaded to the bucket. This blog gives you a bucket policy that enforces all object uploads to be encrypted.

Identity-Aware Proxy is a managed service that can control the access to your VM. It allows you to authenticate user TCP traffic through IAP before sending it to your VM instances. And what’s more, this also works for private VMs without an external IP address. So no need for a VPN or a bastion host!

Google Cloud Run is an easy way to deploy your serverless containerized applications. If you need to run your application periodically, you can use Google Cloud Scheduler to do so. In this blog I will show you how to configure a schedule for a serverless application using Terraform.

When working in a multi-account environment with the AWS Cloud Development Kit (CDK), the CDK needs to be able to obtain the appropriate credentials. We are going to show you how to write a credential provider plugin for the CDK.

Using encrypted Amazon machine images from another account in an autoscaling group does not work out of the box. You need to create an explicit KMS grant to make it work. In this blog, I will show you how to configure this in CloudFormation using our custom KMS grant provider in CloudFormation.

In this post, I will walk you through the challenges I’ve faced when adopting the object type in my Terraform 0.12 modules and the solutions I came up with to work around the caveats. As an example, I will use the google_storage_bucket resource, as this is part of one of the modules I’ve built.

The Terraform Object Type

With the launch of Terraform 0.12 back in May 2019, many cool new features were introduced. Compared with Terraform 0.11, where you would find yourself repeating a lot of code, you can now utilise the new for_each functionality and object type to write cleaner code. If you haven’t discovered the new object type yet, you may be surprised by its potential. It is one of the two complex types Terraform provides and gives you the possibility to describe object structures. While this new type has many advantages, such as mixing types and defining multi-layered structures, it also has a few caveats that I will explain further down.

The content below is taken from the Terraform docs itself:
object(...): a collection of named attributes that each have their own type.
The schema for object types is { <key> = <type>, <key> = <type>, ... } — a pair of curly braces containing a comma-separated series of <key> = <type> pairs.
Values that match the object type must contain all of the specified keys, and the value for each key must match its specified type. (Values with additional keys can still match an object type, but the extra attributes are discarded during type conversion.)

Example object structure:

my_object = {
  a_string  = "example",
  a_number  = 1,
  a_boolean = true,
  a_map = {
    type1 = 1,
    type2 = 2,
    type3 = 3
  }
}

Provide Module Default Params with Object

As promised, in this blog post I will explain how I used the object type in my custom Terraform module. In the example below I’ve used the object type to define the supported settings of the GCP storage bucket.

variable bucket_settings {
  type = object({
    location           = string
    storage_class      = string
    versioning_enabled = bool
    bucket_policy_only = bool
    lifecycle_rules = map(object({
      action = map(string)
      condition = object({
        age                   = number
        with_state            = string
        created_before        = string
        matches_storage_class = list(string)
        num_newer_versions    = number
      })
    }))
  })
}

Previously, when writing your Terraform module, you would need to create a variable for each setting you want to pass to your resource. Sure, you would have maps and lists, but a map could only contain values of the same type, limiting its use significantly. When using the object type, we can combine these settings in a complex structure. I can use this to populate my bucket, providing all these settings in a variable (bucket_settings in the listing below).

bucket_settings = {
  location           = "europe-west4"
  storage_class      = "REGIONAL"
  versioning_enabled = true
  bucket_policy_only = true
  lifecycle_rules    = {}
}

Since I have certain settings that I want to have applied to all of my buckets, I want to have most of these settings to be set for me by default. To do this, we can provide a default value for the bucket_settings object in the variable definition itself:

variable "bucket_settings" {
  type = object({
    location           = string
    storage_class      = string
    versioning_enabled = bool
    bucket_policy_only = bool
    lifecycle_rules = map(object({
      action = map(string)
      condition = object({
        age                   = number
        with_state            = string
        created_before        = string
        matches_storage_class = list(string)
        num_newer_versions    = number
      })
    }))
  })

  default = {
    location           = "europe-west4"
    storage_class      = "REGIONAL"
    versioning_enabled = true
    bucket_policy_only = true
    lifecycle_rules    = {}
  }
}

Now, to create my bucket, I can just call the module like this:

module "my_bucket" {
  source = "path.to/my/module"
  name   = "bucket-with-defaults"
}

And since I’ve set the defaults, I don’t need to provide the bucket_settings variable at all.

Change One Setting, and The Defaults Disappear

But what if I want to overwrite just one of the defaults and keep the rest? In the following example, I try to set a lifecycle policy for the bucket using my module. Notice how I am passing the bucket_settings as a parameter.

module "my_bucket" {
  source = "./modules/bucket"
  name   = "bucket-with-defaults"

  bucket_settings = {
    lifecycle_rules = {
      "delete rule" = {
        action = { type = "Delete" }
        condition = {
          age        = 30
          with_state = "ANY"
        }
      }
    }
  }
}

Unfortunately, this results in an error about missing attributes:

The given value is not suitable for child module variable bucket_settings
defined at modules/bucket/main.tf:6,1-27: attributes bucket_policy_only,
location, storage_class, and versioning_enabled are required.

Since I provided bucket_settings for this module, it replaces the defaults I set for the variable entirely. So I also need to provide the keys location, storage_class, versioning_enabled and bucket_policy_only. But even after providing these settings, I get a new error:

The given value is not suitable for child module variable bucket_settings
defined at modules/bucket/main.tf:6,1-27: attribute lifecycle_rules: element
delete rule: attribute condition: attributes created_before,
matches_storage_class, and num_newer_versions are required.

For the lifecycle_rule condition, I’ve only provided age and with_state, since I don’t care about created_before, matches_storage_class or num_newer_versions. But because they are part of the object definition, Terraform complains and tells me I need to provide them as well. So to make this work with the module setup described above, I would have to provide all of the settings again. Especially when you want to create multiple buckets with only a few differences in their configuration, this becomes quite cumbersome.

Solution: Using a Defaults Variable and merge

Because I only want to provide the settings that differ from the defaults, I came up with the following solution. By creating a separate variable for the defaults, I can merge the provided bucket_settings with this bucket_defaults variable before passing the result to the actual bucket resource. The result looks like this:

Module code snippet

variable "name" {
  type = string
  description = "The name of the bucket"
}

variable "bucket_defaults" {
  type = object({
    location           = string
    storage_class      = string
    versioning_enabled = bool
    bucket_policy_only = bool
    lifecycle_rules = map(object({
      action = map(string)
      condition = object({
        age                   = number
        with_state            = string
        created_before        = string
        matches_storage_class = list(string)
        num_newer_versions    = number
      })
    }))
  })

  default = {
    location           = "europe-west4"
    storage_class      = "REGIONAL"
    versioning_enabled = true
    bucket_policy_only = true
    lifecycle_rules    = {}
  }
}

variable "bucket_settings" {
  description = "Map of bucket settings to be applied, which will be merged with the bucket_defaults. Allowed keys are the same as defined for bucket_defaults."
}

locals {
  merged_bucket_settings = merge(var.bucket_defaults, var.bucket_settings)
}
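
The snippet above stops at the merged locals. To show how this could be wired up, here is a minimal sketch of the google_storage_bucket resource consuming local.merged_bucket_settings, assuming the argument names shown here match your provider version. Keep in mind that merge() is a shallow merge: a top-level key such as lifecycle_rules replaces its default entirely. The lookup() calls are there because callers may omit optional condition keys.

resource "google_storage_bucket" "bucket" {
  name               = var.name
  location           = local.merged_bucket_settings.location
  storage_class      = local.merged_bucket_settings.storage_class
  bucket_policy_only = local.merged_bucket_settings.bucket_policy_only

  versioning {
    enabled = local.merged_bucket_settings.versioning_enabled
  }

  # One lifecycle_rule block per entry in the lifecycle_rules map.
  dynamic "lifecycle_rule" {
    for_each = local.merged_bucket_settings.lifecycle_rules
    content {
      action {
        type = lifecycle_rule.value.action.type
      }
      condition {
        # lookup() returns null for keys the caller left out, which Terraform treats as "not set".
        age                   = lookup(lifecycle_rule.value.condition, "age", null)
        with_state            = lookup(lifecycle_rule.value.condition, "with_state", null)
        created_before        = lookup(lifecycle_rule.value.condition, "created_before", null)
        matches_storage_class = lookup(lifecycle_rule.value.condition, "matches_storage_class", null)
        num_newer_versions    = lookup(lifecycle_rule.value.condition, "num_newer_versions", null)
      }
    }
  }
}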

I can now use the module by providing only the lifecycle rule that I want to set for my bucket, and all other defaults will still be applied.

Using the module

module "my_bucket" {
  source = "path.to/my/module"
  name   = "bucket-with-defaults"

  bucket_settings = {
    lifecycle_rules = {
      "delete rule" = {
        action = { type = "Delete" }
        condition = {
          age        = 30
          with_state = "ANY"
        }
      }
    }
  }
}

Conclusion

I like the fact that I can now define objects with mixed types to create more complex structures, but there are a couple of things that I think could be improved:

  • The way variable defaults are handled. It would be great if these were merged with the provided value, or if there were at least a setting to allow for this behaviour.
  • You should be able to define optional keys for your objects. However, the need for this will become less urgent when the first point is taken care of.

Hopefully, this blog post will help you understand the object type and its limitations while giving you an idea of utilising it to its best potential. In my next blog post, I will tell you more about the for_each functionality and how to use this in your Terraform modules.

As part of the Data lake I’m working on, we use Sagemaker instances for Data Analysts to run analytics jobs.
These instances are only required on demand, namely when the Analysts are crunching or exploring data.
One solution could be to create these instances upfront, at the very start of the job, and clean them up once the job has been completed.
Yet these instances incur costs through their EBS volumes, even when stopped. Also, OS patches aren’t applied in a stopped state.

We want the Analysts to be able to create those instances themselves, as big as they need them, when they need them.
Not by freely clicking around, of course. The Service Catalog in the AWS Console provides just the capability to do this properly,
allowing the users, the Data Analysts, to interact with only the resources they need.
I provided full guidance to the Analysts by describing the infrastructure and properties required by Sagemaker.
By storing these settings in a CloudFormation template, I allowed Analysts in a specific role to start up an instance with a single click.
This is exactly where AWS Service Catalog comes into play: it runs that stored CloudFormation template, shipped as a CloudFormation Product.

Portfolio, Product & Portfolio Product Association

The following snippets describe how I made the Sagemaker part of the Data lake work. The CloudFormation template for the Sagemaker product is stored in a separate S3 bucket. After creating these resources, users will be able to launch the stack.

DatalakePortfolio:
  Type: "AWS::ServiceCatalog::Portfolio"
  Properties:
    content: "A portfolio of self-service Datalake."
    DisplayName: "Datalake Portfolio"
    ProviderName: "Datalake"

SagemakerProduct:
  Type: AWS::ServiceCatalog::CloudFormationProduct
  Properties:
    content: "Sagemaker Product"
    Distributor: "Datalake"
    Name: "Sagemaker"
    Owner: "binx"
    ProvisioningArtifactParameters:
    - content: "Initial version"
      DisableTemplateValidation: False
      Info:
        LoadTemplateFromURL: "https://s3.amazonaws.com/<>/products/sagemaker-instance.yml"
      Name: "v1"
    SupportEmail: "datalake@binx.nl"
    SupportUrl: "https://confluence.binx.nl/display/DLAKE/Service+Catalog"

SagemakerPortfolioProductAssociation:
  Type: "AWS::ServiceCatalog::PortfolioProductAssociation"
  Properties:
    PortfolioId:
      Ref: DatalakePortfolio
    ProductId:
      Ref: SagemakerProduct
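
The product template that LoadTemplateFromURL points to is not included here. As a rough sketch, assuming a pre-created execution role (the parameter names and the default instance type below are illustrative), sagemaker-instance.yml could look something like this:

Parameters:
  NotebookInstanceType:
    Type: String
    Default: "ml.t3.medium"
    Description: "Instance type of the Sagemaker notebook"
  ExecutionRoleArn:
    Type: String
    Description: "ARN of the pre-created sagemaker-notebook-iam-role"

Resources:
  NotebookInstance:
    Type: "AWS::SageMaker::NotebookInstance"
    Properties:
      InstanceType:
        Ref: NotebookInstanceType
      RoleArn:
        Ref: ExecutionRoleArn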

Portfolio Principal Association

Attach the Portfolio to the previously created Principal. This can be an IAM User, Group or Role.
In our case it is bound to the SAML-authenticated IAM role that a Data Analyst assumes.

DatalakePortfolioPrincipalAssociation:
  Type: "AWS::ServiceCatalog::PortfolioPrincipalAssociation"
  Properties:
    PortfolioId:
      Ref: DatalakePortfolio
    # Assumes a PrincipalARN template parameter holding the Data Analyst role ARN
    PrincipalARN:
      Fn::Sub: "${PrincipalARN}"
    PrincipalType: "IAM"

Portfolio Constraint

To launch and tear down the product, we need a role for Service Catalog to assume.
This role has all the policies that are needed to create and remove the Product.
With the LaunchRoleConstraint we bind a Product/Portfolio combination to that role.

SagemakerLaunchRoleConstraint:
  Type: AWS::ServiceCatalog::LaunchRoleConstraint
  Properties:
    content: "Constraint to run Sagemaker and S3 in Cloudformation."
    PortfolioId:
      Ref: DatalakePortfolio
    ProductId:
      Ref: SagemakerProduct
    RoleArn:
      Fn::GetAtt: [ LaunchConstraintRole, Arn ]
  DependsOn: [ DatalakePortfolioPrincipalAssociation, LaunchConstraintRole ]

LaunchConstraintRole:
  Type: "AWS::IAM::Role"
  Properties:
    Path: "/"
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: "Allow"
          Principal:
            Service: "servicecatalog.amazonaws.com"
          Action: "sts:AssumeRole"
    Policies:
      - PolicyName: "AllowProductLaunch"
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Resource: "*"
              Effect: "Allow"
              Action:
                # Permissions required for the provisioning of the Sagemaker
                - cloudformation:GetTemplateSummary
                - s3:GetObject
                - sagemaker:*
                - s3:*
                - iam:Get*
                - iam:PassRole
                - ec2:*
                - kms:*
            - Resource:
                - "arn:aws:iam::*:role/SC-*"
                - "arn:aws:sts::*:assumed-role/datalake-service-catalog-LaunchConstraintRole-*"
                - "arn:aws:iam::*:role/sagemaker-notebook-iam-role"
              Effect: "Allow"
              Action:
                - iam:*
            - Resource:
                - "arn:aws:cloudformation:*:*:stack/SC-*"
                - "arn:aws:cloudformation:*:*:changeSet/SC-*"
              Effect: "Allow"
              Action:
                # Permissions required by AWS Service Catalog to create stack
                - cloudformation:CreateStack
                - cloudformation:DeleteStack
                - cloudformation:DescribeStackEvents
                - cloudformation:DescribeStacks
                - cloudformation:SetStackPolicy
                - cloudformation:ValidateTemplate
                - cloudformation:UpdateStack

Please use this link to download the full scripts so you can hit the ground running.
Make sure you review the IAM policies before using them for real.
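
If you want to try this out right away, deploying the portfolio template could look roughly like the command below. This assumes the template is saved as service-catalog.yml, that PrincipalARN is declared as a template parameter, and that the role ARN is replaced with the Data Analyst role of your own account; adjust the names to your setup.

aws cloudformation deploy \
  --template-file service-catalog.yml \
  --stack-name datalake-service-catalog \
  --capabilities CAPABILITY_IAM \
  --parameter-overrides PrincipalARN=arn:aws:iam::123456789012:role/data-analyst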

Conclusion

Cloud infrastructure should be in code. We all know that.
The smallest manual input tends to break a lot and makes things unpredictable. We have all dealt with that.
But sometimes the AWS Console provides a nice way for users to interact with Cloud resources.
The use of AWS Service Catalog allows us to create resources on demand while still keeping them grouped in a defined, structured way.
This gives users the freedom to work when they want to, with a variety of products, while at the same time saving costs.

Of all the sessions I attended at re:Invent 2019, the one I found most interesting was about the Security Benefits of the Nitro architecture.

Although it might sound weird to hear a Cloud Consultant discuss hardware details, what makes it worth going down to this level is that the Nitro architecture showcases the ways in which AWS is innovating in the field of virtualization. I guess that is also why Werner himself highlighted these features in his keynote.

Privileged Virtual Machines

The Xen hypervisor, which AWS originally used and that still powers some EC2 instance classes, has the concept of a dom0. Each host has one privileged virtual machine, dom0, that is allowed to interact with real hardware; this allows the hypervisor to remain small by offloading work to dom0.

This privileged virtual machine runs a Linux operating system that manages (disk and network) devices and exposes them to the non-privileged guest virtual machines. In dom0 AWS runs, among others, services supporting EBS, VPC networking, and CloudWatch.

The privileged dom0 virtual machine has a large attack surface as it shares hardware with the guest virtual machines and runs a fully-fledged operating system.

The Nitro System

What AWS calls the Nitro system is a collection of custom-built devices that take over most of the work that normally happens in dom0 to support the virtual machines. Offloading this work to the Nitro system not only leaves more capacity for the guests (about 10% of EC2 host resources are regained), it also makes everything much more secure.

It consists of a couple of PCI-card-like devices; in addition, a special security chip is embedded on the motherboard. While implementing the Nitro system, AWS adopted a new security paradigm.

Security Benefits of the Nitro Architecture

(Image taken from the re:Inforce 2019 slides of this talk.)

New Security Paradigm

When building the Nitro system, AWS paid special attention to the following security precautions:

  • where the Nitro components run an operating system, remote login (SSH) is disabled
  • the Nitro components are interconnected using a private network
  • only the Nitro components are allowed to connect to the AWS network; direct access to this network is not possible from the bare-metal hosts and guest virtual machines
  • the Nitro components and hypervisor are controlled by sending requests to their APIs; these components never initiate outgoing traffic themselves, as outgoing traffic is a sign of trouble
  • the Nitro components are live-updatable, with barely noticeable impact for the virtual machines

PCI-card like Nitro System Devices

The functionality of the Nitro system is provided by various PCI-card like devices. This design is inspired by microservice software architectures.

Some types of devices include:

  • Nitro Controller
  • Nitro EBS subsystem
  • Nitro Ephemeral storage subsystem
  • Nitro VPC subsystem

Nitro Controller

The Nitro controller manages the host, hypervisor, and the other Nitro devices in the system.

The Nitro controller:

  • validates the motherboard’s firmware(s) against known-good cryptographic hashes
  • manages the (de)provisioning of EBS and Local Storage volumes and ENI devices
  • manages the hypervisor to (de)provision virtual machines
  • manages the hypervisor to attach/detach block and network devices to/from virtual machines
  • gathers metrics for CloudWatch

Nitro EBS subsystem

The Nitro EBS subsystem provides EBS volumes as NVMe devices to the motherboard.

All encryption happens on the Nitro device; encryption keys are managed by KMS and cannot be accessed by the EC2 host.

Nitro Ephemeral Storage subsystem

The Nitro local storage subsystem provides ephemeral volumes as NVMe devices to the motherboard.

All encryption happens on the Nitro device; encryption keys are generated on, and never leave, the device.

After use, the volumes are cryptographically wiped, i.e., the keys are removed while the actual data, in encrypted form, remains on the physical disks. Before use, the first and last bytes of the volumes are zeroed to prevent confusing the guest operating system (as the device would otherwise still be filled with “random” data from previous uses).

Nitro VPC subsystem

The Nitro VPC subsystem provides network interface controller devices to the motherboard.

The VPC stack runs on the Nitro system; only the Nitro system has access to the private AWS network, and the EC2 host and guests can only reach the network via the Nitro system.

All traffic between Nitro-powered instances is transparently encrypted on the Nitro system; traffic to non-Nitro instances is not encrypted, as this would impact performance.

Security Chip Embedded on the Motherboard

The Nitro security chip guards the firmware of the motherboard in two ways:

  • it denies any updates of the motherboard firmware
  • it pauses the motherboard boot process at an early stage to allow the firmware to be cryptographically checked by the Nitro Controller

What We Learned from this re:Invent Session

The Security Benefits of the Nitro architecture session taught us:

  • about 10% of host performance is regained by offloading work to the Nitro system
  • the Nitro system is modelled after microservice architectures
  • the Nitro system improves security by isolating management and encryption from the guest machines
  • the Nitro system improves security by adopting a new paradigm

When you look at Google Cloud services like Source Repository and Cloud Build, you would think it is very easy to create a CI/CD build pipeline. I can tell you: it is! In this blog I will show you how to create a serverless CI/CD pipeline for a Docker image, using three resources in Terraform.

(more…)