Private Connection

By default, services are connected to each other via public IPs. In most cases, it’s advised to use private IPs instead. Private IP addresses let devices connect within the same network, without the need to connect to the public internet. This offers an additional layer of protection, making it more difficult for an external host or user to establish a connection.

The problem

The Private IP access pattern was built with Infrastructure as a Service (IaaS) in mind (e.g. virtual machines, VPCs). This means it isn’t as straightforward to implement if you’re using Serverless services.

Examples of Serverless compute services within Google Cloud are:

  • App Engine Standard Environment
  • Cloud Functions
  • Cloud Run

The solution

To solve this problem, Google released a network component called Serverless VPC Access. This connector makes it possible to connect directly to your VPC network from Serverless environments.

We are using the Serverless VPC Access Connector to create a connection to a Cloud SQL database via private IP. Please note that this is not limited to Cloud SQL: once you have set up the Serverless VPC Access Connector, you can connect to any resource that is available within your VPC.

In this blog post, I’ll guide you through setting this up for the Google App Engine Standard Environment.

Prerequisites:

  • Authenticated Google Cloud SDK, alternatively Cloud Shell.
  • Enough GCP permissions to create networks and perform deployments.

Step 1: Create a VPC with subnets

For the purpose of this blog post, I’m going to create a new VPC with a subnet in europe-west1. Please note that this is not required: you can also reuse your own VPC or the Google-provided default VPC.

gcloud compute networks create private-cloud-sql \
--subnet-mode custom

In Google Cloud, a VPC is global, but we still need to create subnets to deploy resources in the different Cloud regions.

This command creates a subnet in the VPC we created earlier for europe-west1:

gcloud compute networks subnets create private-europe-west1 \
--description="europe-west1 subnet" \
--range=192.168.1.0/24 \
--network=private-cloud-sql \
--region=europe-west1

Step 2: Create a Serverless VPC Access Connector

After we’ve created a VPC with a subnet, we can continue by creating a Serverless VPC Access Connector. We can use the following gcloud command to do this.

gcloud compute networks vpc-access connectors create connector-europe-west1 \
  --network=private-cloud-sql \
  --region=europe-west1 \
  --range=10.8.0.0/28

Note: This command uses some defaults for the number of instances as well as the instance type. Since this could limit your network throughput between your VPC and the Serverless products, it’s recommended to override these properties.
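
For example, overriding the sizing could look like this (a sketch; the machine type and instance counts below are illustrative values, not a recommendation):

gcloud compute networks vpc-access connectors create connector-europe-west1 \
  --network=private-cloud-sql \
  --region=europe-west1 \
  --range=10.8.0.0/28 \
  --min-instances=2 \
  --max-instances=10 \
  --machine-type=e2-standard-4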

Step 3: Set Up a Private Services Access Connection

The Cloud SQL instances are deployed in a VPC that Google manages. To use Cloud SQL instances via private IP in your VPC, we need to set up a VPC peering connection with the ‘servicenetworking’ network.

If this is the first time you interact with service networking, you have to enable the API via:

gcloud services enable servicenetworking.googleapis.com

To set up a VPC peering connection, we first need to reserve an IP range that can be used. We can do this by executing the following command:

gcloud compute addresses create google-managed-services-private-cloud-sql \
--global \
--purpose=VPC_PEERING \
--prefix-length=16 \
--network=private-cloud-sql

This auto-generates an IP address range based on the given prefix length; it’s also possible to specify the range yourself.
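
If you prefer to choose the range yourself, you can pass it explicitly instead; a sketch (the 192.168.16.0/20 range is only an illustration, pick one that doesn’t overlap with your subnets):

gcloud compute addresses create google-managed-services-private-cloud-sql \
--global \
--purpose=VPC_PEERING \
--addresses=192.168.16.0 \
--prefix-length=20 \
--network=private-cloud-sql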

Now that we have an IP range, we can use it to set up a VPC peering connection with the ‘service networking’ project:

gcloud services vpc-peerings connect \
--service=servicenetworking.googleapis.com \
--ranges=google-managed-services-private-cloud-sql \
--network=private-cloud-sql

Note: This step is specifically for connecting with Cloud SQL over private IP. It can be skipped if you are not connecting with CloudSQL.

Step 4: Create a Cloud SQL Instance

Now that we’ve created the necessary network infrastructure, we are ready to create a Cloud SQL database instance.

Using this command, we create a new PostgreSQL database instance with a private IP in the VPC network we provisioned earlier. In this example, I’ve used ‘secretpassword’ as the root password. For production workloads, I’d recommend using Google Secret Manager or IAM authentication.

gcloud beta sql instances create private-postgres \
--region=europe-west1 \
--root-password=secretpassword \
--database-version=POSTGRES_13 \
--no-assign-ip \
--network=private-cloud-sql \
--cpu=2 \
--memory=4GB \
--async
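
Because we passed --async, the command returns immediately. To check whether the instance is ready, you can list its operations (a quick way to poll; the exact output may differ):

gcloud sql operations list --instance=private-postgres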

This operation takes a few minutes. After that, a database can be created in the newly provisioned CloudSQL instance.

gcloud sql databases create test --instance private-postgres

Step 5: Configure your App Engine Application

Now that the networking configuration is in place and the (PostgreSQL) database has been created, it’s time to update the configuration of our App Engine application.

A hello world application can be cloned here.

To update the configuration of an App Engine Application, we need to change the app.yaml file. This file is located in the root of the project.

The result could look as follows (don’t forget to replace the values for the VPC connector and the database):

runtime: nodejs14

vpc_access_connector:
  name: projects/martijnsgcpproject/locations/europe-west1/connectors/connector-europe-west1
  egress_setting: all-traffic

env_variables:
  PGHOST: 10.39.16.5
  PGUSER: postgres
  PGDATABASE: test
  PGPASSWORD: secretpassword
  PGPORT: 5432

In this example, the database connection information is provided via environment variables. Depending on the library/runtime that you’re using, this may need to be changed. You provide the username, database, and password settings used in step 4. To get the private IP address reserved for your Cloud SQL instance, you can navigate to the Google Cloud Console and go to the Cloud SQL page.
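
Alternatively, you can look it up from the command line; the PRIVATE entry under ipAddresses in the output of the following command is the address to use:

gcloud sql instances describe private-postgres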

Step 6: Deploy your App Engine Application

Now that we’ve configured our App Engine specification, we can deploy it. Navigate to the root directory and execute:

gcloud app deploy

This will package your app, upload it to GCP and deploy it to the App Engine Service.

After the app has been deployed, you can open the application in your browser by executing:

gcloud app browse

The hello world application only has one endpoint. Once triggered, it sets up a connection with the CloudSQL database and executes a SELECT NOW() query.

Once you navigate to the generated link via the gcloud app browse command, you’ll see:

Congrats! Your connection from Google App Engine to a private CloudSQL instance is successful!

Bonus: Cloud Run

While this blog post is about Google App Engine, it’s also possible to use the Serverless VPC Connector with Cloud Run, via:

gcloud run deploy SERVICE_NAME \
--image IMAGE_URL \
--vpc-connector CONNECTOR_NAME

The service account that Cloud Run is using must have the roles/vpcaccess.serviceAgent role.

Bonus 2: Cloud Functions

Similar to Cloud Run, you can setup a Cloud Function with a Serverless VPC Connector via:

gcloud functions deploy FUNCTION_NAME \
--vpc-connector CONNECTOR_NAME 

The service account that the Cloud Function is using must have the roles/vpcaccess.serviceAgent role.

Conclusion

We’ve just learned how to combine Google Cloud’s Serverless products with private IP instances via the Serverless VPC Access Connector.

The diagram below illustrates everything we have created in this blog post.

This blog shows you how to deploy PrivateBin on Google Cloud Run and Google Cloud Storage, to achieve a cheap and highly available deployment for low-intensity usage.

To use Google Cloud resources you have to enable the corresponding service API. This is easily forgotten. Therefore, you’ve likely seen the ‘[service] API has not been used in project [project_number]..’ or ‘Consumer [project_number] should enable ..’ errors, and wondered how to find your project using a project number. Luckily, you can use the gcloud CLI for that.

Gcloud Projects Describe [Project Number]

To find your project, don’t use the cloud console (!), use the CLI:

gcloud projects describe [project_number]

And immediately, you receive all attributes required to update your infrastructure code: project id and name.

createTime: '2019-10-21T12:43:08.071Z'
lifecycleState: ACTIVE
name: your-project-name
parent:
  id: 'your-organization-id'
  type: organization
projectId: your-project-id
projectNumber: '000000000000'

Cloud Console Alternative

The project number is quite hidden in the Cloud Console. You might expect it in the Resource manager view. Sadly, at the time of writing, it’s only available in the Cloud Console Dashboard.

Please, however, don’t browse through all your projects. Save time and trouble by using the following url: https://console.cloud.google.com/home/dashboard?project=[project_number]. This will open the right project for you as well.

Conclusion

Finding project numbers is fast and easy with the gcloud CLI. Use the saved time and energy to resolve actual errors.

Photo by Thijs van der Weide from Pexels

Hilversum, the Netherlands, 1 September 2021 – Xebia acquires g-company and solidifies its position as a leading Google Cloud Partner.

Over the past three years, Binx.io has developed itself as the GCP boutique within the Xebia organization, holding a Google Cloud Premier Partnership and Authorized Training Partnership.

With the acquisition of g-company, Xebia increases its GCP capabilities and expands its services with solutions like Google Workspace, enabling it to facilitate digital transformations even better.

In 2007, when Google introduced its first business solution, David Saris launched g-company. As Google’s first partner in Northern Europe, g-company has grown alongside the IT multinational, with over 70 people now working from its offices in Malaysia, Belgium, and the Netherlands. g-company partners with Google, Salesforce, Freshworks, monday.com, and many other web-based solutions to offer application development, data, machine learning, modern infrastructure, and online workplace services to companies across the globe.

"To accelerate the growth of our GCP practice, there was one company that immediately came to mind: g-company. So a while ago I reached out to David to discuss this opportunity, and I am excited to announce today that we have joined forces to align our GCP activities!" – Bart Verlaat, CEO Binx.io.

Xebia is a fast-growing IT consultancy organization that houses multiple brands, each one covering a digital domain, such as cloud, data, AI, Agile, DevOps, and software consulting. Xebia strives to be a leading global player in training and consultancy, including Google Cloud and Workspace. With g-company, Xebia can grow by adding additional services to its one-stop shop for businesses worldwide.

"I’m very proud of what we’ve achieved over the last fourteen years, but as far as I’m concerned, the biggest growth is yet to come. So many workloads still need to move to the cloud. Google is a key player in this, but I believe in freedom of choice. With Xebia, we can really offer our customers that." – David Saris, Founder & CEO g-company.

"Like Xebia, we too believe in focus. With so much expertise, we can now offer our customers a one-stop-shop experience. From cloud engineering and managed services to machine learning, multi-cloud, data, and AI, cloud-native software development and security." – Lennart Benoot, Managing Partner g-company.

"Xebia wants to make it as easy as possible for organizations to get everything out of Google’s business products. As a Premier Google Cloud partner, we work closely with various Google product specialists. g-company offers a team of experienced specialists and adds in-depth knowledge of and experience with Google Workspace, Salesforce, Freshworks, monday.com and Lumapps to our portfolio." – Andrew de la Haije, CEO Xebia.

After the acquisition, g-company and Xebia will remain a Google Cloud Premier Partner and Authorized Google Cloud Training Partner, including a shared Infrastructure and Workspace Competency. g-company will continue to operate under its own label and the existing management of David Saris and Lennart Benoot. The acquisition will be signified by adding “proudly part of Xebia” to the company logo.

It is not possible to remember all methods and properties from all classes that you write and use.

That is the reason why I like to have type hinting in my IDE. In fact I will spend time to make sure it works. I even wrote a blog post about how to set it up when I started experimenting with a new language.

In this post I will provide you with a couple of tips that help you develop faster by leveraging type hinting with the AWS Lambda Powertools.

LambdaContext

When you are developing AWS Lambda functions you receive an event and a context property. If you need metadata from the context you could read the documentation or use type hinting.

LambdaContext type hinting example

As you can see in the example, the IDE will help you find the correct method or property. But what about the event? The example tells you that we expect a Dict[str, Any] but this depends on how you invoke the Lambda function.
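
Since the screenshot isn’t reproduced here, a minimal sketch of such a handler could look like this (the body is illustrative):

from typing import Any, Dict

from aws_lambda_powertools.utilities.typing import LambdaContext

def lambda_handler(event: Dict[str, Any], context: LambdaContext) -> dict:
    # The IDE can now autocomplete context properties such as function_name,
    # memory_limit_in_mb and aws_request_id.
    return {"function": context.function_name, "request_id": context.aws_request_id}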

Event Source Data Classes

Most of the time you know how you invoke your Lambda function. For example an API Gateway proxy integration could trigger the function:

API Gateway proxy integration type hinting example
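
In code, wrapping the raw event in the data class could look roughly like this (a sketch, not the exact snippet from the screenshot):

from aws_lambda_powertools.utilities.data_classes import APIGatewayProxyEvent
from aws_lambda_powertools.utilities.typing import LambdaContext

def lambda_handler(event: dict, context: LambdaContext):
    event = APIGatewayProxyEvent(event)
    # The IDE now knows about properties like path, http_method and body.
    if "helloworld" in event.path and event.http_method == "GET":
        return {"statusCode": 200, "body": event.body}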

Next to using the data class directly, you could also use a decorator and then use the type hint in the method signature.

from aws_lambda_powertools.utilities.data_classes import event_source, APIGatewayProxyEvent

@event_source(data_class=APIGatewayProxyEvent)
def lambda_handler(event: APIGatewayProxyEvent, context):
    if 'helloworld' in event.path and event.http_method == 'GET':
        # do_something_with is a placeholder for your own business logic
        do_something_with(event.body)

For a full list see the supported event sources page.

What about my custom events?

You could also invoke a Lambda function with a custom payload. This event is not supported as a data class.

Let’s say we receive an order with one or more items. By using the parser you can define the event in a model. And you can use that model for type hinting.

from typing import List, Optional
from aws_lambda_powertools.utilities.parser import event_parser, BaseModel, ValidationError
from aws_lambda_powertools.utilities.typing import LambdaContext

import json

class OrderItem(BaseModel):
    id: int
    quantity: int
    description: str

class Order(BaseModel):
    id: int
    description: str
    items: List[OrderItem] # use nesting models
    optional_field: Optional[str] # this field may or may not be available when parsing

@event_parser(model=Order)
def handler(event: Order, context: LambdaContext):
    print(event.id)
    print(event.description)
    print(event.items)

    order_items = [item for item in event.items]
    ...

But my custom event is in an existing event source

Well in that case you need to supply the event source as an envelope.

from aws_lambda_powertools.utilities.parser import event_parser, parse, BaseModel, envelopes
from aws_lambda_powertools.utilities.typing import LambdaContext

class UserModel(BaseModel):
    username: str
    password1: str
    password2: str

# Same behavior but using our decorator
@event_parser(model=UserModel, envelope=envelopes.EventBridgeEnvelope)
def handler(event: UserModel, context: LambdaContext):
    assert event.password1 == event.password2

Conclusion

If you found this blog useful I recommend reading the official documentation. It’s a small investment that will pay off in nice and clean code.

No need to remember all the methods and properties and what types they expect and return. An IDE can do that for you so that you can focus on the actual business logic.

We have read and selected a few articles and other resources about the latest developments around cloud technology so you don’t have to. Read further and keep yourself up-to-date in five minutes!

Why is there no June 2021 edition?

You probably noticed that we skipped the June 2021 edition. We have been quite busy working on other awesome stuff, for example the Google Cloud Platform User Group Benelux and the “Designing Serverless Applications on AWS” talk for the End2End Live Conference. Watch the recording below!

Where should I run my stuff? Choosing a Google Cloud compute option

Where should you run your workload? It depends… Choosing the right infrastructure options to run your application is critical, both for the success of your application and for the team that is managing and developing it. This post breaks down some of the most important factors that you need to consider when deciding where you should run your stuff!

Read more:
https://cloud.google.com/blog/topics/developers-practitioners/where-should-i-run-my-stuff-choosing-google-cloud-compute-option

Lambda Extensions

The execution environment of a Lambda gets reused when it is executed multiple times. Usually code ignores this fact, but a very common optimization is to load settings only the first time your code runs in the execution environment. The Extensions API, which is available to extension writers, now gives standardized access to life-cycle information of the execution environment. There are many ready-to-use extensions from AWS partners and open source projects, but you can even write your own extensions. An extension is usually just a code “layer” you can add to your Lambda.

Read more:
https://aws.amazon.com/about-aws/whats-new/2021/05/aws-lambda-extensions-now-generally-available/

The ultimate App Engine cheat sheet

App Engine is a fully managed serverless compute option in Google Cloud that you can use to build and deploy low-latency, highly scalable applications. App Engine makes it easy to host and run your applications. It scales them from zero to planet scale without you having to manage infrastructure. App Engine is recommended for a wide variety of applications including web traffic that requires low-latency responses, web frameworks that support routes, HTTP methods, and APIs.

Read more:
https://cloud.google.com/blog/topics/developers-practitioners/ultimate-app-engine-cheat-sheet

StepFunctions now supports EventBridge

Previously, you had to go through something like Lambda to generate EventBridge events from a Step Function. Now support has been added to the Step Functions implementation to do this more efficiently, without the need for extra code or infrastructure. This greatly simplifies these kinds of serverless solutions!

Read more:
https://aws.amazon.com/about-aws/whats-new/2021/05/aws-step-functions-now-supports-amazon-custom-events-eventbridge/

Maximize your Cloud Run investments with new committed use discounts

One of the key benefits of Cloud Run is that it lets you pay only for what you use, down to 100 millisecond granularity. This is ideal for elastic workloads, notably workloads that can scale to zero or that need instant scaling. However, the traffic on your website or API doesn’t always need this kind of elasticity. Often, customers have a steady stream of requests, or the same daily traffic pattern, resulting in a predictable spend for your Cloud Run resources. Google Cloud is now introducing self-service spend-based committed use discounts for Cloud Run, which let you commit for a year to spending a certain amount on Cloud Run and benefit from a 17% discount on the amount you committed.

Read more:
https://cloud.google.com/blog/products/serverless/introducing-committed-use-discounts-for-cloud-run

EventBridge supports sharing events between event buses

Using EventBridge is a good way to create a decoupled event-driven architecture. Using routing rules you can now route events to other buses in the same account and region. Using this feature you can either fan out events to different event buses or aggregate events to a single bus.

Read more:
https://aws.amazon.com/about-aws/whats-new/2021/05/amazon-eventbridge-supports-sharing-events-between-event-buses-same-account-region/

Credits: Header image by Brianna Santellan on Unsplash.com

Setup Visual Studio Code as your IDE to build and deploy an AWS SAM stack using Go.

I recently started to experiment with Go and the best way to learn is to play with it and build something. So I decided to build a simple application. During this process I ran into the problem that my IDE did not like having many Go modules within one project.

Project layout

With AWS SAM you can build and deploy your stack locally or via a CI/CD pipeline; the process is the same. First you perform a sam build, which compiles the source code for each AWS::Serverless::Function in the template.yaml. This is followed by sam deploy, which uploads the artifacts and deploys the template using AWS CloudFormation.
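
In its simplest form that boils down to two commands, run from the project root (--guided is only needed the first time, to generate a samconfig.toml):

sam build
sam deploy --guided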

The problem that I faced was that each function needs a go.mod in the source folder. Visual Studio Code expects a single go.mod file in the project.

Project Structure

As a result imports in the unit test do not get recognized:

Sample code of the unit test without workspace
  • The RequestEvent struct.
  • The handler method.
  • The ResponseEvent struct.

The reason for this is that the IDE has set the GOPATH to the root of the project. Therefore the imports are not correct. By using Workspace Folders you can change this behavior so that each folder has its own scope.

From Go 1.16+ the default build mode has switched from GOPATH to Go modules.

To do this you need to create a workspace.code-workspace file. Place it in the root of your project with the following content:

{
    "folders": [
        {
            "path": "functions/api"
        },
        {
            "path": "functions/confirm"
        },
        {
            "path": "functions/ship"
        },
        {
            "path": "functions/stock"
        },
        {
            "path": "."
        }
    ]
}

Now, because each Go module lives within its own scope, the imports are correct. As a result you are able to navigate through the code using the ⌘-click action.

Conclusion

When you deal with many Go modules within one project it is important to configure your IDE. This improves readability, reduces errors, and increases your development speed!

So you want to know "How to Read Firestore Events with Cloud Functions and Golang" ? You’re in the right place! I recently worked on a side project called "Syn" (https://github.com/lucavallin/syn – Old Norse for vision!) which aims at visually monitoring the environment using the Raspberry Pi, Google Cloud and React Native. The Raspberry Pi uses a tool called motion, which takes pictures (with the Pi camera) when movement is detected. The pictures are then uploaded to Cloud Storage, and a Cloud Function listens for events that are triggered when a new object is uploaded to the bucket. When a new picture is uploaded, the function tries to label the image using Vision API and stores the result in Firestore. Firestore triggers an event itself, which I am listening for in a different Cloud Function, which uses IFTTT to notify me when movement has been detected and what Vision API has found in the image. The goal of this blog post is to explain how to parse Firestore events, which are delivered to the function in a rather confusing format!

Shape and colour of the Event struct

The Cloud Function that processes new events (i.e. new uploads from the Raspberry Pi when movement is detected) labels the picture with the Vision API and then creates a new Event object and stores it in the Events collection in Firestore.
The Event struct looks like this:

type Event struct {
    URI     string    `json:"uri" firestore:"uri"`
    Created time.Time `json:"created" firestore:"created"`
    Labels  []Label   `json:"labels" firestore:"labels"`
}

The struct also references an array of Labels. Label is a struct defined as:

type Label struct {
    Description string  `json:"description" firestore:"description"`
    Score       float32 `json:"score" firestore:"score"`
}

This is the result once the information has been persisted to Firestore:

Syn - Firestore

Create a function to listen to Firestore events

Another function, called Notify, listens for events from Firestore, which are triggered when new data is added to the database, and then notifies the user via IFTTT. I have used Terraform to set up the function:

resource "google_cloudfunctions_function" "notify" {
  project               = data.google_project.this.project_id
  region                = "europe-west1"
  name                  = "Notify"
  description           = "Notifies of newly labeled uploads"
  service_account_email = google_service_account.functions.email
  runtime               = "go113"
  ingress_settings      = "ALLOW_INTERNAL_ONLY"
  available_memory_mb   = 128

  entry_point = "Notify"

  source_repository {
    url = "https://source.developers.google.com/projects/${data.google_project.this.project_id}/repos/syn/moveable-aliases/master/paths/functions"
  }

  event_trigger {
    event_type = "providers/cloud.firestore/eventTypes/document.create"
    resource   = "Events/{ids}"
  }

  environment_variables = {
    "IFTTT_WEBHOOK_URL" : var.ifttt_webhook_url
  }
}

The event_trigger block defines the event that the function should listen for. In this case, I am listening for providers/cloud.firestore/eventTypes/document.create events in the Events collection.

What does a "raw" Firestore event look like?

Using fmt.Printf("%+v", event), we can see that the Firestore event object looks like this:

{OldValue:{CreateTime:0001-01-01 00:00:00 +0000 UTC Fields:{Created:{TimestampValue:0001-01-01 00:00:00 +0000 UTC} File:{MapValue:{Fields:{Bucket:{StringValue:} Name:{StringValue:}}}} Labels:{ArrayValue:{Values:[]}}} Name: UpdateTime:0001-01-01 00:00:00 +0000 UTC} Value:{CreateTime:2021-07-27 09:22:03.654255 +0000 UTC Fields:{Created:{TimestampValue:2021-07-27 09:22:01.4 +0000 UTC} File:{MapValue:{Fields:{Bucket:{StringValue:} Name:{StringValue:}}}} Labels:{ArrayValue:{Values:[{MapValue:{Fields:{Description:{StringValue:cat} Score:{DoubleValue:0.8764283061027527}}}} {MapValue:{Fields:{Description:{StringValue:carnivore} Score:{DoubleValue:0.8687784671783447}}}} {MapValue:{Fields:{Description:{StringValue:asphalt} Score:{DoubleValue:0.8434737920761108}}}} {MapValue:{Fields:{Description:{StringValue:felidae} Score:{DoubleValue:0.8221824765205383}}}} {MapValue:{Fields:{Description:{StringValue:road surface} Score:{DoubleValue:0.807261049747467}}}}]}}} Name:projects/cvln-syn/databases/(default)/documents/Events/tVhYbIZBQypHtHzDUabq UpdateTime:2021-07-27 09:22:03.654255 +0000 UTC} UpdateMask:{FieldPaths:[]}}

…which is extremely confusing! I was expecting the event to look exactly like the data I previously stored in the database, but for some reason, this is what a Firestore event looks like. Luckily, the "JSON to Go Struct" IntelliJ IDEA plugin helps make sense of the above:

type FirestoreUpload struct {
    Created struct {
        TimestampValue time.Time `json:"timestampValue"`
    } `json:"created"`
    File struct {
        MapValue struct {
            Fields struct {
                Bucket struct {
                    StringValue string `json:"stringValue"`
                } `json:"bucket"`
                Name struct {
                    StringValue string `json:"stringValue"`
                } `json:"name"`
            } `json:"fields"`
        } `json:"mapValue"`
    } `json:"file"`
    Labels struct {
        ArrayValue struct {
            Values []struct {
                MapValue struct {
                    Fields struct {
                        Description struct {
                            StringValue string `json:"stringValue"`
                        } `json:"description"`
                        Score struct {
                            DoubleValue float64 `json:"doubleValue"`
                        } `json:"score"`
                    } `json:"fields"`
                } `json:"mapValue"`
            } `json:"values"`
        } `json:"arrayValue"`
    } `json:"labels"`
}

While still confusing, at least now I can split up the struct so I can reference the types correctly elsewhere in the application should I need to, for example, loop through the labels.

Cleaning up the event structure

The FirestoreUpload can be split up in order to have named fields rather than anonymous structs. This is useful to be able to reference the correct fields and types elsewhere in the application, for example when looping through the labels.

package events

import (
    "github.com/thoas/go-funk"
    "time"
)

//FirestoreEvent is the payload of a Firestore event
type FirestoreEvent struct {
    OldValue   FirestoreValue `json:"oldValue"`
    Value      FirestoreValue `json:"value"`
    UpdateMask struct {
        FieldPaths []string `json:"fieldPaths"`
    } `json:"updateMask"`
}

// FirestoreValue holds Firestore fields
type FirestoreValue struct {
    CreateTime time.Time       `json:"createTime"`
    Fields     FirestoreUpload `json:"fields"`
    Name       string          `json:"name"`
    UpdateTime time.Time       `json:"updateTime"`
}

// FirestoreUpload represents a Firebase event of a new record in the Upload collection
type FirestoreUpload struct {
    Created Created `json:"created"`
    File    File    `json:"file"`
    Labels  Labels  `json:"labels"`
}

type Created struct {
    TimestampValue time.Time `json:"timestampValue"`
}

type File struct {
    MapValue FileMapValue `json:"mapValue"`
}

type FileMapValue struct {
    Fields FileFields `json:"fields"`
}

type FileFields struct {
    Bucket StringValue `json:"bucket"`
    Name   StringValue `json:"name"`
}

type Labels struct {
    ArrayValue LabelArrayValue `json:"arrayValue"`
}

type LabelArrayValue struct {
    Values []LabelValues `json:"values"`
}

type LabelValues struct {
    MapValue LabelsMapValue `json:"mapValue"`
}

type LabelsMapValue struct {
    Fields LabelFields `json:"fields"`
}

type LabelFields struct {
    Description StringValue `json:"description"`
    Score       DoubleValue `json:"score"`
}

type StringValue struct {
    StringValue string `json:"stringValue"`
}

type DoubleValue struct {
    DoubleValue float64 `json:"doubleValue"`
}

// GetUploadLabels returns the labels of the image as an array of strings
func (e FirestoreEvent) GetUploadLabels() []string {
    return funk.Map(e.Value.Fields.Labels.ArrayValue.Values, func(l LabelValues) string {
        return l.MapValue.Fields.Description.StringValue
    }).([]string)
}

The GetUploadLabels() function is an example of how the FirestoreUpload event object should be accessed. Here I am also using the go-funk package, which adds some extra functional capabilities to Go (but the performance isn’t as good as a "native" loop).
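
For context, the entry point of the Notify function could then look roughly like this (a simplified sketch; the real function also calls the IFTTT webhook):

package events

import (
    "context"
    "log"
)

// Notify is triggered by Firestore document.create events and logs the labels
// that were attached to the upload.
func Notify(ctx context.Context, e FirestoreEvent) error {
    labels := e.GetUploadLabels()
    log.Printf("new upload %s with labels: %v", e.Value.Name, labels)
    // The real implementation would notify the user via the IFTTT webhook here.
    return nil
}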

Summary

In this article I explained how to read Firestore events from Cloud Functions listening for them. The examples are written in Golang, but different languages will need to parse the messages in a similar way. Although not handy, this is the current format of Firestore events! Luckily, once you know how to read them, the rest is simple!

Credits: Header image by Luca Cavallin on unsplash.com

Introduction

This article presents a comparison of Cloud Pub/Sub and NATS as message brokers for distributed applications. We are going to focus on the differences, advantages, and disadvantages of both systems.

Cloud Pub/Sub

Cloud Pub/Sub provides messaging and ingestion features for event-driven systems and streaming analytics. The highlights of the tool can be summarized as follows:

  • Scalable, in-order message delivery with pull and push modes
  • Auto-scaling and auto-provisioning with support from zero to hundreds of GB/second
  • Independent quota and billing for publishers and subscribers
  • Global message routing to simplify multi-region systems

Furthermore, Cloud Pub/Sub provides the following benefits over non-Google-managed systems:

  • Synchronous, cross-zone message replication and per-message receipt tracking ensures reliable delivery at any scale
  • Auto-scaling and auto-provisioning with no partitions eliminates planning and ensures workloads are production ready from day one
  • Filtering, dead-letter delivery, and exponential backoff without sacrificing scale help simplify your applications
  • Native Dataflow integration enables reliable, expressive, exactly-once processing and integration of event streams in Java, Python, and SQL.
  • Optional per-key ordering simplifies stateful application logic without sacrificing horizontal scale—no partitions required.
  • Pub/Sub Lite aims to be the lowest-cost option for high-volume event ingestion. Pub/Sub Lite offers zonal storage and puts you in control of capacity management.

Some use cases of Cloud Pub/Sub include:

  • Google’s stream analytics makes data more organized, useful, and accessible from the instant it’s generated. Built on Pub/Sub along with Dataflow and BigQuery, their streaming solution provisions the resources needed to ingest, process, and analyze fluctuating volumes of real-time data for real-time business insights. This abstracted provisioning reduces complexity and makes stream analytics accessible to both data analysts and data engineers.
  • Pub/Sub works as a messaging middleware for traditional service integration or a simple communication medium for modern microservices. Push subscriptions deliver events to serverless webhooks on Cloud Functions, App Engine, Cloud Run, or custom environments on Google Kubernetes Engine or Compute Engine. Low-latency pull delivery is available when exposing webhooks is not an option or for efficient handling of higher throughput streams.

Features

Cloud Pub/Sub offers the following features:

  • At-least-once delivery: Synchronous, cross-zone message replication and per-message receipt tracking ensures at-least-once delivery at any scale.
  • Open: Open APIs and client libraries in seven languages support cross-cloud and hybrid deployments.
  • Exactly-once processing: Dataflow supports reliable, expressive, exactly-once processing of Pub/Sub streams.
  • No provisioning, auto-everything: Pub/Sub does not have shards or partitions. Just set your quota, publish, and consume.
  • Compliance and security: Pub/Sub is a HIPAA-compliant service, offering fine-grained access controls and end-to-end encryption.
  • Google Cloud–native integrations: Take advantage of integrations with multiple services, such as Cloud Storage and Gmail update events and Cloud Functions for serverless event-driven computing.
  • Third-party and OSS integrations: Pub/Sub provides third-party integrations with Splunk and Datadog for logs along with Striim and Informatica for data integration. Additionally, OSS integrations are available through Confluent Cloud for Apache Kafka and Knative Eventing for Kubernetes-based serverless workloads.
  • Seek and replay: Rewind your backlog to any point in time or a snapshot, giving the ability to reprocess the messages. Fast forward to discard outdated data.
  • Dead letter topics: Dead letter topics allow for messages unable to be processed by subscriber applications to be put aside for offline examination and debugging so that other messages can be processed without delay.
  • Filtering: Pub/Sub can filter messages based upon attributes in order to reduce delivery volumes to subscribers.

Pricing

Cloud Pub/Sub is free up to 10GB/month of traffic, and above this threshold, a flat-rate of $40.00/TB/month applies.

Summary of Cloud Pub/Sub

Cloud Pub/Sub is the default choice for cloud-native applications running on Google Cloud. Overall, the pros and cons of the tool can be summarized with the following points:

Main advantages

  • Google-managed. There is no complex setup or configuration needed to use it.
  • Integrations. Cloud Pub/Sub integrates seamlessly with other Google Cloud services, for example Kubernetes Engine.
  • Secure. End-to-end encryption enabled by default and built-in HIPAA compliance.

Main disadvantages

Refer to https://cloud.google.com/pubsub/docs/overview for further information.

NATS

NATS is a message broker that enables applications to securely communicate across any combination of cloud vendors, on-premise, edge, web and mobile, and devices. NATS consists of a family of open source products that are tightly integrated but can be deployed easily and independently. NATS facilitates building distributed applications and provides client APIs in over 40 languages and frameworks, including Go, Java, JavaScript/TypeScript, Python, Ruby, Rust, C#, C, and NGINX. Furthermore, real-time data streaming, highly resilient data storage and flexible data retrieval are supported through JetStream, which is built into the NATS server.

The highlights of the tool can be summarized as follows:

  • With flexible deployment models using clusters, superclusters, and leaf nodes, you can optimize communications for your unique deployment. The NATS Adaptive Edge Architecture allows for a perfect fit for unique needs when connecting devices, edge, cloud or hybrid deployments.
  • With true multi-tenancy, securely isolate and share your data to fully meet your business needs, mitigating risk and achieving faster time to value. Security is bifurcated from topology, so you can connect anywhere in a deployment and NATS will do the right thing.
  • With the ability to process millions of messages a second per server, you’ll find unparalleled efficiency with NATS. Save money by minimizing cloud costs with reduced compute and network usage for streams, services, and eventing.
  • NATS self-heals and can scale up, down, or handle topology changes anytime with zero downtime to your system. Clients require zero awareness of NATS topology, allowing you to future-proof your system to meet your needs of today and tomorrow.

Some use cases of NATS include:

  • Cloud Messaging
    • Services (microservices, service mesh)
    • Event/Data Streaming (observability, analytics, ML/AI)
  • Command and Control
    • IoT and Edge
    • Telemetry / Sensor Data / Command and Control
  • Augmenting or Replacing Legacy Messaging Systems

Features

NATS offers the following features:

  • Language and Platform Coverage: Core NATS: 48 known client types, 11 supported by maintainers, 18 contributed by the community. NATS Streaming: 7 client types supported by maintainers, 4 contributed by the community. NATS servers can be compiled on architectures supported by Golang. NATS provides binary distributions.
  • Built-in Patterns: Streams and Services through built-in publish/subscribe, request/reply, and load-balanced queue subscriber patterns. Dynamic request permissioning and request subject obfuscation is supported.
  • Delivery Guarantees: At most once, at least once, and exactly once is available in JetStream.
  • Multi-tenancy and Sharing: NATS supports true multi-tenancy and decentralized security through accounts and defining shared streams and services.
  • AuthN: NATS supports TLS, NATS credentials, NKEYS (NATS ED25519 keys), username and password, or simple token.
  • AuthZ: Account limits including number of connections, message size, number of imports and exports. User-level publish and subscribe permissions, connection restrictions, CIDR address restrictions, and time of day restrictions.
  • Message Retention and Persistence: Supports memory, file, and database persistence. Messages can be replayed by time, count, or sequence number, and durable subscriptions are supported. With NATS streaming, scripts can archive old log segments to cold storage.
  • High Availability and Fault Tolerance: Core NATS supports full mesh clustering with self-healing features to provide high availability to clients. NATS streaming has warm failover backup servers with two modes (FT and full clustering). JetStream supports horizontal scalability with built-in mirroring.
  • Deployment: The NATS network element (server) is a small static binary that can be deployed anywhere from large instances in the cloud to resource constrained devices like a Raspberry PI. NATS supports the Adaptive Edge architecture which allows for large, flexible deployments. Single servers, leaf nodes, clusters, and superclusters (cluster of clusters) can be combined in any fashion for an extremely flexible deployment amenable to cloud, on-premise, edge and IoT. Clients are unaware of topology and can connect to any NATS server in a deployment.
  • Monitoring: NATS supports exporting monitoring data to Prometheus and has Grafana dashboards to monitor and configure alerts. There are also development monitoring tools such as nats-top. Robust side car deployment or a simple connect-and-view model with NATS surveyor is supported.
  • Management: NATS separates operations from security. User and Account management in a deployment may be decentralized and managed through a CLI. Server (network element) configuration is separated from security with a command line and configuration file which can be reloaded with changes at runtime.
  • Integrations: NATS supports WebSockets, a Kafka bridge, an IBM MQ Bridge, a Redis Connector, Apache Spark, Apache Flink, CoreOS, Elastic, Elasticsearch, Prometheus, Telegraf, Logrus, Fluent Bit, Fluentd, OpenFAAS, HTTP, and MQTT, and more.

Pricing

There are no fees involved with deploying NATS, however, the costs of the instances running the system and related maintenance (and related time-cost) must be taken into account. The final cost depends on the number and type of instances chosen to run NATS.

Summary of NATS

NATS is a CNCF-recognized message broker. Overall, the pros and cons of the tool can be summarized with the following points:

Main advantages

  • It supports more patterns. Streams and Services through built-in publish/subscribe, request/reply, and load-balanced queue subscriber patterns. Dynamic request permissioning and request subject obfuscation is supported.

Main disadvantages

  • User-managed. While NATS can be deployed as a Google Cloud Marketplace solution, more complex scenarios like multi-regional clusters require an extensive amount of user-supplied configuration, both for NATS itself and for related resources (for example, firewall rules). Using the Helm charts provided by NATS to run it on Kubernetes, however, simplifies many aspects of the process (see https://docs.nats.io/nats-on-kubernetes/nats-kubernetes).

Refer to https://docs.nats.io/ for further information.

Conclusion

Cloud Pub/Sub and NATS are both excellent, battle-tested message brokers. Whether you pick one or the other often comes down to your requirements and preferences. Personally, I would always recommend Cloud Pub/Sub where the requirements allow for it, because of its high degree of integration with other Google Cloud products and because, being managed by Google, Cloud Pub/Sub frees engineers from the complex and time-consuming process of setting up and maintaining a third-party solution.

Credits: Header image by Luca Cavallin on unsplash.com

I’ve been using Terraform for over two years now, and from the start, I’ve found the need for proper input validation. Although Terraform has added variable types with HCL 2.0 and even input variable validation rules in Terraform CLI v0.13.0, it still appears challenging.
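
For reference, a native validation rule (available since Terraform 0.13) looks roughly like this; note that such a rule can only reference the variable it is attached to, which is why it doesn’t help with checks that combine several variables, like the bucket-name example later in this post:

variable "purpose" {
  description = "Bucket purpose, will be used as part of the bucket name"
  type        = string

  validation {
    condition     = can(regex("^[a-z0-9-]+$", var.purpose))
    error_message = "The purpose may only contain lowercase letters, digits and dashes."
  }
}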

Why input validation?

There are several reasons why input validation is a good idea, but the most important one is saving time!
It has often happened that a plan output looked good to me, and Terraform itself found no errors, but when I ran apply, the code would still break. Not all providers seem to tell you in advance that a specific combination of settings is not correct. It can also happen that the resource’s name does not validate, but this isn’t detected during the plan stage. Especially when deploying GKE clusters or services like Composer on Google Cloud, you could be waiting for more than 30 minutes before these kinds of errors pop up. And then you can wait another 30 minutes for the resources to be deleted again.
You see how being notified about these setting errors in advance can help you save a lot of time.

Using custom assertions

One of the first things we implemented in our custom modules is the use of assertions. Making sure Terraform would show a proper error message came with a challenge, as there is no predefined function for this. On top of that, adding custom checks to break out of the code is not supported by HCL.

example:

# variables.tf
variable "prefix" {
  description = "Company naming prefix, ensures uniqueness of bucket names"
  type        = string
  default     = "binx"
}
variable "project" {
  description = "Company project name."
  type        = string
  default     = "blog"
}
variable "environment" {
  description = "Company environment for which the resources are created (e.g. dev, tst, acc, prd, all)."
  type        = string
  default     = "dev"
}
variable "purpose" {
  description = "Bucket purpose, will be used as part of the bucket name"
  type        = string
}

# main.tf
locals {
  bucket_name = format("%s-%s-%s-%s", var.prefix, var.project, var.environment, var.purpose)
}

resource "google_storage_bucket" "demo" {
  provider = google-beta
  name     = local.bucket_name
}

When you run terraform plan on this example code and enter a purpose with invalid characters, you will get the following output:

var.purpose
  Bucket purpose, will be used as part of the bucket name
  Enter a value: test^&asd

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # google_storage_bucket.demo will be created
  + resource "google_storage_bucket" "demo" {
      + bucket_policy_only          = (known after apply)
      + force_destroy               = false
      + id                          = (known after apply)
      + location                    = "US"
      + name                        = "binx-blog-dev-test^&asd"
      + project                     = (known after apply)
      + self_link                   = (known after apply)
      + storage_class               = "STANDARD"
      + uniform_bucket_level_access = (known after apply)
      + url                         = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.

As you can see, Terraform does not check the characters in the bucket name and will let you think everything is fine. However, if you try to apply this code, you will be given an error during the bucket creation telling you the name is invalid:

google_storage_bucket.demo: Creating...
╷
│ Error: googleapi: Error 400: Invalid bucket name: 'binx-blog-dev-test^&asd', invalid
│
│   with google_storage_bucket.demo,
│   on test.tf line 30, in resource "google_storage_bucket" "demo":
│   30: resource "google_storage_bucket" "demo" {
│
╵

We really want to detect this during the plan phase to tell the developer to fix his code before merging it into master for rollout.
So how did we solve this problem?

It appears that we can abuse the file function to show a proper error message when our checks fail. The nice thing about this function is that it will show you an error message if the file you’re trying to load does not exist. We can use this to our advantage by providing an error description as the file name.

example:

# main.tf
locals {
  regex_bucket_name = "(([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])\\.)*([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])" # See https://cloud.google.com/storage/docs/naming-buckets

  bucket_name       = format("%s-%s-%s-%s", var.prefix, var.project, var.environment, var.purpose)
  bucket_name_check = length(regexall("^${local.regex_bucket_name}$", local.bucket_name)) == 0 ? file(format("Bucket [%s]'s generated name [%s] does not match regex ^%s$", var.purpose, local.bucket_name, local.regex_bucket_name)) : "ok"
}

resource "google_storage_bucket" "demo" {
  provider = google-beta
  name     = local.bucket_name
}

Using the example above, when we run Terraform plan with the same purpose input as before, we’ll get the following message:

│ Error: Invalid function argument
│
│   on main.tf line 5, in locals:
│    5:   bucket_name_check = length(regexall("^${local.regex_bucket_name}$", local.bucket_name)) == 0 ? file(format("Bucket [%s]'s generated name [%s] does not match regex ^%s$", var.purpose, local.bucket_name, local.regex_bucket_name)) : "ok"
│     ├────────────────
│     │ local.bucket_name is "binx-blog-dev-test^&asd"
│     │ local.regex_bucket_name is "(([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])\\.)*([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])"
│     │ var.purpose is "test^&asd"
│
│ Invalid value for "path" parameter: no file exists at Bucket [test^&asd]'s generated name [binx-blog-dev-test^&asd] does not match regex ^(([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])\.)*([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])$; this function works only with files that are distributed as part of the configuration source code, so if this file will be created by a resource in this configuration you must instead obtain this result
│ from an attribute of that resource.

As you can see, Terraform will try to load a file named:

Bucket [test^&asd]'s generated name [binx-blog-dev-test^&asd] does not match regex:
^(([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])\.)*([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])$

This filename is actually the error message we want to show. Obviously, this file does not exist, so it will fail and display this message on the console.

Nice! So we have a way to tell the developer in advance that the provided input is not valid. But how can we make the message even more apparent? We can use newlines!

Check out the following example:

# main.tf
locals {
  regex_bucket_name = "(([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])\\.)*([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])" # See https://cloud.google.com/storage/docs/naming-buckets

  assert_head = "\n\n-------------------------- /!\\ CUSTOM ASSERTION FAILED /!\\ --------------------------\n\n"
  assert_foot = "\n\n-------------------------- /!\\ ^^^^^^^^^^^^^^^^^^^^^^^ /!\\ --------------------------\n\n"

  bucket_name       = format("%s-%s-%s-%s", var.prefix, var.project, var.environment, var.purpose)
  bucket_name_check = length(regexall("^${local.regex_bucket_name}$", local.bucket_name)) == 0 ? file(format("%sBucket [%s]'s generated name [%s] does not match regex:\n^%s$%s", local.assert_head, var.purpose, local.bucket_name, local.regex_bucket_name, local.assert_foot)) : "ok"
}

resource "google_storage_bucket" "demo" {
  provider = google-beta
  name     = local.bucket_name
}

We will now get the following error output:

│ Error: Invalid function argument
│
│   on main.tf line 8, in locals:
│    8:   bucket_name_check = length(regexall("^${local.regex_bucket_name}$", local.bucket_name)) == 0 ? file(format("%sBucket [%s]'s generated name [%s] does not match regex:\n^%s$%s", local.assert_head, var.purpose, local.bucket_name, local.regex_bucket_name, local.assert_foot)) : "ok"
│     ├────────────────
│     │ local.assert_foot is "\n\n-------------------------- /!\\ ^^^^^^^^^^^^^^^^^^^^^^^ /!\\ --------------------------\n\n"
│     │ local.assert_head is "\n\n-------------------------- /!\\ CUSTOM ASSERTION FAILED /!\\ --------------------------\n\n"
│     │ local.bucket_name is "binx-blog-dev-test^&asd"
│     │ local.regex_bucket_name is "(([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])\\.)*([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])"
│     │ var.purpose is "test^&asd"
│
│ Invalid value for "path" parameter: no file exists at
│
│ -------------------------- /!\ CUSTOM ASSERTION FAILED /!\ --------------------------
│
│ Bucket [test^&asd]'s generated name [binx-blog-dev-test^&asd] does not match regex:
│ ^(([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])\.)*([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])$
│
│ -------------------------- /!\ ^^^^^^^^^^^^^^^^^^^^^^^ /!\ --------------------------
│
│ ; this function works only with files that are distributed as part of the configuration source code, so if this file will be created by a resource in this configuration you must instead obtain this result from an attribute of that resource.

Conclusion

As you can see, using the file function to generate assertions opens up many possibilities when it comes to input validation. You can see even more examples of how to use this in our own Terraform modules, which can be found on the Terraform Registry. Part of each of these modules is an asserts.tf file, which defines assertions using the file method as described above.

I still think it would be nice if Terraform would create an actual function for us that we can use to generate custom errors. The workaround I showed in this blog post can easily confuse you, as Terraform tells you no file exists while you’re not trying to actually read any files.

You can search across log groups by setting up a correlation id on your log lines. You can then use CloudWatch Logs Insights to query all related log messages.

If you are dealing with different Lambda functions in a project, you might not have a logging system yet. Finding and troubleshooting using CloudWatch Logs could become cumbersome, especially if you try to track a flow through the different log groups.

In CloudWatch Logs Insights you can execute queries across log groups. In our case we want to collect related log lines across log groups. To achieve this we need to annotate the log lines with a correlation_id that ties them together.

Sample application

We will use the following sample application to show how to set up a correlation id on your log lines. The application looks as follows:

  1. An API Gateway that invokes an AWS Lambda function
  2. An api Lambda function that places the body in an SQS queue
  3. A callback Lambda function that receives messages from the SQS queue
Architecture of the Sample Application

Every time that an API call executes, the api function is invoked. We can annotate the log lines by using the Lambda Powertools for Python.

from aws_lambda_powertools import Logger
from aws_lambda_powertools.logging import correlation_paths
from aws_lambda_powertools.utilities.typing import LambdaContext
from aws_lambda_powertools.utilities.data_classes import (
    event_source,
    APIGatewayProxyEvent,
)
logger = Logger(service="api")

@event_source(data_class=APIGatewayProxyEvent)
@logger.inject_lambda_context(correlation_id_path=correlation_paths.API_GATEWAY_REST)
def lambda_handler(event: APIGatewayProxyEvent, context: LambdaContext) -> dict:
    logger.info("This message now has a correlation_id set")

We take the requestContext.requestId from the incoming event. We can use that as a correlation_id. (This is also returned to the caller as an x-amzn-requestid header by default by API Gateway.)

To be able to search across log groups using the same correlation id, we need to pass it along with the message as an attribute when we send it to SQS.

    response = client.send_message(
        QueueUrl=os.environ.get("QUEUE_URL"),
        MessageBody=json.dumps(event.body),
        MessageAttributes={
            "correlation_id": {
                "DataType": "String",
                "StringValue": event.request_context.request_id,
            }
        },
    )

I used a correlation_id attribute to pass the correlation id with the SQS message.

The queue will trigger the callback function with a batch of messages. You need to loop through the messages in the batch, process them, and remove them from the queue. Alternatively, you can use the sqs_batch_processor from the Powertools.

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.typing import LambdaContext
from aws_lambda_powertools.utilities.batch import sqs_batch_processor
from aws_lambda_powertools.utilities.data_classes.sqs_event import SQSRecord
logger = Logger(service="callback")

def record_handler(record: dict) -> None:
    record = SQSRecord(record)
    # Get the correlation id from the message attributes
    correlation_id = record.message_attributes["correlation_id"].string_value
    logger.set_correlation_id(correlation_id)
    logger.info(f"Processing message with {correlation_id} as correlation_id")

@sqs_batch_processor(record_handler=record_handler)
def lambda_handler(event: dict, context: LambdaContext) -> None:
    logger.set_correlation_id(None)
    logger.info(f"Received a SQSEvent with {len(list(event.records))} records")

The sqs_batch_processor decorator will call the record_handler method for each record received. No need for you to handle the deletion of messages.

Finally, we set the correlation id back to None, because the remaining log lines are not related to a specific message.

Show me!

If you call the API using a simple curl command, you can see the correlation_id (note: it is also returned in the response headers).

initial request using curl
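
A request along these lines shows the behaviour; the endpoint URL is just a placeholder, and -i prints the response headers so you can spot the x-amzn-requestid:

curl -i -X POST \
  -d '{"hello": "world"}' \
  https://abc1234567.execute-api.eu-west-1.amazonaws.com/prod/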

So imagine that something went wrong. We know the correlation_id so we will use CloudWatch Logs Insights:

fields level, service, message
| filter correlation_id="263021ca-fb82-41ff-96ad-2e933aadbe0a"
| sort @timestamp desc
Sample query of AWS CloudWatch Logs Insights

As you can see, there are 3 results: 2 lines originated from the api service and 1 from the callback service. If you are interested in how to create queries, have a look at the CloudWatch Logs Insights query syntax.
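
The same query can also be started from the command line; the log group names below are only an example of what the two functions' log groups might be called:

aws logs start-query \
  --log-group-names "/aws/lambda/api" "/aws/lambda/callback" \
  --start-time $(( $(date +%s) - 3600 )) \
  --end-time $(date +%s) \
  --query-string 'fields level, service, message | filter correlation_id="263021ca-fb82-41ff-96ad-2e933aadbe0a" | sort @timestamp desc'

The returned queryId can then be passed to aws logs get-query-results to fetch the matching log lines.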

Conclusion

The Lambda Powertools for Python make it easy to add a correlation_id. Because the Powertools produce structured logs, every property in a log line can be used to query on. The toolkit has a lot more utilities, so I encourage you to go check it out!

Not a Python developer? There is a Java version, and there are initiatives on the roadmap to add other languages as well.

I have recently worked on a project where I needed to configure a Helm release with secrets hard-coded in Terraform. With Cloud KMS, I could encrypt the secrets so that they could safely be committed to git. In this article, I am going to show you how the process works.

Setup Cloud KMS

Since my project is already up and running, all I had to do was to create a Cloud KMS keyring and crypto key that will be used for encrypting and decrypting secrets. This can be done via Terraform with a few resources:

data "google_project" "secrets" {
  project_id = "secrets"
}

resource "google_kms_key_ring" "this" {
  project  = data.google_project.secrets.project_id
  name     = "default"
  location = "europe-west4"
}

resource "google_kms_crypto_key" "this" {
  name     = "default"
  key_ring = google_kms_key_ring.this.self_link
}

resource "google_project_iam_member" "this" {
  project = data.google_project.secrets.project_id
  role    = "roles/cloudkms.cryptoKeyEncrypter"
  member  = "group:encrypter@example.com"
}

In this example, I am also assigning the cloudkms.cryptoKeyEncrypter role to an imaginary encrypter@example.com group so that members can encrypt new secrets.

Encrypt the secrets

I then used this command to encrypt a secret for use in Terraform via Cloud KMS:

echo -n <your-secret> | gcloud kms encrypt --project <project-name> --location <region> --keyring default --key default --plaintext-file - --ciphertext-file - | base64

To encrypt a whole file instead, you can run:

cat <path-to-your-secret> | gcloud kms encrypt --project <project-name> --location <region> --keyring default --key default --plaintext-file - --ciphertext-file - | base64

  • <your-secret> is the string or file you want to encrypt.
  • <project-name> is the project whose KMS keys should be used to encrypt the secret.
  • <region> is the region in which the KMS keyring is configured.

If you are working on macOS, you can append | pbcopy to the command so the resulting output will be added automatically to the clipboard.

Finally, an encrypted string looks like this:

YmlueGlzYXdlc29tZWJpbnhpc2F3ZXNvbWViaW54aXNhd2Vzb21lYmlueGlzYXdlc29tZWJpbnhpc2F3ZXNvbWViaW54aXNhd2Vzb21lYmlueGlzYXdlc29tZQ
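
To double-check a ciphertext before committing it, you can reverse the operation with gcloud kms decrypt; this sketch assumes the same keyring and key names used above:

echo -n "<your-base64-ciphertext>" | base64 --decode | gcloud kms decrypt --project <project-name> --location <region> --keyring default --key default --ciphertext-file - --plaintext-file -

(On macOS, the decode flag may be -D instead of --decode.)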

Configure the Cloud KMS data source

The decrypted value is never stored anywhere. Instead, I used the Cloud KMS data source to decrypt the ciphertext on the fly.

data "google_kms_secret" "secret_key" {
  crypto_key = data.google_kms_crypto_key.this.self_link
  ciphertext = "YmlueGlzYXdlc29tZWJpbnhpc2F3ZXNvbWViaW54aXNhd2Vzb21lYmlueGlzYXdlc29tZWJpbnhpc2F3ZXNvbWViaW54aXNhd2Vzb21lYmlueGlzYXdlc29tZQ"
}

The decrypted value can be referenced with:

data.google_kms_secret.secret_key.plaintext

Terraform will decrypt the string automagically and replace it with the actual value where it is referenced.

Summary

In this short post I have explained how I used Cloud KMS to encrypt strings that can be safely committed to git and used in Terraform. If you have multiple environments to support, remember that KMS keys differ per project, so you will need to encrypt the same string once for each project. If you need some input on how to organize a multi-project Terraform repository, have a look at my recent article How to use Terraform workspaces to manage environment-based configuration.

Credits: Header image by Luca Cavallin on nylavak.com

Anthos Service Mesh is a suite of tools that helps you monitor and manage a reliable service mesh on-premises or on Google Cloud. I recently tested it as an alternative to an unmanaged Istio installation and I was surprised at how much easier Anthos makes it to deploy a service mesh on Kubernetes clusters.
In this article, I am going to explain step-by-step how I deployed a multi-cluster, multi-region service mesh using Anthos Service Mesh. During my proof of concept I read the documentation at https://cloud.google.com/service-mesh/docs/install, but none of the guides covered exactly my requirements, which are:

  • Multi-cluster, multi-region service mesh
  • Google-managed Istio control plane (for added resiliency, and to minimize my effort)
  • Google-managed CA certificates for Istio mTLS

Deploy the GKE clusters

Deploy the two GKE clusters. I called them asm-a and asm-b (easier to remember) and deployed them in two different regions (zones us-west2-a and us-central1-a). Because Anthos Service Mesh requires nodes with at least 4 vCPUs (there are a few more requirements; see the complete list at https://cloud.google.com/service-mesh/docs/scripted-install/asm-onboarding), use at least the e2-standard-4 machine type.

As preparation work, store the Google Cloud Project ID in an environment variable so that the remaining commands can be copied and pasted directly.

export PROJECT_ID=$(gcloud info --format='value(config.project)')

Then, to deploy the clusters, run:

gcloud container clusters create asm-a --zone us-west2-a --machine-type "e2-standard-4" --disk-size "100" --num-nodes "2" --workload-pool=${PROJECT_ID}.svc.id.goog --async

gcloud container clusters create asm-b --zone us-central1-a --machine-type "e2-standard-4" --disk-size "100" --num-nodes "2" --workload-pool=${PROJECT_ID}.svc.id.goog --async

The commands also enable Workload Identity, which you can read more about at: https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity.

Fetch the credentials to the clusters

Once the clusters have been created, fetch the credentials needed to connect to them via kubectl. Use the following commands:

gcloud container clusters get-credentials asm-a --zone us-west2-a --project ${PROJECT_ID}
gcloud container clusters get-credentials asm-b --zone us-central1-a --project ${PROJECT_ID}

Easily switch kubectl context with kubectx

kubectx makes it easy to switch between clusters and namespaces in kubectl (also known as context) by creating a memorable alias for them (in this case, asma and asmb). Learn more about the tool at: https://github.com/ahmetb/kubectx.

kubectx asma=gke_${PROJECT_ID}_us-west2-a_asm-a
kubectx asmb=gke_${PROJECT_ID}_us-central1-a_asm-b

Set the Mesh ID label for the clusters

Set the mesh_id label on the clusters before installing Anthos Service Mesh, which is needed by Anthos to identify which clusters belong to which mesh. The mesh_id is always in the format proj-<your-project-number>, and the project number for the project can be found by running:

gcloud projects list
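
Alternatively, you can print just the number for the current project (this assumes the PROJECT_ID variable from the earlier step is still set):

gcloud projects describe ${PROJECT_ID} --format='value(projectNumber)'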

Use these commands to create the mesh_id label on both clusters (replace <your-project-number> with the project number found with the previous command):

export MESH_ID="proj-<your-project-number>"
gcloud container clusters update asm-a --zone us-west2-a --project=${PROJECT_ID} --update-labels=mesh_id=${MESH_ID}

gcloud container clusters update asm-b --zone us-central1-a --project=${PROJECT_ID} --update-labels=mesh_id=${MESH_ID}

Enable StackDriver

Enable StackDriver on the clusters to be able to see logs, should anything go wrong during the setup!

gcloud container clusters update asm-a --zone us-west2-a --project=${PROJECT_ID} --enable-stackdriver-kubernetes

gcloud container clusters update asm-b --zone us-central1-a --project=${PROJECT_ID} --enable-stackdriver-kubernetes

Create firewall rules for cross-region communication

The clusters live in different regions, therefore a new firewall rule must be created to allow communication between them and their pods. Bash frenzy incoming!

ASMA_POD_CIDR=$(gcloud container clusters describe asm-a --zone us-west2-a --format=json | jq -r '.clusterIpv4Cidr')
ASMB_POD_CIDR=$(gcloud container clusters describe asm-b --zone us-central1-a --format=json | jq -r '.clusterIpv4Cidr')
ASMA_PRIMARY_CIDR=$(gcloud compute networks subnets describe default --region=us-west2 --format=json | jq -r '.ipCidrRange')
ASMB_PRIMARY_CIDR=$(gcloud compute networks subnets describe default --region=us-central1 --format=json | jq -r '.ipCidrRange')
ALL_CLUSTER_CIDRS=$ASMA_POD_CIDR,$ASMB_POD_CIDR,$ASMA_PRIMARY_CIDR,$ASMB_PRIMARY_CIDR

# collect the network tags of the cluster nodes; they are referenced by the firewall rule below
ALL_CLUSTER_NETTAGS=$(gcloud compute instances list --format=json | jq -r '.[].tags.items[0]' | sort -u | paste -s -d, -)

gcloud compute firewall-rules create asm-multicluster-rule \
    --allow=tcp,udp,icmp,esp,ah,sctp \
    --direction=INGRESS \
    --priority=900 \
    --source-ranges="${ALL_CLUSTER_CIDRS}" \
    --target-tags="${ALL_CLUSTER_NETTAGS}" --quiet

Install Anthos Service Mesh

First, install the required local tools as explained here: https://cloud.google.com/service-mesh/docs/scripted-install/asm-onboarding#installing_required_tools.

The install_asm tool will install Anthos Service Mesh on the clusters. Pass these options to fulfill the initial requirements:

  • --managed: Google-managed Istio control plane
  • --ca mesh_ca: Google-managed CA certificates for Istio mTLS
  • --enable_registration: automatically registers the clusters with Anthos (it can also be done manually later)
  • --enable_all: all Google APIs required by the installation will be enabled automatically by the script

./install_asm --project_id ${PROJECT_ID} --cluster_name asm-a --cluster_location us-west2-a --mode install --managed --ca mesh_ca --output_dir asma --enable_registration --enable_all

./install_asm --project_id ${PROJECT_ID} --cluster_name asm-b --cluster_location us-central1-a --mode install --managed --ca mesh_ca --output_dir asmb --enable_registration --enable_all

Configure endpoint discovery between clusters

Endpoint discovery allows the clusters to communicate with each other: it enables each cluster to discover the service endpoints of the other.

Install the required local tools as explained here: https://cloud.google.com/service-mesh/docs/downloading-istioctl, then run the following commands:

istioctl x create-remote-secret --context=asma --name=asm-a | kubectl apply -f - --context=asmb

istioctl x create-remote-secret --context=asmb --name=asm-b | kubectl apply -f - --context=asma

Testing the service mesh

Anthos Service Mesh is now ready! Let’s deploy a sample application to verify cross-cluster traffic and fail-overs.

Create the namespace for the Hello World app

Create a new namespace on both clusters and enable automatic Istio sidecar injection for both of them. Since the Istio control plane is managed by Google, the istio-injection label is removed and the istio.io/rev label is set to asm-managed instead.

kubectl create --context=asma namespace sample

kubectl label --context=asma namespace sample istio-injection- istio.io/rev=asm-managed --overwrite

kubectl create --context=asmb namespace sample

kubectl label --context=asmb namespace sample istio-injection- istio.io/rev=asm-managed --overwrite

Create the Hello World service

Deploy the services for the Hello World app on both clusters with:

kubectl create --context=asma -f https://raw.githubusercontent.com/istio/istio/1.9.5/samples/helloworld/helloworld.yaml -l service=helloworld -n sample

kubectl create --context=asmb -f https://raw.githubusercontent.com/istio/istio/1.9.5/samples/helloworld/helloworld.yaml -l service=helloworld -n sample

Create the Hello World deployment

Deploy the Hello World sample app, which provides an endpoint that returns the version number of the application (the version number differs between the two clusters) and a Hello World message to go with it.

kubectl create --context=asma -f https://raw.githubusercontent.com/istio/istio/1.9.5/samples/helloworld/helloworld.yaml -l version=v1 -n sample

kubectl create --context=asmb -f https://raw.githubusercontent.com/istio/istio/1.9.5/samples/helloworld/helloworld.yaml -l version=v2 -n sample

Deploy the Sleep pod

The Sleep pod gives us a client inside the mesh from which we can send test requests. Let’s use it to test the resilience of the service mesh! To deploy the Sleep application, use:

kubectl apply --context=asma -f https://raw.githubusercontent.com/istio/istio/1.9.5/samples/sleep/sleep.yaml -n sample

kubectl apply --context=asmb -f https://raw.githubusercontent.com/istio/istio/1.9.5/samples/sleep/sleep.yaml -n sample

Verify cross-cluster traffic

To verify that cross-cluster load balancing works as expected (read as: can the service mesh actually survive regional failures?), call the HelloWorld service several times using the Sleep pod. To ensure load balancing is working properly, call the HelloWorld service from all clusters in your deployment.

kubectl exec --context=asma -n sample -c sleep "$(kubectl get pod --context=asma -n sample -l app=sleep -o jsonpath='{.items[0].metadata.name}')" -- curl -sS helloworld.sample:5000/hello

kubectl exec --context=asmb -n sample -c sleep "$(kubectl get pod --context=asmb -n sample -l app=sleep -o jsonpath='{.items[0].metadata.name}')" -- curl -sS helloworld.sample:5000/hello

Repeat this request several times and verify that the HelloWorld version toggles between v1 and v2: traffic is balanced across both clusters, and requests are relayed to the healthy cluster when the other one is not responding!
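
If you want to see the failover behaviour explicitly, one option is to temporarily scale the Hello World deployment in one of the clusters down to zero and repeat the curl commands above; every response should then come from the other cluster. The deployment name below assumes the naming used by the Istio sample (helloworld-v1 on asm-a):

kubectl scale deployment helloworld-v1 --context=asma -n sample --replicas=0
# repeat the curl commands; all responses should now report v2 from asm-b
kubectl scale deployment helloworld-v1 --context=asma -n sample --replicas=1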

Summary

In this article I have explained how I deployed Anthos Service Mesh on two GKE clusters in different regions with Google-managed Istio control plane and CA certificates. Anthos Service Mesh makes it simple to deploy a multi-cluster service mesh, because most of the complexity of Istio is now managed by Google.

Credits: Header image by Luca Cavallin on nylavak.com

Xebia, a global IT consultancy firm, announces the acquisition of Oblivion to expand its capabilities around Amazon Web Services (AWS).

As one of the first consultancy firms in the Benelux to fully embrace Amazon Web Services, Oblivion has delivered hundreds of successful projects for leading organizations in various industries, such as Aegon, ABN AMRO, Wehkamp, PostNL, and DSM. In 2018, Amazon Web Services recognized Oblivion as a Premier Consulting Partner, and awarded them with the AWS Migration, DevOps, Financial Services, and IoT competencies.

Xebia helps companies across the globe digitally transform by offering high-quality cloud, data, AI, Agile, DevOps, and software consultancy. The rapidly growing group has launched several successful brands, including cloud expert Binx.io and data and AI specialist GoDataDriven. By acquiring Oblivion, Xebia emphasizes its focus on cloud services and continues to expand in line with its strategy.

"When it comes to cloud technology, Xebia’s aspirations are immense. As an Advanced AWS Consulting and Training Partner, we have developed a solid relationship with AWS. Besides more focus, Oblivion brings the breadth and depth we need to complete even the most challenging cloud projects successfully for our clients in the Benelux and far beyond,” said Andrew de la Haije, CEO Xebia.

"We are very excited about the opportunities ahead. By working together with Xebia, we are one step closer to our goal of becoming one of Europe’s best cloud services partners. Besides operational efficiency gains, this collaboration also enables us to offer additional services while preserving our AWS-only strategy and the down-to-earth approach our clients know and appreciate,” said Edwin van Nuil, CEO of Oblivion.

"Oblivion provides our clients with a clear point of contact for everything AWS-related. I think they are the most experienced AWS partner in the Benelux, really living and breathing every aspect of Amazon Web Services. In the short time that we have been working together, our customers have already experienced the added value of their expertise," adds Bart Verlaat, CEO of Binx.io.

"Just like Xebia, we believe in focus. With so much expertise within the group, we can now offer our customers a "one-stop-shop" experience. From cloud engineering and managed services, to machine learning, multi-cloud solutions, data and AI, cloud-native software development and security," concludes Eric Carbijn, co-founder of Oblivion.

● After the merger, Oblivion and Xebia are Premier AWS Partner and Authorized Training Partner, including a Machine Learning Competency, DevOps Competency, Migration Competency, Financial Services Competency, and Solution Provider status.

● Oblivion will continue to operate under its own label and under the existing management of Edwin van Nuil and Eric Carbijn.

● The acquisition will be signified by the addition of "proudly part of Xebia" to the company logo.

To calculate the elapsed times of Terraform Cloud run stages, I needed to create a pivot table in Google BigQuery. As the classic SQL query was very expensive, I changed it to use the BigQuery PIVOT operator which was 750 times cheaper.

(more…)

After I added the Google Cloud Storage backend to PrivateBin, I was
interested in its performance characteristics. To fake a real PrivateBin client, I needed to generate an encrypted message.
I searched for a client library and found PBinCLI. As it is written in Python,
I could write a simple load test in Locust. Nice and easy. How wrong was I.

create the test

Within a few minutes I had a test script to create, retrieve and delete a paste.
To validate the functional correctness, I ran it against https://privatebin.net:

$ locust -f locustfile.py \
   --users 1 \
   --spawn-rate 1 \
   --run-time 1m \
   --headless \
   --host https://privatebin.net

The following table shows measured response times:

Name          Avg (ms)   Min (ms)   Max (ms)   Median (ms)
create-paste  237        140        884        200
get-paste     28         27         30         29
delete-paste  27         27         29         27

Now, I had a working load test script and a baseline to compare the performance with.

baseline on Google Cloud Storage

After deploying the service to Google Cloud Run, I ran the single user test. And it was promising.

$ locust -f locustfile.py \
  --users 1 \
  --spawn-rate 1 \
  --run-time 1m \
  --headless \
  --host https://privatebin-deadbeef-ez.a.run.app

Sure, it is not lightning fast, but I did not expect that. The response times looked
acceptable to me. After all, privatebin is not used often nor heavily.

Name          Avg (ms)   Min (ms)   Max (ms)   Median (ms)
create-paste  506        410        664        500
get-paste     394        335        514        380
delete-paste  587        443        974        550

multi-user run on Google Cloud Storage

Next, I wanted to know the performance of PrivateBin with my Google Cloud Storage backend under moderate load. So, I scaled the load test to 5 concurrent users.

$ locust -f locustfile.py \
  --users 5 \
  --spawn-rate 1 \
  --run-time 1m \
  --headless

The results were shocking!

Name          Avg (ms)   Min (ms)   Max (ms)   Median (ms)
create-paste  4130       662        10666      4500
delete-paste  3449       768        6283       3800
get-paste     2909       520        5569       2400

How? Why? Was there a bottleneck at the storage level? I checked the logs and saw steady response times reported by Cloud Run:

POST 200 1.46 KB 142 ms python-requests/2.25.1  https://privatebin-37ckwey3cq-ez.a.run.app/
POST 200 1.25 KB 382 ms python-requests/2.25.1  https://privatebin-37ckwey3cq-ez.a.run.app/
GET  200 1.46 KB 348 ms python-requests/2.25.1  https://privatebin-37ckwey3cq-ez.a.run.app/?d7e4c494ce4f613f

It took me a while to discover that Locust was thrashing my little M1: it was running at 100% CPU just to create the encrypted messages (without even spinning up a fan). So I needed something more efficient.

k6 to the rescue!

When my brain thinks fast, it thinks golang. So I downloaded k6. The user scripts
are written in JavaScript, but the engine is pure go. Unfortunately, the interpreter
is custom built and has limited compatibility with nodejs and browser JavaScript engines.
This meant that I could not use any existing JavaScript libraries to create an encrypted message.

Fortunately with xk6 you can call a golang
function from your JavaScript code. This is what I needed! So I created the k6 privatebin extension and wrote
an equivalent load test script.

build a customized k6

To use this new extension, I built k6 locally with the following commands:

go install github.com/k6io/xk6/cmd/xk6@latest
xk6 build --with github.com/binxio/xk6-privatebin@v0.1.2

k6 baseline run on Google Cloud Storage

Now I was ready to run the baseline for the single user, using k6:

$ ./k6 --log-output stderr run -u 1 -i 100  test.js

The baseline result looked pretty much like the Locust run:

Name          Avg (ms)   Min (ms)   Max (ms)   Median (ms)
create-paste  440        396        590        429
get-paste     355        320        407        353
delete-paste  382        322        599        357

k6 multi-user run on Google Cloud Storage

What would happen if I scaled up to 5 concurrent users?

$ ./k6 --log-output stderr run -u 5 -i 100  test.js

Woohoo! The response times stayed pretty flat.

Name          Avg (ms)   Min (ms)   Max (ms)   Median (ms)
create-paste  584        350        2612       555
get-paste     484        316        808        490
delete-paste  460        295        843        436

The chart below shows the sum of the medians for the single- and multi-user load test on Locust and k6:

It is clear that Locust was skewing the results way too much. With k6, I could even simulate 20 concurrent users, and still only use 25% of my M1.

Name          Avg (ms)   Min (ms)   Max (ms)   Median (ms)
create-paste  713        414        1204       671
get-paste     562        352        894        540
delete-paste  515        351        818        495

That is way more users than I expect on my PrivateBin installation, and these response times are very acceptable to me. Mission accomplished!

conclusion

In the future, my go-to tool for load and performance tests will be k6. It is a single binary which you can run anywhere. It is fast, and the golang extensions make it easy to include compute-intensive tasks in a very user-friendly manner.

I have recently worked on a 100%-Terraform based project where I made extensive use of Workspaces and modules to easily manage the infrastructure for different environments on Google Cloud. This blog post explains the structure I have found to work best for the purpose.

What are Terraform workspaces?

Workspaces are separate instances of state data that can be used from the same working directory. You can use workspaces to manage multiple non-overlapping groups of resources with the same configuration.
To create and switch to a new workspace, after running terraform init, run:

terraform workspace new <name>

To switch to other workspaces, run instead:

terraform workspace select <name>
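
To check which workspaces exist and which one is currently selected, you can use:

terraform workspace list
terraform workspace show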

Use the selected workspace in the Terraform files

The selected workspace is made available in your .tf files via the terraform.workspace variable (it’s a string). I like to assign the value to a local variable called environment (since the name of the workspaces and that of the environments match).

locals {
    environment = terraform.workspace
}

Add the environment-based configuration to a new module

Now that you have a variable containing the environment (it’s just a string) that you are operating on, you can create a module containing the environment-based configuration. I created a vars module inside the modules directory of my repository, which contains at least the following files:

  • main.tf
    This file never changes, as it is only needed to aggregate the variables that will be exported.

    locals {
      environments = {
        "development" : local.development,
        "acceptance" : local.acceptance,
        "production" : local.production
      }
    }
  • outputs.tf
    This file, too, will never change. Here I am defining the output of the vars module so that it can be used from anywhere else in the Terraform repository.
    The exported values are based on the selected workspace.

    output "env" {
      value = local.environments[var.environment]
    }
  • variables.tf
    This file defines the variables required to initialize the module. The outputs of the module are based on the selected workspace (environment), which it needs to be aware of.

    variable "environment" {
    description = "The environment which to fetch the configuration for."
    type = string
    }
  • development.tf & acceptance.tf & production.tf
    These files contain the actual values that differ by environment. For example, when setting up a GKE cluster, you might want to use cheap machines for your development node pool, and more performant ones in production. This can be done by defining a node_pool_machine_type value in each environment, like so:

    // in development.tf
    locals {
      development = {
        node_pool_machine_type = "n2-standard-2"
      }
    }

    // in acceptance.tf
    locals {
      acceptance = {
        node_pool_machine_type = "n2-standard-4"
      }
    }

    // in production.tf
    locals {
      production = {
        node_pool_machine_type = "n2-standard-8"
      }
    }

The vars module is now ready to be used from anywhere in the repository, for example in main.tf file. To access the configuration values, initialize the module like so:

#
# Fetch variables based on Environment
#
module "vars" {
    source      = "./modules/vars"
    environment = local.environment
}

The correct configuration will be returned based on the Terraform workspace (environment) name being passed to it, and values can be accessed via module.vars.env.<variable-name>. For example:

node_pools = [
    {
        ...
        machine_type = module.vars.env.node_pool_machine_type
        ...
    }
]

Summary

In this blog post I have shown you how you can use Terraform Workspaces to switch between different configurations based on the environment you are working on, while keeping the setup as clean and simple as possible. Are you interested in more articles about Terraform? Check out How to Deploy ElasticSearch on GKE using Terraform and Helm!

Credits: Header image by Luca Cavallin on nylavak.com

We have read and selected a few articles and other resources about the latest developments around cloud technology so you don’t have to. Read further and keep yourself up-to-date in five minutes!

In a data-driven, global, always-on world, databases are the engines that let businesses innovate and transform. As databases get more sophisticated and more organizations look for managed database services to handle infrastructure needs, there are a few key trends we’re seeing.

Read more:
https://cloud.google.com/blog/products/databases/6-database-trends-to-watch

How does Anthos simplify hybrid & multicloud deployments?

Most enterprises have applications in disparate locations—in their own data centers, in multiple public clouds, and at the edge. These apps run on different proprietary technology stacks, which reduces developer velocity, wastes computing resources, and hinders scalability. How can you consistently secure and operate existing apps, while developing and deploying new apps across hybrid and multicloud environments? How can you get centralized visibility and management of the resources? Well, that is why Anthos exists!

Read more:
https://cloud.google.com/blog/topics/developers-practitioners/how-does-anthos-simplify-hybrid-multicloud-deployments

Automate your budgeting with the Billing Budgets API

Budgets are ideal for visibility into your costs but they can become tedious to manually update. Using the Billing Budgets API you can automate updates and changes with your custom business logic.

Read more:
https://cloud.google.com/blog/topics/developers-practitioners/automate-your-budgeting-billing-budgets-api

Amazon RDS for PostgreSQL Integrates with AWS Lambda

You can now invoke Lambda functions from stored procedures and user-defined functions inside your PostgreSQL database using the aws_lambda PostgreSQL extension provided with RDS for PostgreSQL. You can choose to execute the Lambda synchronously or asynchronously. You also get access to the execution log, the error, and the returned result for further processing. This probably also means you can call Lambda functions from triggers on tables to respond to data changes!

Read more:
https://aws.amazon.com/about-aws/whats-new/2021/04/amazon-rds-postgresql-integrates-aws-lambda/

AWS SAM CLI now supports AWS CDK applications (public preview)

Some SAM features are now available to CDK developers: local testing of Lambda functions and API endpoints before synthesizing, and building your Lambda resources into deployable zip files. All this while still being able to use the CDK CLI for creating, modifying, and deploying CDK applications.

Read more:
https://aws.amazon.com/about-aws/whats-new/2021/04/aws-sam-cli-supports-aws-sdk-applications-public-preview/

AWS CloudFront Functions

It is now possible to manipulate requests (viewer request and viewer response) with the new CloudFront Functions. These are JavaScript functions that execute within 1 millisecond on the outer edge locations (218 instead of 13!). You can’t do really crazy stuff with them; they are most suitable for header manipulation, redirects, URL rewrites and token validation. The cost of using Functions is 1/6 of the Lambda@Edge price!

Read more:
https://aws.amazon.com/blogs/aws/introducing-cloudfront-functions-run-your-code-at-the-edge-with-low-latency-at-any-scale/

Amazon EC2 Auto Scaling introduces Warm Pools

You can now keep some pre-initialized, stopped instances handy to quickly start them when autoscaling needs them. This is much cheaper than keeping live instances ready. Starting these pre-initialized instances is then a matter of tens of seconds instead of minutes.

Read more:
https://aws.amazon.com/about-aws/whats-new/2021/04/amazon-ec2-auto-scaling-introduces-warm-pools-accelerate-scale-out-while-saving-money/

Credits: Header image by Luca Cavallin on nylavak.com

Did you ever wonder how to connect to CloudSQL with IAM authentication? Since this year, CloudSQL for PostgreSQL allows IAM users and IAM service accounts to log in as a database user. In this short blog I will show you how to do this using Terraform.

(more…)

In this blog I introduce the Renovate bot, a great tool to automate updating the application dependencies in your source code.

(more…)

I was recently tasked with deploying ElasticSearch on GKE using Terraform and Helm, and doing so in the most readable way possible. I wasn’t very familiar with Helm before, so I did some research to find an approach that would fulfill the requirements. In this post I will share the Terraform configuration I used to achieve a successful deployment.

What is Helm?

Helm is, at its most basic, a templating engine to help you define, install, and upgrade applications running on Kubernetes. Using Helm, you can leverage its Charts feature, which are simply Kubernetes YAML configuration files (that can be further configured and extended) combined into a single package that can be used to deploy applications on a Kubernetes cluster. To be able to use Helm via Terraform, we need to define the corresponding provider and pass the credentials needed to connect to the GKE cluster.

provider "helm" {
  kubernetes {
    token                  = data.google_client_config.client.access_token
    host                   = data.google_container_cluster.gke.endpoint
    cluster_ca_certificate = base64decode(data.google_container_cluster.gke.master_auth[0].cluster_ca_certificate)
  }
}

Terraform configuration

I am defining a helm_release resource with Terraform, which will deploy the ElasticSearch cluster when applied. Since I am using a Helm chart for the cluster, doing so is incredibly easy. All I had to do was tell Helm the name of the chart to use and where it is located (the repository), along with the version of ElasticSearch that I would like to use.
With the set blocks, I can override the default values from the template: this makes it easy to select an appropriate storage class, the amount of storage, and in general any other piece of configuration that can be changed (refer to the documentation of the chart itself to see which values can be overridden), directly from Terraform.

resource "helm_release" "elasticsearch" {
  name       = "elasticsearch"
  repository = "https://helm.elastic.co"
  chart      = "elasticsearch"
  version    = "6.8.14"
  timeout    = 900

  set {
    name  = "volumeClaimTemplate.storageClassName"
    value = "elasticsearch-ssd"
  }

  set {
    name  = "volumeClaimTemplate.resources.requests.storage"
    value = "5Gi"
  }

  set {
    name  = "imageTag"
    value = "6.8.14"
  }
}

Creating a new storage class

I then had to provision a new storage class, which will be used by the ElasticSearch cluster to store data. The configuration below sets up the SSD persistent disk storage class (SSD is recommended for this purpose, since it is faster than a regular HDD) that I referenced in the main configuration above.

resource "kubernetes_storage_class" "elasticsearch_ssd" {
  metadata {
    name = "elasticsearch-ssd"
  }
  storage_provisioner = "kubernetes.io/gce-pd"
  reclaim_policy      = "Retain"
  parameters = {
    type = "pd-ssd"
  }
  allow_volume_expansion = true
}
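
Once terraform apply has finished, you can check whether the ElasticSearch pods and their volume claims came up as expected. The label below assumes the chart's default naming (elasticsearch-master):

kubectl get pods -l app=elasticsearch-master
kubectl get pvc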

Summary

In this blog post I have shown you how to deploy ElasticSearch on GKE using Terraform and Helm. The required configuration is simple and very readable, lowering the barrier to handling all of your infrastructure via Terraform, rather than, for example, using Cloud Marketplace, managed services, or other custom solutions.

Credits: Header image by Luca Cavallin on nylavak.com

When I test new ideas and features, I often rack up accidental cloud cost on Google Cloud Platform. I forget to delete things I created, and end up paying for resources that I kept running as a result.

(more…)

When you want to create a Python command line utility for Google Cloud Platform, it would be awesome if you could use the active gcloud credentials in Python. Unfortunately, the Google Cloud Client libraries do not support using the gcloud credentials. In this blog, I will present a small Python library which you can use to do just that.

How to use the gcloud credentials in Python

It is really simple. You install the package gcloud-config-helper and call the default function, as shown below:

import gcloud_config_helper
credentials, project = gcloud_config_helper.default()
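
The package is published on PyPI, so installing it should be a one-liner (assuming the distribution name matches the import name, with dashes):

pip install gcloud-config-helper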

You pass the credentials to a service client as follows:

from google.cloud import compute_v1

c = compute_v1.InstancesClient(credentials=credentials)
for zone, instances in c.aggregated_list(request={"project": project}):
    for instance in instances.instances:
        print(f'found {instance.name} in zone {zone}')

That is all there is to it! Check out the complete example of using the gcloud configured credentials.

How does it work?

The library executes the command gcloud config config-helper. This command provides authentication and configuration data to external tools. It returns an access token, an id token, the name of the active configuration and all associated configuration properties, as shown below:

    configuration:
      active_configuration: playground
      properties:
        core:
          account: markvanholsteijn@binx.io
          project: playground
        ...
    credential:
      access_token: ya12.YHYeGSG8flksArMeVRXsQB4HFQ8aodXiGdBgfEdznaVuAymcBGHS6pZSp7RqBMjSzHgET08BmH3TntQDOteVPIQWZNJmiXZDr1i99ELRqDxDAP8Jk1RFu1xew7XKeQTOTnm22AGDh28pUEHXVaXtRN8GZ4xHbOoxrTt7yBG3R7ff9ajGVYHYeGSG8flksArMeVRXsQB4HFQ8aodXiGdBgfEdznaVuAymcBGHS6pZSp7RqBMjSzHgET08BmH3TntQDOteVPIQWZNJmiXZDr1i99ELRqDxDAP8Jk1RFu1xew7XKeQTOTnm22AGDh28pUEHXVaXtRN8GZ4xHbOoxrTt7yBG3R7ff9ajGV
      id_token: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJodHRwczovL2FjY291bnRzLmdvb2dsZS5jb20iLCJhenAiOiI5OTk5OTk5OTk5OS5hcHBzLmdvb2dsZXVzZXJjb250ZW50LmNvbSIsImF1ZCI6Ijk5OTk5OTk5OTk5LmFwcHMuZ29vZ2xldXNlcmNvbnRlbnQuY29tIiwic3ViIjoiMTExMTExMTExMTEyMjIyMjIyMjIyMjIiLCJoZCI6ImJpbnguaW8iLCJlbWFpbCI6Im1hcmt2YW5ob2xzdGVpam5AYmlueC5pbyIsImVtYWlsX3ZlcmlmaWVkIjp0cnVlLCJhdF9oYXNoIjoic2Rmc2prZGZqc2tkZnNrZGZqcyIsImlhdCI6MTU4ODcwMTY4MywiZXhwIjoxNTg4NzA1MjgzfQ.-iRKFf69ImE93bvUGBxn3Fa5aPBjhyzeWfLzuaNdIGI
      token_expiry: '2020-05-05T19:01:22Z'

When the token expires, the library will call the helper again to refresh it. Note that Google is unsure whether the config-helper is a good thing. If you read gcloud config config-helper --help, you will notice the following sentence:

This command is an internal implementation detail and may change or disappear without notice.

For the development of command line utilities which integrate into the Google Cloud SDK ecosystem, it would be really handy if Google would provide an official way to obtain the active gcloud configuration and credentials.

conclusion

With the help of this library, it is possible to create a command line utility in Python for the Google Cloud Platform using the gcloud credentials in Python. It is unfortunate that Google marks the config-helper as a volatile interface. Given the simplicity of the interface, I trust this library will be able to deal with any future changes. It would be even better, if Google would provide official support.

We also created a library for authenticating with gcloud credentials in Go.

Image by Kerstin Riemer from Pixabay

What is the next big thing? That’s Cloud. But what is the value of cloud training? And why should you train with an authorized training partner?

In this video and blog post, Max Driessen, chief of cloud training, and Martijn van de Grift, cloud consultant and authorized instructor for AWS and GCP, discuss the importance of cloud training and reasons to train with an ATP.

How important is cloud training?

“That depends on the phase your organization is in, and on the people within your organization”, Max starts.

“Generally speaking, proper cloud training is a great enabler to speed up development and go-to-market, and to implement best-practices such as cost optimization, performance efficiency, and security".

“Organizations usually start their cloud journey by implementing virtual machines or migrating small databases. That isn’t really different from what their engineers have been doing all along”, Martijn adds.

“What is essential in this phase, is that the management defines a vision on cloud and a roadmap. Managers need to become familiar with operating models in the cloud, different cost models and cost allocation".

“As organizations advance towards using the cloud-native building blocks, new challenges arise. Organizations in this phase can really benefit from fundamental training”, says Martijn.

“If a company is already really progressing well with the cloud, they move past the need for the fundamentals towards a potential to dive deep with specific technical cloud training to speed up their development".

What kind of training needs do you see?

Without a blink, Max sums up: “I see three different general training needs: Cloud Awareness, Cloud Transformation, Cloud Skills".

“Cloud Awareness training is for companies that are setting their first steps with the cloud. This training is ideal for managers, project leads, product owners, basically anyone within an organization who needs to get a better grip on the cloud. To develop an understanding of the capabilities, the capacity, the return on investment of cloud even," Max explains.

“Cloud Transformation is for companies that are already in a hybrid or a migration process, who need support and skills within that process of transformation". Max continues. “They will learn about cloud target operating models, defining a business case around cloud, but also, they will learn about the organizational impact of a cloud transformation".

Martijn says: ”Cloud Skills are off-the-shelf training from the curriculum of AWS, GCP, or Azure. These learning paths focus on roles like architects, developers, security or data specialists, and even help you to get prepared to take a certification exam. Each program in the curriculum of AWS, GCP, or Azure, is great preparation for an exam".

“If you want to take the exam, that is something you have to decide on your own, but the training will definitely help you to develop the right competencies", Max adds.

Martijn concludes: “Obviously, you also need to gain practical experience. That is why it’s so useful to have an instructor who has a couple of years of practical experience under his belt".

Why train with Binx?

“Three reasons stand out: First of all, we make sure that each individual receives the best training experience possible by meeting you where you are. Secondly, we combine our training activities with high-end consultancy, so we have first-hand experience with cloud projects. Thirdly, but certainly not the least: we are an authorized training partner for AWS, and GCP," Max says with a smile.

But how do you define the training need and the right training? “If you identify a training need, but you are not really sure on which topic you need training, we have an assessment for you. We can help to identify the knowledge gaps, on which domains. We can help you to fill in the blanks. An assessment could also be a questionnaire, where we have a conversation with a company, to check where they are and to assess where we can be of any help," Max explains.

Martijn has been working as a cloud engineer for some years now. Does that provide added value when delivering training?

“Being not only a consultancy firm but also a training partner, is definitely a big plus. We are working with clients four or five days a week, helping the customers to solve their problems. And on top of this, we are also delivering training".

"We know first-hand what the challenges are that customers are typically facing”, explains Martijn. “And we are also up to date in all the courses and materials. During the training courses, you work together with the instructor. There is plenty of time to ask the instructor. The instructor will tap into his experience and knowledge to explain all the topics. This is definitely a big benefit over an online course".

“Being an authorized training partner with AWS and GCP is very special. Especially if you are authorized for multiple cloud service providers. That is something that rarely happens. The bar is set high. For example, scoring a 9 out of 10 in feedback for a training", Max says.

Martijn adds: “To become an authorized trainer, you also have to meet a high standard. You have to know everything about the cloud providers and their services. About Amazon, Google, and Azure. Instructors are tested every year by the cloud service provider itself".

Can you share some of your personal training experiences?

Max starts to smile as he recalls his first cloud training a while ago: “In the meantime, I joined three courses, and it has been quite a journey. It was quite challenging at the start, but having a good trainer really made the difference. In my case, I was lucky to have Martijn as the trainer."

Martijn says “My favorite training is the Cloud Architect Professional training. This training really gives a global overview of everything a cloud provider has to offer you. It’s actually more than the cloud from a developer point-of-view, or only from a security point-of-view. It really gives you an overview of the services and how you can use them properly."

Interested to learn more about the benefits of training for you or your organization? Get in touch with Max Driessen to discuss the possibilities.

We have read and selected a few articles and other resources about the latest developments around cloud technology so you don’t have to. Read further and keep yourself up-to-date in five minutes!

Inventory management with BigQuery and Cloud Run

Cloud Run is often used just as a way of hosting websites, but there’s so much more you can do with it. In this blog, Aja Hammerly (Developer Advocate) will show us how to use Cloud Run and BigQuery together to create an inventory management system. The example uses a subset of the Iowa Liquor Control Board data set to create a smaller inventory file for a fictional store. 

Read more:
https://cloud.google.com/blog/topics/developers-practitioners/inventory-management-bigquery-and-cloud-run

3 common serverless patterns to build with Workflows

The Workflows orchestration and automation service is a serverless product designed to orchestrate work across Google Cloud APIs as well as any HTTP-based API available on the internet, and it has recently reached General Availability. At the same time, Workflows has been updated with a preview of Connectors, which provide seamless integration with other Google Cloud products to design common architecture patterns that can help you build advanced serverless applications. 

Read more:
https://cloud.google.com/blog/products/application-development/building-serverless-apps-with-workflows-and-connectors

Google Cloud offers lots of products to support a wide variety of use cases. To make it easier to orient yourself, Google has created a set of resources that makes it easy to familiarize yourself with the Google Cloud ecosystem. You can use these resources to quickly get up to speed on different products and choose those that you’re most interested in for a deeper dive into documentation and other available resources. 

Read more:
https://cloud.google.com/blog/topics/developers-practitioners/back-popular-demand-google-cloud-products-4-words-or-less-2021-edition

Node.js 14.x runtime now available in AWS Lambda

While it is still OK to use Node.js 12.x until the end of 2021, version 14.x brings a few things that make the life of a developer a lot nicer. Most of the changes come from the newly upgraded JavaScript engine (V8 8.1) on which Node.js runs. New features include the nullish coalescing operator, nullish assignment, and optional chaining.

Read more:
https://aws.amazon.com/blogs/compute/node-js-14-x-runtime-now-available-in-aws-lambda/

Cost savings on AWS

AWS recently introduced the “Compute Savings Plan” for Lambda. It allows for a 12% to 17% discount on the expenses. So if you spend, say, $730 per month ($1 per hour) on Lambda, you can commit to $640 per month ($0.88 per hour), leading to yearly savings of $1,080. ($1 per hour is a LOT of Lambda usage though, so your actual situation will probably not benefit this much.)

It is also now possible to save costs on CloudFront outbound traffic. In a use case where the traffic amounts to $50 per day, the savings are around $800 per month!

Read more:
https://aws.amazon.com/blogs/aws/savings-plan-update-save-up-to-17-on-your-lambda-workloads/ and https://aws.amazon.com/about-aws/whats-new/2021/02/introducing-amazon-cloudfront-security-savings-bundle/

AWS DynamoDB audit logging and monitoring with CloudTrail

DynamoDB change logging was usually done through streams. Now it is possible to do logging with CloudTrail data events. This also allows for access logging, including the queries used to access the data. These CloudTrail data events also support access logging for S3, Lambda (invocations) and Managed Blockchain.

Read more:
https://aws.amazon.com/blogs/database/amazon-dynamodb-now-supports-audit-logging-and-monitoring-using-aws-cloudtrail/

S3 Object Lambda

A new type of Lambda can be attached to an S3 bucket that allows you to change the response for GET requests. The Lambda is attached to an S3 Access Point, which routes the requests to the Lambda. Inside the Lambda you can change the headers, body and status code based on your requirements. Use cases could be: resizing images, partially hiding some data based on the requester's identity, or adding headers from another data source.

https://aws.amazon.com/blogs/aws/introducing-amazon-s3-object-lambda-use-your-code-to-process-data-as-it-is-being-retrieved-from-s3/

Credits: Header image by Luca Cavallin on nylavak.com

When building custom images for Google Cloud Platform using HashiCorp Packer, you either specify a fixed version or latest for the Google source image. When specifying a fixed version, you run the risk of building with a deprecated image. If you use latest, your build may introduce changes unknowingly. As we adhere to the principle of keeping everything under version control, we created a utility which ensures that your Packer template always refers to the latest, explicit version of an image.

(more…)

Since version 0.13, Terraform supports the notion of a provider registry. You can build and publish your own provider in the public registry. At the moment, there is no private registry implementation available. In this blog, I will show you how to create a private terraform provider registry on Google Cloud Storage. A utility will generate all the required documents to create a static registry.

(more…)

At a current customer, we’re using GCP’s IAP tunnel feature to connect to the VMs.
When deploying fluentbit packages on existing servers for this customer, I decided it would save some time if I made an Ansible playbook for this job.
As I only have access to the servers through IAP, I ran into a problem: Ansible does not have an option to use gcloud compute ssh as a connection type.

It turns out, you do have the option to override the actual ssh executable used by Ansible. With some help from this post, and some custom changes, I was able to run my playbook on the GCP servers.

Below you will find the configurations used.

ansible.cfg

contents:

[inventory]
enable_plugins = gcp_compute

[defaults]
inventory = misc/inventory.gcp.yml
interpreter_python = /usr/bin/python

[ssh_connection]
# Enabling pipelining reduces the number of SSH operations required
# to execute a module on the remote server.
# This can result in a significant performance improvement
# when enabled.
pipelining = True
ssh_executable = misc/gcp-ssh-wrapper.sh
ssh_args = None
# Tell ansible to use SCP for file transfers when connection is set to SSH
scp_if_ssh = True
scp_executable = misc/gcp-scp-wrapper.sh

First we tell Ansible that we want to use the gcp_compute plugin for our inventory.
Then we point Ansible to our inventory configuration file, whose contents can be found below.

The ssh_connection configuration allows us to use gcloud compute ssh/scp commands for our remote connections.

misc/inventory.gcp.yml

contents:

plugin: gcp_compute
projects:
  - my-project
auth_kind: application
keyed_groups:
  - key: labels
    prefix: label
  - key: zone
    prefix: zone
  - key: (tags.items|list)
    prefix: tag
groups:
  gke          : "'gke' in name"
compose:
  # use the instance name as ansible_host, so the gcloud wrapper scripts can resolve the VM
  ansible_host: name

This will enable automatic inventory of all the compute instances running in the my-project GCP project.
Groups will be automatically generated based on the given keyed_groups configuration, and in addition I’ve added a gke group based on the VM’s name.
Setting ansible_host to the instance name makes sure our gcloud ssh command works; otherwise, Ansible would pass the instance's IP address.

misc/gcp-ssh-wrapper.sh

contents:

#!/bin/bash
# This is a wrapper script allowing to use GCP's IAP SSH option to connect
# to our servers.

# Ansible passes a large number of SSH parameters along with the hostname as the
# second to last argument and the command as the last. We will pop the last two
# arguments off of the list and then pass all of the other SSH flags through
# without modification:
host="${@: -2: 1}"
cmd="${@: -1: 1}"

# Unfortunately ansible has hardcoded ssh options, so we need to filter these out
# It's an ugly hack, but for now we'll only accept the options starting with '--'
declare -a opts
for ssh_arg in "${@: 1: $# -3}" ; do
        if [[ "${ssh_arg}" == --* ]] ; then
                opts+="${ssh_arg} "
        fi
done

exec gcloud compute ssh $opts "${host}" -- -C "${cmd}"

Ansible will call this script for all remote commands when the connection is set to ssh

misc/gcp-scp-wrapper.sh

contents:

#!/bin/bash
# This is a wrapper script allowing to use GCP's IAP option to connect
# to our servers.

# Ansible passes a large number of SSH parameters along with the hostname as the
# second to last argument and the command as the last. We will pop the last two
# arguments off of the list and then pass all of the other SSH flags through
# without modification:
host="${@: -2: 1}"
cmd="${@: -1: 1}"

# Unfortunately ansible has hardcoded scp options, so we need to filter these out
# It's an ugly hack, but for now we'll only accept the options starting with '--'
declare -a opts
for scp_arg in "${@: 1: $# -3}" ; do
        if [[ "${scp_arg}" == --* ]] ; then
                opts+="${scp_arg} "
        fi
done

# Remove [] around our host, as gcloud scp doesn't understand this syntax
cmd=`echo "${cmd}" | tr -d []`

exec gcloud compute scp $opts "${host}" "${cmd}"

This script will be called by Ansible for all the copy tasks when the connection is set to ssh

group_vars/all.yml

contents:

---
ansible_ssh_args: --tunnel-through-iap --zone={{ zone }} --no-user-output-enabled --quiet
ansible_scp_extra_args: --tunnel-through-iap --zone={{ zone }} --quiet

Passing the ssh and scp args through the group-vars makes it possible for us to set the zone to the VM’s zone already known through Ansible’s inventory.
Without specifying the zone, gcloud will throw the following error:

ERROR: (gcloud.compute.ssh) Underspecified resource [streaming-portal]. Specify the [--zone] flag.
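
With the wrapper scripts, inventory plugin and group variables in place, running a playbook works just like with any other inventory; the playbook name below is only an example:

ansible-playbook fluentbit.yml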

Hopefully this post will help anyone that ran into the same issue!

When you create a Google service account key file for an external system, the private key has to
be transported. In addition, the generated key is valid until January 1st in
the year 10000. In this blog I will show you how an external system can identify itself as the
service account without exposing the private key.

(more…)

I recently had to optimize the performance of a PHP-based API on Cloud Run. After a performance test, we discovered that the API became very slow when we put some serious load on it (with response times exceeding 10 seconds). In this post you’ll learn what changes I made to get that down to a stable 100ms.

The API uses PHP 7.4, Laravel 8.0 and MySQL on Cloud SQL (the managed database on Google Cloud). The API needs to handle at least 10,000 concurrent users. The container image we deploy to Cloud Run has nginx and PHP-FPM.

This is what I did to improve the response times. (Don’t worry if not everything makes sense here, I’ll explain everything).

  • Matching the number of PHP-FPM workers to the maximum concurrency setting on Cloud Run.
  • Configuring OPcache (it compiles and caches PHP scripts)
  • Improving composer auto-loading settings
  • Laravel-specific optimizations including caching routes, views, events and using API resources

Matching Concurrency and Workers

The application uses nginx and PHP-FPM, which is a process manager for PHP. PHP is single threaded, which means that one process can handle one (exactly one) request at the same time. PHP-FPM keeps a pool of PHP workers (a worker is a process) ready to serve requests and adds more if the demand increases.
It’s a good practice to limit the maximum size of the PHP-FPM worker pool, to make sure your resource usage (CPU and memory) is predictable.

To configure PHP-FPM for maximum performance, I first set the process manager type for PHP-FPM to static, so that the specified number of workers are running at all times and waiting to handle requests. I did this by copying a custom configuration file to the application’s container and configuring the environment so that these options will be picked up by PHP-FPM (you must copy the configuration where it is expected, in my case, into /usr/local/etc/php-fpm.d/). The settings I needed are:

pm = static
pm.max_children = 10

However, if you set a limit, and more requests come to a server than the pool can handle, requests start to queue, which increases the response time of those requests:

Nginx and php-fpm request model

Limiting Concurrent Requests on Cloud Run

To avoid request queuing in nginx, you’ll need to limit the number of requests Cloud Run sends to your container at the same time.

Cloud Run uses request-based autoscaling. This means it limits the amount of concurrent requests it sends to a container, and adds more containers if all containers are at their limit. You can change that limit with the concurrency setting. I set it to 10, which I determined is the maximum number of concurrent requests a container with 1GB of memory and 1 vCPU can handle for this application.
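
Concurrency, memory and CPU can all be set on an existing service with a single command; a sketch, assuming a managed Cloud Run service named api running in europe-west1:

gcloud run services update api \
  --platform managed \
  --region europe-west1 \
  --concurrency 10 \
  --memory 1Gi \
  --cpu 1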

Cloud Run concurrent requests with Nginx and php-fpm

You really want to make sure Cloud Run’s concurrency setting matches the maximum number of PHP-FPM workers! For example, if Cloud Run sends 100 concurrent requests to a container before adding more containers, and you configured your PHP-FPM to start only 10 workers, you will see a lot of requests queuing.

If these tweaks aren’t enough to reach the desired performance, check the Cloud Run metrics to see what the actual utilization percentages are. You might have to change the amount of memory and vCPUs available to the container. The downside of this optimization is that more containers will be running due to lower concurrency, resulting in higher costs. I also noticed temporary delays when new instances are starting up, but this normalizes over time.
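
If the metrics show the container is starved for resources, you can raise the limits on an existing service; a sketch with a placeholder service name:

gcloud run services update my-php-api \
  --memory=2Gi \
  --cpu=2 \
  --region=europe-west1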

Configuring OPCache

OPCache is a default PHP extension that caches the compiled scripts in memory, improving response times dramatically. I enabled and tweaked OPCache settings by adding the extension’s options to a custom php.ini file (in my case, I put it in the /usr/local/etc/php/conf.d/ directory). The following is a generic configuration that you can easily reuse, and you can refer to the documentation for the details about every option.

opcache.enable=1
opcache.memory_consumption=128
opcache.interned_strings_buffer=64
opcache.max_accelerated_files=32531
opcache.validate_timestamps=0
opcache.save_comments=1
opcache.fast_shutdown=0
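
To double-check that these settings are actually picked up, you can inspect the effective configuration inside the container. A quick sanity check, assuming the php CLI reads the same conf.d directory as PHP-FPM:

# Print the effective OPcache settings as PHP sees them
php -i | grep -E 'opcache\.(enable|memory_consumption|validate_timestamps)'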

Optimizing Composer

Composer is a dependency manager for PHP. It lets you specify the libraries your app needs, and downloads them for you to a directory. It also generates an autoload configuration file, which maps import paths to files.
By default, Composer's autoloader looks up classes on disk dynamically, so your changes show up immediately. While that is convenient in development, in production it can make your code really slow. The --optimize-autoloader flag makes Composer generate an optimized class map once, instead of resolving classes dynamically at runtime.

You can optimize Composer's autoloader by passing the flag like this:

composer install --optimize-autoloader --no-dev
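
A stricter variant (my addition, not necessarily what the original setup used) is a classmap-authoritative dump, which tells the autoloader to trust the generated class map completely and never fall back to scanning the filesystem:

composer dump-autoload --optimize --classmap-authoritative --no-dev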

Laravel-specific optimizations

The application I optimized is built with Laravel, which provides a number of tools that can help improve the performance of the API. Here's what I did on top of the other tweaks to get the response times below 100ms.

  • I have leveraged Laravel’s built-in caching features during builds to reduce start-up times. There are no downsides to these tweaks, except that you won’t be able to use closure-defined routes (they can’t be cached). You can cache views, events and routes with these commands:

    php artisan view:cache
    php artisan event:cache
    php artisan route:cache

    Avoid running php artisan config:cache since Laravel ignores environment variables if you cache the configuration.

  • Using Laravel API Resources further improves the response times of your application. This has proven to be much faster than having the framework automatically convert single objects and collections to JSON.

Summary

In this blog, I shared with you what I learned from optimizing the performance of a PHP-based API on Cloud Run. All of the tweaks together helped cut response times to one tenth of the original result, and I think the most impact was made by matching concurrency and PHP-FPM workers (if you're in a hurry, do only this). Watching the application metrics has been fundamental throughout the performance testing phase, as has inspecting the Cloud Run logs after each change.

If your application still shows poor performance after these changes, there are other tweaks you can make to improve response times, which I haven’t discussed here.

  • Increase PHP memory limits if needed
  • Check MySQL for slow queries (often due to missing indexes)
  • Cache the responses with a CDN
  • Migrate to PHP 8.0 (up to 3x faster)

Would you like to learn more about Google Cloud Run? Check out this book by our own Wietse Venema.

Credits: Header image by Serey Kim on Unsplash

We have read and selected a few articles and other resources about the latest developments around cloud technology so you don’t have to. Read further and keep yourself up-to-date in five minutes!

Introducing GKE Autopilot: a revolution in managed Kubernetes

GKE Autopilot manages the infrastructure, letting you focus on your software. Autopilot mode automatically applies industry best practices and can eliminate all node management operations, maximizing your cluster efficiency (you will not be charged for idle nodes), helping to provide a stronger security posture and ultimately simplifying Kubernetes.

Read more:
https://cloud.google.com/blog/products/containers-kubernetes/introducing-gke-autopilot

What is Amazon Detective?

Amazon Detective makes it easy to analyze, investigate, and identify the root cause of potential security issues or suspicious activities. It automatically collects log data from your AWS resources and uses machine learning to build a linked set of data that enables you to easily conduct faster and more efficient security investigations.

Watch video:
https://www.youtube.com/watch?v=84BhFGIxIqg

Introduction to Amazon Timestream – Time Series Database

Amazon Timestream is a serverless time series database service for IoT and operational applications that makes it easy to store and analyze trillions of events per day. Amazon Timestream saves you time and cost by keeping recent data in memory and moving historical data to a cost-optimized storage tier.

Watch video:
https://www.youtube.com/watch?v=IsmhOkimHyI

What are my hybrid and multicloud deployment options with Anthos?

Anthos is a managed application platform that extends Google Cloud services and engineering practices to your environments so you can modernize apps faster. With Anthos, you can build enterprise-grade containerized applications with managed Kubernetes on Google Cloud, on-premises, and other cloud providers.

Read more:
https://cloud.google.com/blog/topics/developers-practitioners/what-are-my-hybrid-and-multicloud-deployment-options-anthos

Accelerate data science workflows with Looker

Looker is a modern business intelligence (BI) and analytics platform that is now a part of Google Cloud. It's a full-fledged data application and visualization platform and it allows users to curate and publish data, integrating with a wide range of endpoints in many different formats, ranging from CSV, JSON, and Excel files to SaaS and in-house-built custom applications.

Read more:
https://cloud.google.com/blog/topics/training-certifications/how-to-use-looker-on-google-cloud-for-data-governance

While doing Cloud Native migrations, we depend on container images a lot. In this mini-blog I will present you with the
simplest Google Cloud Build config file that will build both snapshot and release versions at the same time.

(more…)

When you work with AWS ECS and you apply the least-privileged security principle, you create an IAM role for each task. While developing the software, you may want to test the role too. However, this is not easily done. In this blog we will show you how it can be done the official way, and in a slightly more alternative way with the iam-sudo utility, for development purposes.

(more…)

In the Xebia First Candidates restaurant, Xebia business unit managers share what excites them about working for their respective units.

Love is like a cloud migration: Thrilling at the start, but once you get going, you never want to go back.

Smooth Operations

Besides being an avid gamer, Elaine Versloot is also the chief of Operations at cloud boutique Binx.io. Here, she focuses on turning the dynamics of cloud migration and modernization into a smooth operation.

We are looking for people with a passion for cloud. Developing cloud-native applications should be in their blood. We find it just as important that individuals are fun to be around; the social aspect matters a great deal to us.

When someone has found his or her expertise and can talk about it with passion, I love that! Nothing gets me going more than that excitement in someone's eyes.

Go On a Blind Date With Elaine

Interested to learn more about the way Binx approaches cloud?

Go on a blind business date with Elaine

I recently passed the Google Cloud Associate Cloud Engineer certification exam. This post describes what the exam is about, how I prepared (with links to useful resources), and the strategies I used to answer the questions in under two minutes per question (skip down to Answering Strategies if that’s just what you want to know).

If you’re reading this before February 9, you can still join my webinar! I’ll help you put together an actionable roadmap to pass the certification exam and share more insights.

What’s an Associate Cloud Engineer?

First, some background if you don’t know what this is about, and are trying to figure out if you should take the certification. The Associate Cloud Engineer certification is one of the most technical Google Cloud certifications: I think it is even harder than the Professional Cloud Architect exam I’m currently preparing for. Nevertheless, this is the first stepping stone into Google Cloud, making it an obvious choice to get started if you are a software engineer, like I am.

Google Cloud defines no prerequisites for this exam, but recommends a minimum of six months hands-on experience on the platform. The topics covered in the certification are:

  • Projects, Billing, IAM
  • Compute Engine, App Engine and Kubernetes Engine
  • VPC and Networking
  • Cloud Storage
  • Databases (for example Cloud SQL, Memorystore, and Firestore)
  • Operations Suite (Cloud Logging and other services). You might know this as Stackdriver.

The next step in the learning journey after the Associate Cloud Engineer exam is the Professional Cloud Architect certification. From there, you can choose to specialize in Data and Machine Learning Engineering, Cloud Development and DevOps, or Security and Networking. (Check out the certification paths here).

Why Did I Choose Google Cloud?

Google Cloud strikes me as the most developer-friendly cloud provider. At the same time, it is also the smallest of the three main players but it is growing fast, meaning lots of room for new experts. I like the certification program because it is role-based (instead of product-centered), preparing not just for this provider, but the position itself.

About Me

I want to tell you a little bit about me, so you know how to interpret my advice. I have been a Software Engineer for the past 8+ years, doing mostly full-stack development. The ability to work on what “powers” the internet and an interest in what's under the hood of software systems brought me to the cloud. Furthermore, the managed services provided by cloud providers make it easy to add all sorts of features to applications, from sending emails to advanced machine learning – these days nothing is stopping you from starting your next company from the comfort of your bedroom.

I grew up in Italy and moved to the Netherlands a few years ago, where I now live with my wife and cat. When I am not working on clouds and other atmospheric phenomena, I enjoy photography and cycling.

Feel free to connect with me on LinkedIn or send me an email at lucacavallin@binx.io!

What Does the Exam Look Like?

This is how a certification exam works: you have two hours to answer 50 multiple choice and multiple selection questions. That means you’ll have roughly two minutes per question, with 20 minutes left at the end to review your answers.

The exam can be taken remotely or in person, costs 125 USD (roughly 100 EUR), and can be retaken after a cooldown period if you fail. The certification is valid for two years (you'll have to take the full exam again to recertify).

Exam Guide

Google publishes an exam guide for every certification. It lists the topics of the certification and details what you need to know for each area.

I used the exam guide to track my progress while studying. You can make a copy of it and use color coding to highlight the topics you feel confident about and those that need more of your attention.

Learning resources

There are many books and online courses you can take to prepare for the exam, some more effective than others. For most people, a combination of written and audio-visual material works best. These are the resources that helped me best (and why):

  • Official Google Cloud Certified Associate Cloud Engineer Study Guide (book)
    • Most comprehensive resource
    • Questions after each chapter
    • Includes practice tests and flashcards
  • ACloudGuru (online course)
    • This is a good introduction (start with this course if you have no Google Cloud experience)
    • You also need the “Kubernetes Deep Dive” course
    • Hands-on labs
    • This course does not cover everything you need to know

Practice Tests

Practice tests are a key part of your preparation because they let you test your knowledge in a setting similar to the actual exam. If you finish the test, they provide you with a detailed explanation for each question, documenting the correct and wrong answers.

Take note of the topics that require attention and review the documentation accordingly. Once you consistently score at least 90% on practice tests, you are ready for the exam.

The Official Study Guide book provides review questions at the end of each chapter and an online portal with two practice tests. The ACloudGuru course also has a practice exam you can take, and you can find similar resources on Udemy.

Answering Strategies

If you do the math, you’ll see that you only have two minutes to answer every question, which is not much, given that each of them is quite lengthy. Here are the strategies I used when answering the questions.

  • Identify the core question
  • Review it carefully since a single word can make the difference
  • Eliminate answers that are wrong or in conflict with the question
  • Choose the cheapest and most secure option
  • Read the question again, keeping in mind the answer you chose

You can also mark answers during the test to come back to them later. While you don't have much time left for review at the end, this practice will save you from overthinking and losing valuable time.

Taking the Remote Proctored Exam

If you decide to take the exam remotely, once you have arranged it, you will have to install a Sentinel tool provided by the testing authority, verify your identity and pass a number of checks.

TIP: The operator taking you through the process doesn’t talk, so you need to scroll the chat for their questions. For me, the chat didn’t auto-scroll, so it was a bit awkward at first.

Join My Webinar

If you’re reading this before February 9, you can still register for my webinar! I’ll share all my learnings. I would love to meet you and answer any questions! Make sure to register here.

Summary

In this post, I shared my experience preparing and taking the Google Cloud Associate Engineer Certification exam. Here are the four most important things to take away from this:

  • You can do this! If you focus your attention and take as many practice tests as possible, and carefully review your correct and incorrect answers, you’ll pass the exam.
  • I think the Official Study Guide is hands down the best resource to use for preparation.
  • It is very useful to have real-world experience and practice with gcloud CLI and Kubernetes.
  • During the exam, attention to detail is important. Read every question carefully and read the question again after choosing your answer.

Credits: Header image by KAL VISUALS on Unsplash

We have read and selected a few articles and other resources about the latest and most significant developments around cloud technology so you don’t have to. Read further and keep yourself up-to-date in five minutes!

Introducing WebSockets, HTTP/2 and gRPC bidirectional streams for Cloud Run

Support for streaming is an important part of building responsive, high-performance applications. With these capabilities, you can deploy new kinds of applications to Cloud Run that were not previously supported, while taking advantage of serverless infrastructure.

Read more:
https://cloud.google.com/blog/products/serverless/cloud-run-gets-websockets-http-2-and-grpc-bidirectional-streams

Amazon Location – Add Maps and Location Awareness to Your Applications

Amazon Location Service gives you access to maps and location-based services from multiple providers on an economical, pay-as-you-go basis. You can use it to display maps, validate addresses, turn an address into a location, track the movement of packages and devices, and more.

Read more:
https://aws.amazon.com/blogs/aws/amazon-location-add-maps-and-location-awareness-to-your-applications/

Eventarc brings eventing to Cloud Run and is now GA

Eventarc is a new eventing functionality that lets developers route events to Cloud Run services. Developers can focus on writing code to handle events, while Eventarc takes care of the details of event ingestion, delivery, security, observability, and error handling.

Read more:
https://cloud.google.com/blog/products/serverless/eventarc-is-ga

Lifecycle of a container on Cloud Run

Serverless platform Cloud Run runs and autoscales your container-based application. You can make the most of this platform when you understand the full container lifecycle and the possible state transitions within it. In this article on the Google Cloud blog, our very own Wietse Venema takes you through the states, from starting to stopped.

Read more:
https://cloud.google.com/blog/topics/developers-practitioners/lifecycle-container-cloud-run

Amazon Aurora supports PostgreSQL 12

Amazon Aurora with PostgreSQL compatibility now supports major version 12. PostgreSQL 12 includes better index management, improved partitioning capabilities, and the ability to execute JSON path queries.

Read more:
https://aws.amazon.com/about-aws/whats-new/2021/01/amazon-aurora-supports-postgresql-12/

Keeping your Docker container image references up-to-date

Binx.io released an open source utility which allows you to list, detect and repair outdated references to container images, so you always build the most up-to-date images.

Read more:
https://binx.io/blog/2021/01/30/how-to-keep-your-dockerfile-container-image-references-up-to-date/

Over the past few months, our Cloud Engineers have proven to be so much more than cloud technology specialists. Out of necessity, they have also taken on side jobs as teacher, entertainer and nanny, all from the comfort of their own homes.

Helping Binx Parents Through The Lockdown

We are lucky to have a very creative department that looks after the wellbeing of all people within the company. Overcoming the impossible, they have made it a habit to look at what is possible in every given circumstance. The Nanny-on-Tour initiative is the latest rabbit they have pulled out of their hat. The objective is to help all the parents at Binx get through the lockdown as comfortably as possible by setting up a dedicated pool of qualified nannies!

Turning A Pool of Nannies Into Reality

We sat down with HR Advisors Roosmarijn van Zessen and Véronique Reijinga to discuss this wonderful initiative.

"We received several phone calls from colleagues who are really struggling with working from home and taking care of their children at the same time. The power of Binx, as part of Xebia, is that we help each other out", Roosmarijn van Zessen shared.

Véronique added: "In the past, we had a big pool of students, often children of colleagues, working for our hospitality crew or assisting in other departments. We’ve asked them if they would be open to work for Xebia as a nanny, and many of them responded enthusiastically! A couple of hospitality colleagues have also indicated that they would like to participate in this project. Our Xebia Nanny pool was born!"

It’s a Match!

Almost immediately after announcing this initiative, the HR team received many positive responses. The project was launched last week, and already they’ve been able to match four colleagues with a nanny from the nanny network!

It is great to see how happily surprised they were that we had already found a nanny for them. We will get through the homeschooling and lockdown together!

Whenever you build a container image, chances are that you are using public images as a base. But how do you keep your image up-to-date with the latest releases? In this blog I will introduce a utility which will allow you to keep your Dockerfile container image references up-to-date.

(more…)

Did you know that, once you have authenticated using the Google Cloud Platform SDK, the credential is valid for all eternity? With the Google Cloud session control tool you can limit the validity to as little as an hour.

(more…)

Hi, my name is Max Driessen. Since November 1st, I have been part of the Binx team as Chief Cloud Training. In this role, it is my ambition to empower professionals with the right cloud skills by offering the best training programs. I have a background in information technology but lack hands-on programming experience.

In my second week at Binx, my colleague Martijn van de Grift—cloud consultant and authorized trainer for both Google Cloud Platform and Amazon Web Services—organized a Google Cloud Fundamentals training.

“This is really something for you,” Martijn told me. “This way you not only learn more about Google Cloud, but you also experience what it is like to attend a training.”

I decided to jump right in to experience what it’s like to attend a Binx training first hand.

Before I realized it, there I was: attending an online GCP Fundamentals training, with Martijn as the trainer. Obviously, this is a fundamentals training (most of our training courses go way deeper into specific topics), but for me it felt like the gateway to a new world. Although I roughly knew what to expect, that did not make me any less nervous. But I had one goal: to learn more about GCP.

Stretching the Comfort Zone

At the start of the training, I felt a bit tense about what was to come. That feeling vanished quickly due to the informal start where every attendee introduced him or herself briefly and shared personal learning goals with the rest of the group.

Then it was time to get going. The training was laid out in a logical order. After an initial introduction to the GCP interface, we went under the hood of the platform's various functionalities. Martijn alternated between theory and Q&A. In addition, a substantial part of the training was reserved for breakout sessions where every participant, including myself, could try out the various services with Qwiklabs exercises. These Qwiklabs are step-by-step exercises that teach you how to use a specific service. What resonated with me was the way the trainer went out of his way to support every individual attendee in successfully completing the labs.

It was obvious from the way that Martijn shared additional information on top of the materials, that he knows his way around Google Cloud like no other. From his experience, Martijn also handed out dozens of practical tips about the right way to implement the services. This resulted in a positive vibe, which also encouraged me to veer outside of my comfort zone further and further.

What Ground Was Covered?

This GCP Fundamentals training introduced me to a broad range of cloud concepts and GCP services. Besides the general introduction, I also learned about Virtual Machines and different types of storage, with topics like Cloud Storage, Cloud SQL, Datastore, Bigtable, and Spanner. Of course, containers and Kubernetes were not overlooked. We also took a look at App Engine, Stackdriver and Infrastructure as Code. Finally, there was even some time left to cover the basics of BigQuery and Managed Machine Learning APIs.

At the end of the day, it felt like I had just made it through one full week of training. 😉

Applicability

As mentioned, I had no previous experience working with cloud platforms. For the first topics this was no problem, but the hands-on labs will be easier to apply in practice for someone with some software engineering experience. For me personally, that was not the objective of attending this course: I wanted to obtain a general understanding of GCP. The content and experience exceeded my initial goals.

The Most Effective Format

For me it was very valuable that all attendees introduced themselves at the beginning of the training. This made it easier later on to work together and to ask questions. Because they shared their individual learning goals beforehand, the trainer was able to personalize the learning experience for everyone. To me, this was a big plus compared to individual online training without a live instructor.

I must admit that a full day of online training is very intense. Especially at the end of the day, it becomes more difficult to digest all the information. That is why we have chosen to offer our online courses in half days by default. This way, it is easier to stay attentive, and it also makes it easier to combine training with other obligations.

Max Recommends

I am looking forward to navigating the world of cloud together with you. One of my goals is to regularly recommend specific training topics. As this Google Cloud Fundamentals training was my first, let me start by sharing who I would recommend it to. I recommend this Google Cloud Platform Fundamentals training to everyone who would like to make a start with GCP. The training offers a broad introduction to the various GCP services and lays the groundwork for official Google Cloud certifications.

If you have any questions, feel free to get in touch with me directly by sending me an email via maxdriessen@binx.io

Two years ago, I created a utility to copy AWS SSM parameters from one account to another. I published the utility to pypi.org, without writing a blog about it. As I found out that quite a number of people are using the utility, I decided to unveil it in this blog.

(more…)

Building Serverless Applications with Google Cloud Run

Wietse Venema (software engineer and trainer at Binx.io) published an O’Reilly book about Google Cloud Run: the fastest growing compute platform on Google Cloud. The book is ranked as the #1 new release in Software Development on Amazon.com! We’re super proud. Here’s what you need to know about the book:

Praise

I’ve been fortunate enough to be a part of the Google team that helped create Knative and bring Cloud Run to market. I’ve watched Cloud Run mature as a product over the years. I’ve onboarded thousands of customers and I wrote a framework to help Go developers build Cloud Run applications faster–and even I learned a thing or two from this book. What took me three years to learn, Wietse delivers in less than a dozen chapters.
— Kelsey Hightower, Principal Engineer at Google Cloud

About the Book

If you have experience building web applications on traditional infrastructure, this hands-on guide shows you how to get started with Cloud Run, a container-based serverless product on Google Cloud. Through the course of this book, you’ll learn how to deploy several example applications that highlight different parts of the serverless stack on Google Cloud. Combining practical examples with fundamentals, this book will appeal to developers who are early in their learning journey as well as experienced practitioners (learn what others say about the book).

Who this Book is For

If you build, maintain or deploy web applications, this book is for you. You might go by the title of a software engineer, a developer, system administrator, solution architect, or a cloud engineer. Wietse carefully balances hands-on demonstrations with deep dives into the fundamentals so that you’ll get value out of it whether you’re an aspiring, junior, or experienced developer.

What’s Next

You can do one or all of the following things:

How to Automate the Kritis Signer on Google Cloud

Google Binary Authorization allows you to control the images allowed on your Kubernetes cluster. You can allow images by name, or allow only images that have been signed off by trusted parties. Google documents a manual process for creating Kritis signer attestations. In this blog, I show you how to automate the Kritis signer, using Terraform and Google Cloud Run.

Cloud-based IT infrastructures and software solutions are not only disrupting IT, they disrupt the way organizations, and even industries, operate. The cloud's scope of influence stretches far beyond the traditional IT landscape: it impacts IT management, the way teams work together and, last but not least, the skill set individuals need to operate in this sphere. The cloud can be seen as a framework. In this article, we explain how cloud technology—if it's applied with the right mindset—helps teams accelerate development.

The Cloud Framework

The cloud computing model has destroyed the boundary between the world of hardware and the world of software: the data center has become software too. The cloud provider offers you a programming framework as a foundation on which you can build anything. You program a datacenter with all of the components: computers, networks, disks, load balancers. When you start it up, a virtual instance of your data center is started. With this model it becomes very easy to change your data center. Just change the code and restart the program!

What Is Cloud-Native

So, what does it mean to be cloud-native? Do you have to use all of the cloud provider’s specific tools and resources? No, not at all. Cloud-native means you build and run applications that exploit the cloud benefits—such as scalability, flexibility, high availability and ease of maintenance. You focus on how to create these applications and how to run them: not on where they run.

Five Advantages of Cloud-Native

Organizations that adopt the Cloud-native style of software development and delivery have significant advantages over the traditional way of software delivery. If you’re responsible for IT projects, these five cloud-native advantages will sound like music to your ears:

1. High speed of Delivery

The cloud offers loads of off-the-shelf features so engineers can focus on developing unique ones—in other words, innovation. Leveraging these features, coupled with an ability to test every change with ease, allows development teams to deliver functional software faster.

2. Predictable Processes

As everything is programmed and automated, the software delivery process becomes very predictable.

3. Strong Reliability

Cloud providers offer robust services with consistent performance and multi-regional availability, which reduces the number of potential errors.

4. Pay for what you use

Instead of investing in hardware for your own data center, you only pay for what you use in the cloud. However, don’t forget to shut off the service when you don’t need it!

5. Disaster Recovery

In the cloud, everything is code. So, in the unlikely event of a disaster, it is super simple to reboot the infrastructure and applications. Just add the data, and you’re operational again.

The Impact of Cloud-Native on Teams

A cloud-native infrastructure offers developers a stable, flexible environment with high availability — a place where engineers can release new features faster and easier than ever before. Cloud platforms provide automated services to manage infrastructure and increase its availability and reliability. All of this requires different skills from the people who are responsible for the infrastructure and applications. So, how will cloud technology impact your organizational culture?

Say Goodbye to Ops As You Know It

System administrators have traditionally been responsible for installing, supporting, and maintaining servers and systems, as well as developing management scripts and troubleshooting environments. In the cloud, this is replaced by managed services, immutable infrastructure and self-healing architectures. In the cloud everything is automated and coded: from the applications, monitoring, the infrastructure and delivery processes.

Bring True Developer Skills

Cloud-native requires people who can design systems, write code, automate deployment processes and automate the monitoring of systems.

They also need to know how to adopt the best way of working. In the cloud, DevOps has become the standard framework for the software delivery life cycle. This framework oversees the entire cycle of planning, developing, using, and managing applications. Since the cloud removes most of the “Ops” work, it’s essential for organizations to amp up their internal development skills.

Embrace Changes to Your Environment

In the classical IT world, changes to an existing system were avoided as much as possible, as they might break things. In the cloud-native world, you embrace changes to your environment, as they may expose errors in your design or application that you need to fix, so that the error does not occur when the change is released to production.

Follow the Cloud-Native Roadmap

To exploit its advantages, it’s helpful for developers to follow the cloud-native roadmap. This roadmap consists of six steps to build and run cloud-native environments.

An Example of a Cloud-Native Set-Up

Vereniging Coin - Binx Customer
COIN is an association founded by telecom providers that takes care of transferring customers between providers. To reduce management costs and increase development speed, COIN turned to Binx. As a cloud partner, Binx could help them take full control of their software development by migrating their infrastructure and applications to the cloud.

Initially, the development team, led by Binx, containerized all applications. This made it possible to release and deploy a new version of features as soon as there was a commit—a dream come true for the configuration manager. Everything deployed was visible, including all the resources associated with it.

The team decided to deploy Amazon RDS for Oracle, a highly available Oracle service, since they could not easily migrate the existing Oracle database to an open-source relational database management system.

The organization was provided with the self-service reporting tool (Amazon QuickSight) to access the data. This allowed end-users to create their own reports, and the development team to stay focused on developing features instead.

Because all applications were containerized, the application deployment process was automated and standardized which improved both reliability and speed of deployment.

COIN adopted a service called CloudFormation to code the infrastructure, to make the environment 100% reproducible. Binx developed a large number of custom resources for features like automated password generation and infrastructure setup. Managed services automatically deploy via a SaaS-based Git service, so there's no in-house installation at COIN.

Last but not least, Binx implemented business service level checks and monitoring to ensure that the team is alerted of any disturbance in the service offered to their end users. These service-level indicators (SLIs) measure how well the services are performing, and the objectives determine what acceptable levels of performance are. These SLIs are also used on a daily basis to improve the system. Even for small aberrations, the team executes a root cause analysis to see if the problem can be designed out of the system.

The service level objectives are continuously monitored and every breach automatically alerts the team and the on-call engineer.

Becoming Cloud-Native: Create Your Starting Point

Now that you can see the cloud as a framework, its benefits over more traditional approaches should be clear. The next step is to adopt cloud-native ways of working. Can your organization view infrastructure as code and migrate workloads to the cloud? What skills are lacking? Can your workforce adopt a cloud-native mindset? By answering these questions, you create a starting point for your journey towards becoming cloud-native. Safe travels, we look forward to seeing you out there!

Urgent Future: Digital Culture

This article was featured in Urgent Future: Digital Culture. A trend report featuring various view points on the culture that makes digital organizations thrive in today’s economy. Download the trend report here

Before I can use a particular service, I have to enable the API in my Google project. Sometimes when I do this, more services are enabled than the one I specified. In this blog I will show you how to find service dependencies like this.

To see this in effect, I am going to enable the Cloud Functions service. First I will show you that enabling Cloud Functions will actually enable six services in total. Then I will show you how you can list the dependencies. Finally, I will present you with a small utility and a graph of all dependencies for you to play around with.

Enabling Cloud Functions

So, I am going to show you that enabling Cloud Functions actually enables multiple services. First, I check the enabled services in my project:

gcloud services list

The output looks roughly like this:

NAME                              TITLE
bigquery.googleapis.com           BigQuery API
...
servicemanagement.googleapis.com  Service Management API
serviceusage.googleapis.com       Service Usage API

In this case, the Cloud Functions service is not listed.
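
If you only care about a single service, a simple grep over the enabled services works too (a trivial sketch):

# Check whether Cloud Functions is already enabled in the current project
gcloud services list --enabled | grep cloudfunctions || echo "not enabled"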

Enabling the service

To enable Cloud Functions in my project, I type:

$ gcloud services enable cloudfunctions
Operation "operations/acf.3170fc7d-dc07-476f-851b-ff0cc2b9d79f" finished successfully.

Now, when I check the list of enabled services again, it has grown by the following six services!

NAME                              TITLE
cloudfunctions.googleapis.com     Cloud Functions API
logging.googleapis.com            Cloud Logging API
pubsub.googleapis.com             Cloud Pub/Sub API
source.googleapis.com             Legacy Cloud Source Repositories API
storage-api.googleapis.com        Google Cloud Storage JSON API
storage-component.googleapis.com  Cloud Storage
...

Could I have predicted this before I enabled it? Yes, I could have.

Listing all available services

To list all the available services, I use the following command:

gcloud services list --available --format json

The result is a list of objects with meta information about the service:

{
  "config": {
    "name": "cloudfunctions.googleapis.com",
    "title": "Cloud Functions API",
    "documentation": {},
    "features": [],
    "monitoredResources": [],
    "monitoring": {},
    "quota": {},
    "authentication": {},
    "usage": {}
  },
  "dependencyConfig": {
    "dependsOn": [],
    "directlyDependsOn": [],
    "directlyRequiredBy": [],
    "requiredBy": []
  },
  "serviceAccounts": [],
  "state": "DISABLED"
}

I see four main attributes for each service: config, dependencyConfig, serviceAccounts and state. The field dependencyConfig lists the service dependencies, while serviceAccounts lists the service accounts that are created in the project for this service. Note that these fields are not part of the documented Service Usage API.

Listing specific dependencies

So, this service usage API provides all the dependencies of a specific service. To list all dependent services of Cloud Functions, I use the following command:

gcloud services list \
   --available --format json | \
jq --arg service cloudfunctions.googleapis.com \
    'map(select(.config.name == $service)| 
        { 
          name:      .config.name, 
          dependsOn: .dependencyConfig.dependsOn
        }
    )'

and the result is:

{
  "name": "cloudfunctions.googleapis.com",
  "dependsOn": [
    "cloudfunctions.googleapis.com",
    "logging.googleapis.com",
    "pubsub.googleapis.com",
    "source.googleapis.com",
    "storage-api.googleapis.com",
    "storage-component.googleapis.com"
  ]
}

These are precisely the six services that were previously enabled \o/.
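
You can also turn the question around and ask which services pull in a given service as a dependency. Here is a sketch using the same undocumented dependsOn field (the semantics of that field are my interpretation):

# List every service whose dependencies include Cloud Pub/Sub
gcloud services list --available --format json | \
jq --arg service pubsub.googleapis.com \
    'map(select(.dependencyConfig.dependsOn | index($service)) | .config.name)'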

If you want to explore the dependencies yourself, you can use this bash script. If you do not want to type, you can browse through the entire graph of
Google Cloud Platform service dependencies.

I know it is a bit tiny, but luckily it is a scalable vector graphic. So open it in a separate window and you can pan and zoom.

Conclusion

Thanks to some undocumented properties of the Google Service Usage API, I can find all dependencies between Google Cloud Platform services.

For a number of Google Cloud Platform services I need to perform a Google site verification in order to prove that I actually own a domain. Unfortunately, the Google Terraform provider does not provide support for this. In this blog I will show you how to automate this using a custom Terraform provider.

(more…)

The AWS Cloud Development Kit (AWS CDK) makes it easy to build cloud infrastructure. And CDK Pipelines makes it "painless" to deploy cloud infrastructure. So let’s create an AWS CDK CI/CD pipeline, and build and run our application on AWS.

(more…)

Recently I worked on a PHP application that used cron for background processing. Since it took some time to get it right, I’m sharing the solution.

(more…)

Many applications log to files. In a container environment this doesn't work well: the file system is not persisted and logs are lost. In GKE you can persist log files to Cloud Logging with the Cloud Logging agent. The agent, however, doesn't specify the required resource type, causing the logs to appear as non-Kubernetes container logs. This blog shows how to resolve that.

Cloud Logging Agent in GKE

The Cloud Logging agent is a Fluentd service configured with the Google Cloud output plugin. Events sent to the output plugin must include a Cloud Operations for GKE resource type; without one, the event is registered as a VM instance log record in Cloud Logging.

Set the Kubernetes Container resource type

Kubernetes Container events use resource type k8s_container.[namespace].[pod].[container], which is specified through the attribute logging.googleapis.com/local_resource_id using a Fluentd filter:

<filter **>
  @type record_transformer
  enable_ruby true
  <record>
    "logging.googleapis.com/local_resource_id" ${"k8s_container.#{ENV['K8S_NAMESPACE']}.#{ENV['K8S_POD']}.#{ENV['K8S_CONTAINER']}"}
  </record>
</filter>

The entire configuration file is available on GitHub.

The filter uses Kubernetes’ Downward API to read resource type data from environment variables.

    spec:
      containers:
        - name: application
        - name: logging-agent
          env:
          - name: K8S_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: K8S_POD
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: K8S_CONTAINER
            value: application

Deploy the GKE logging agent

The logging agent uses a sidecar container deployment to access application log files.

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: app
          volumeMounts:
            - name: logs
              mountPath: /app/log
        - name: gke-fluentd
          env:
          - name: K8S_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: K8S_POD
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: K8S_CONTAINER
            value: app # Use the application container name
          volumeMounts:
            - name: logs
              mountPath: /app/log
      volumes:
        - name: logs
          emptyDir: {}

Try it yourself with the example application provided at GitHub.
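
Once the sidecar is running, you can verify that new entries show up as Kubernetes container logs instead of VM instance logs; for example, with a gcloud query like the following (the filter is illustrative):

# Show the five most recent log entries carrying the k8s_container resource type
gcloud logging read 'resource.type="k8s_container"' --limit=5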

Discussion

The logging agent allows us to persist logs. However, it doesn't actually solve the problem of using log files in a container environment.

In container environments you should use cloud-native loggers, or stream all logs to the console. Changing the logger should be a dependency configuration-file update, and changing the logger's behavior should be a logger configuration-file update.

Conclusion

The Cloud Logging agent allows you to keep using log files in container environments. Since the default configuration doesn't specify the appropriate GKE resource type, you will have to maintain the agent configuration yourself. Therefore I recommend fixing your application logging instead: allow configuration of a cloud-native logger and use structured logging to opt in to all major log analysis services.

Image by Free-Photos from Pixabay