How Adevinta revolutionized its data solution with a cloud migration
Many organizations struggle to migrate to the cloud, because it is often a complex and invasive project. Adevinta nevertheless decided to move from its well-functioning on-premises environment to a cloud environment. Its experience shows that such a switch is almost always worth it, especially for companies that handle a lot of data traffic.
For applications that need to be available at all times, any migration is a major challenge. These are often complex, highly specific environments that have been developed and changed over the years: backends, APIs, databases, you name it, have all been added or updated. Everything needs to be executed perfectly, and any mistake can jeopardize availability. The risk and investment can be major obstacles.
But migrating from on-premises to the cloud is beneficial on many fronts, even if that is not always immediately apparent, and even when the on-premises environment is largely adequate. Suddenly it becomes possible to add things that were previously too expensive or complex. The cloud also provides more flexibility and allows data to be accessed in new ways. These benefits often only become apparent after a migration, but they are crucial for any forward-looking organization.
Adevinta, the parent company whose brands include Marktplaats.nl and eBay, is a good example. The company’s on-premises Hadoop environment, Odin 2, processes 13,000 data events per second, which adds up to more than 1 billion per day. That translates into 2.5 TB of additional data in Google BigQuery every day. The events include posts and search results for users across multiple platforms and brands. In the backend, the data is sent from the producer to Apache Kafka and passes through an Apache Flink pipeline, controlled by Kafka, to the consumer. And that worked pretty well, Charlie Evans, Senior Software Engineer at Adevinta, said at Club Cloud 2021 last November. “The pipeline was in good shape,” says Evans. The final product was easy to maintain and extremely stable.
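The daily total follows directly from the per-second rate. The short Python sketch below redoes that arithmetic. The average event size it derives is only a rough estimate: it assumes decimal terabytes and that the full 2.5 TB stems from these events, neither of which the talk confirms.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds

events_per_second = 13_000       # rate quoted in the article
daily_volume_bytes = 2.5e12      # 2.5 TB per day, assuming decimal terabytes

# Events per day at a constant rate.
events_per_day = events_per_second * SECONDS_PER_DAY

# Rough average footprint per event (an assumption-laden estimate).
avg_bytes_per_event = daily_volume_bytes / events_per_day

print(f"{events_per_day:,} events per day")   # 1,123,200,000 events per day
print(f"about {avg_bytes_per_event / 1000:.1f} kB per event on average")
```

At this rate each event accounts for roughly 2 kB of stored data, which gives a feel for why every extra read or copy of the stream matters at this scale.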
A good environment, but not future-proof
If it works, why change it? It’s not that simple. On-premises environments require a lot of maintenance, and keeping them up to date and adding new features takes a lot of work. Replacing hardware is also a big investment that has to be paid all at once. And Adevinta had plenty of reasons to upgrade.
“The data consumption, the data from HDFS and Kafka, showed a lot of challenges,” Evans explains. “On the Kafka side, it really only facilitated the central data teams of the various organizations to leverage the data. On the HDFS side, we were also using a somewhat dated version of Hadoop, which prevented us from deploying the latest tools and APIs.”
The Odin 2 solution was stuck in its on-premises environment, and Adevinta ran into many obstacles when it tried to improve or expand it. Therefore, the decision was made to migrate the entire environment to Google Cloud Platform. This gives Adevinta more flexibility and access to the tools the cloud platform provides. It also makes it easier to provide access to the data.
3 steps for a cloud migration
Step 1: Provide an appropriate architecture
One of the first things a migration offers is the opportunity to take a hard look at the entire architecture. In Adevinta’s case, there was room to make the architecture more unified. Evans says producers could publish to the proxy, but could also publish directly to Kafka.
“If you have those two endpoint interfaces in the pipeline you get friction, because then you have two ways to authenticate. That also provides an opportunity to simplify the Apache Flink pipeline.”
Such a simplification is critical when migrating to the cloud because the cost model is different. Because you pay for usage, duplicated operations create additional costs, explains Diederik Greveling, CTO at GoDataDriven Solutions. A CAPEX model is exchanged for an OPEX model, which requires the organization to be aware of its actual usage. “The Flink pipeline included bot detection, for example,” he says.
“For on-premises, bot detection in Flink makes a lot of sense. But we moved that to the BigQuery writing logic. Why? Because then we only need to read from the subscription once, and that is obviously cheaper. You have to think about cost when migrating to the cloud.” For the scale at which Adevinta operates, this quickly saves tens of thousands of euros.
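To see why the number of reads matters under a pay-per-use model, consider a minimal Python sketch. Everything in it is an assumption for illustration: the function name `monthly_read_cost`, the price per terabyte, and the idea that the message volume equals the 2.5 TB per day mentioned earlier. These are placeholders, not Adevinta’s figures or actual Google Cloud prices.

```python
def monthly_read_cost(daily_tb, reads_per_message, price_per_tb, days=30):
    """Cost of delivering a stream to its consumers for one month.

    Under usage-based pricing the cost scales linearly with the number
    of times each message is read, so every extra reader multiplies it.
    """
    return daily_tb * reads_per_message * price_per_tb * days


DAILY_TB = 2.5        # assumed message volume per day
PRICE_PER_TB = 40.0   # hypothetical price, in euros per TB delivered

# Two separate consumers (for example, one for bot detection and one
# for the BigQuery writer) versus a single consumer that does both.
two_reads = monthly_read_cost(DAILY_TB, 2, PRICE_PER_TB)
one_read = monthly_read_cost(DAILY_TB, 1, PRICE_PER_TB)

print(f"two reads: {two_reads:,.0f} per month")
print(f"one read:  {one_read:,.0f} per month")
print(f"saved:     {two_reads - one_read:,.0f} per month")
```

The absolute amounts depend entirely on the placeholder price, and the sketch ignores the compute cost of running a second job. The point is only the linear relationship: dropping one read of the stream removes that read’s entire share of the bill.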
Step 2: Implement and optimize the cloud infrastructure
The data was mirrored from on-premises to test the cloud implementation with the full data volume. However, the cloud implementation in Dataflow turned out to be suboptimal, in part because it was doing two things at once: it was writing the data from Pub/Sub to BigQuery, but it was also performing bot detection. “It affected other parts of the pipeline,” Evans says. “Bot detection is important, but it is more of a best-effort task. The primary goal is to get the data into BigQuery. That should not be affected by something of lesser importance. Splitting the two led to a big improvement.”
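The principle Evans describes, a primary path that must never be slowed down by a best-effort task, can be illustrated with a small Python sketch. This is an in-process stand-in, not the actual Dataflow design: the `Pipeline` class, the bounded queue, and the `looks_like_bot` rule are all hypothetical.

```python
import queue


class Pipeline:
    """Primary write path with a decoupled, best-effort side task."""

    def __init__(self, bot_queue_size):
        self.written = []  # stand-in for rows written to BigQuery
        self.bot_queue = queue.Queue(maxsize=bot_queue_size)
        self.dropped = 0   # events skipped by bot detection

    def handle(self, event):
        # Primary goal: the write always happens and never waits.
        self.written.append(event)
        # Best effort: if the detector is behind, skip instead of blocking.
        try:
            self.bot_queue.put_nowait(event)
        except queue.Full:
            self.dropped += 1


def looks_like_bot(event):
    # Hypothetical rule, for illustration only.
    return "bot" in event.get("user_agent", "").lower()


def drain_bot_queue(pipeline):
    """Run bot detection on whatever was queued; return flagged ids."""
    flagged = []
    while True:
        try:
            event = pipeline.bot_queue.get_nowait()
        except queue.Empty:
            return flagged
        if looks_like_bot(event):
            flagged.append(event["id"])


pipeline = Pipeline(bot_queue_size=2)
for i, agent in enumerate(["Mozilla", "ExampleBot/1.0", "Mozilla", "OtherBot"], 1):
    pipeline.handle({"id": i, "user_agent": agent})

flagged = drain_bot_queue(pipeline)
print(len(pipeline.written), pipeline.dropped, flagged)  # 4 2 [2]
```

All four events reach the primary sink even though the detector’s queue only has room for two. The fourth event is a bot that goes undetected, which is the trade-off that “best effort” accepts in exchange for an unaffected write path.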
As the architecture is improved, a migration may take longer than previously estimated. But in most cases, it is well worth it.
Step 3: Look for the small benefits
Developing for the cloud is much like developing for on-premises, according to Greveling. But once it works, he notes, a lot of time is saved because the cloud environment is partially or even completely managed. In addition, all kinds of small benefits add up to make a big difference. “The Odin collector, for example, where tenants send their messages, appears to work well on Cloud Run,” Greveling mentions. “And I should add that it has become even more advantageous. Recently, it became possible to pay per CPU allocation instead of per request. Because of that, those costs were reduced to one-tenth.”
Expanding or even duplicating the environment from that point on is also very easy, Greveling explains. “We had a tenant in Australia, while our infrastructure is in Western Europe,” he says. “We were asked whether we could also make something available in Australia, for example, to reduce latency and transfer costs. It took us maybe a couple of hours. Once you get there, you can scale much more easily.”