Carvago

GATHERING AND CATEGORISATION OF 4.5 MILLION CAR ADS FROM ALL OVER EUROPE EVERY DAY THANKS TO REVOLT BI.

The Czech company Carvago operates a very successful online marketplace for used cars from all over Europe. It already offers around 1 million cars to customers in the Czech Republic, Slovakia and Germany, and plans to expand to other European countries in the near future.

Challenge

To achieve these goals, Carvago needs to find the used cars offered throughout Europe every day and include them in its offer.

For customers to be able to find the best offer for the car they are looking for at any time, the database has to be constantly updated with newly offered cars from all portals in Europe. At the same time, dealers often provide incomplete data about the car on offer or place the same ad on several portals at once.

Carvago had to not only obtain new ads, but also eliminate duplicates, correct erroneous information and classify all offers on the basis of an exhaustive catalogue of car models, including key parameters such as engine, transmission type, drivetrain, etc.
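
As an illustration only, the deduplication step could be sketched as follows. This is not the production pipeline: the `Ad` fields, the portal names and the choice of fingerprint (make, model, year, mileage rounded to 1,000 km, last nine digits of the seller's phone number) are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    portal: str        # hypothetical source portal name
    make: str
    model: str
    year: int
    mileage_km: int
    price_eur: int
    seller_phone: str


def fingerprint(ad: Ad) -> tuple:
    """Build a portal-independent key (fields and rounding are assumptions)."""
    digits = re.sub(r"\D", "", ad.seller_phone)
    return (
        ad.make.strip().lower(),
        ad.model.strip().lower(),
        ad.year,
        round(ad.mileage_km, -3),  # tolerate small mileage differences
        digits[-9:],               # ignore the country prefix of the phone number
    )


def deduplicate(ads: list[Ad]) -> list[Ad]:
    """Keep the first ad seen for each fingerprint."""
    seen: dict[tuple, Ad] = {}
    for ad in ads:
        seen.setdefault(fingerprint(ad), ad)
    return list(seen.values())
```

The price is deliberately left out of the key, because the same car is often listed at slightly different prices on different portals. A real system would probably need fuzzier matching, for example on photos or free text.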

Analysis

The main requirements for the solution were:

  • Prepare a catalogue containing models covering 95%+ of the European passenger car market, including key parameters such as engine, transmission type, drivetrain, etc.
  • The catalogue should update itself automatically according to external sources (e.g. mobile.de, cars-data.com)
  • Create a database with the current offer of cars on the European market
  • Deduplicate, accurately assign and classify the collected ads based on the catalogue

We carried out a detailed analysis of the existing sources of information about cars and advertisements and identified:

  • 3,000+ different car models
  • 250+ car brands
  • 85 main car parameters
  • 14 main servers with different data structures
  • 4.5 million ads added or updated every day
  • No common classification system for ads; cars are most often described only through free text or photos

Interesting fact: 10 car models cover 37% of the market.
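
A figure like this is a top-N share: the number of ads belonging to the N most frequent models divided by the total number of ads. The sketch below shows the calculation on toy counts. The model names and numbers are invented for illustration and are not Carvago data.

```python
from collections import Counter


def top_n_share(model_counts: Counter, n: int) -> float:
    """Fraction of all ads that belong to the n most frequent models."""
    total = sum(model_counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in model_counts.most_common(n))
    return top / total


# Toy numbers for illustration only, not real market data.
counts = Counter({"model-a": 50, "model-b": 30, "model-c": 15, "model-d": 5})
share = top_n_share(counts, 2)  # the two most common models out of four
```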

Solution

Revolt BI’s solution for Carvago includes several components:

  • Creation of a data warehouse for the catalogue and ads
  • Data acquisition
  • Data analysis
  • Business analytics

As a data warehouse and DevOps platform, we chose Keboola, with data storage on Snowflake. The main reasons for this decision were excellent computing power, integration of all necessary services, diagnostics of all processes and the many other benefits of the Keboola solution.

For automatic image recognition, we use deep learning: a convolutional neural network (CNN) that identifies objects and many other types of elements in an image and draws conclusions from them at low cost. Our solution can also correct wrong information; for example, it recognises from a photo that a car is a combi even if the advertisement lists it as a van or MPV. We can even recognise the type of air conditioning automatically from an interior photo!
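
The correction step can be pictured as a simple rule applied to the classifier's output. The sketch below does not include the CNN itself. It assumes, purely for illustration, that the network returns a probability per body type. The function name `resolve_body_type` and the 0.9 confidence threshold are hypothetical.

```python
def resolve_body_type(
    declared: str,
    predicted: dict[str, float],
    threshold: float = 0.9,  # assumed confidence cut-off
) -> str:
    """Return the body type to store for an ad.

    The declared value is overridden only when the image classifier
    disagrees with it and is confident enough in its own answer.
    """
    if not predicted:
        return declared
    best_label, best_prob = max(predicted.items(), key=lambda item: item[1])
    if best_label != declared and best_prob >= threshold:
        return best_label
    return declared
```

With a declared body type of `"van"` and predicted probabilities `{"combi": 0.96, "van": 0.03, "mpv": 0.01}`, the function returns `"combi"`. With a top probability below the threshold, the declared value is kept and the ad could instead be queued for manual inspection.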

Interesting fact: 2,000 photos are needed for quality training of the neural network for a single model.

We perform business analytics using Tableau, as no other visualisation tool could as easily handle such widely differing views of the many aspects of Carvago’s operations, not only for the company itself but also for its business partners.

Result

Thanks to its cooperation with Revolt BI, Carvago has obtained unique and always up-to-date data on European used cars, including the relevant parameters and price of each car, as well as analytical tools for using the data commercially.

The business analytics from Revolt BI allow the Carvago sales department and its customers to make data-based decisions, such as tailoring the offer to vendors according to their strongest segments or comparing the offered cars in detail across advertising servers.

Catalogue
  • 3,000+ models, 250+ brands
  • Complete records of key parameters
  • Automatic checking and completion of missing parameters such as body type, number of doors, engine displacement, transmission type, etc.
  • Manual review and change of catalogue items is possible
Data acquisition
  • 4.5 million ads per day
  • 130 advertising servers
  • Deduplication
  • Automatic data consistency checks
  • Possibility of manual inspection and correction
  • Pairing to catalogue items
  • Daily updates; selected data, e.g. auctions, can also be updated in real time
Analytical tools
  • Data extraction diagnostics
  • Tool for quick detection of errors and of suspicious or poor-quality ads
  • A complete overview of the state of the European market through regions, models, vehicle age, price levels and other parameters
  • Identification of attractive offers of cars (comprehensive assessment of model, age, equipment) that can be sold at a profit, e.g. in other regions
  • Tool for correct pricing based on model, age, condition and equipment
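
One simple way to picture the identification of attractive offers is to compare each ad's price with the median price of comparable cars in other regions. The sketch below is an assumption-laden illustration, not the actual tool. The dictionary keys (`id`, `segment`, `region`, `price_eur`), the 15% margin and the use of a plain median are all hypothetical, and transport costs, taxes and equipment differences are ignored.

```python
from collections import defaultdict
from statistics import median


def find_attractive_offers(
    ads: list[dict], min_margin: float = 0.15
) -> list[tuple[str, str, float]]:
    """Return (ad id, target region, margin) for ads priced well below
    the median of the same segment in another region."""
    prices = defaultdict(list)
    for ad in ads:
        prices[(ad["segment"], ad["region"])].append(ad["price_eur"])
    medians = {key: median(values) for key, values in prices.items()}

    results = []
    for ad in ads:
        for (segment, region), med in medians.items():
            if segment != ad["segment"] or region == ad["region"]:
                continue
            # Relative gap between the other region's median and this ad's price.
            margin = (med - ad["price_eur"]) / med
            if margin >= min_margin:
                results.append((ad["id"], region, round(margin, 3)))
    return results
```

Here `segment` stands for whatever grouping makes cars comparable, for example model plus age band. Prices are assumed to be positive.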
