Data Cleaning and Analysis of Used Car Listings on eBay Kleinanzeigen

We will be working with a dataset of used car listings from eBay Kleinanzeigen, the classifieds section of the German eBay website. The dataset was originally scraped and uploaded to Kaggle; the version used here includes a few modifications to that original upload.

The data dictionary provided with the data is as follows:

  • dateCrawled - When this ad was first crawled. All field values are taken from this date.
  • name - Name of the car.
  • seller - Whether the seller is private or a dealer.
  • offerType - The type of listing.
  • price - The price on the ad to sell the car.
  • abtest - Whether the listing is included in an A/B test.
  • vehicleType - The vehicle type.
  • yearOfRegistration - The year in which the car was first registered.
  • gearbox - The transmission type.
  • powerPS - The power of the car in PS.
  • model - The car model name.
  • kilometer - How many kilometers the car has been driven.
  • monthOfRegistration - The month in which the car was first registered.
  • fuelType - What type of fuel the car uses.
  • brand - The brand of the car.
  • notRepairedDamage - Whether the car has damage that has not yet been repaired.
  • dateCreated - The date on which the eBay listing was created.
  • nrOfPictures - The number of pictures in the ad.
  • postalCode - The postal code for the location of the vehicle.
  • lastSeenOnline - When the crawler last saw this ad online.

The aim of this project is to clean the data and analyze the included used car listings.

In [2]:
import pandas as pd
import numpy as np
In [3]:
autos = pd.read_csv("autos.csv", encoding="Latin-1")
autos.info()
autos.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 20 columns):
dateCrawled            50000 non-null object
name                   50000 non-null object
seller                 50000 non-null object
offerType              50000 non-null object
price                  50000 non-null object
abtest                 50000 non-null object
vehicleType            44905 non-null object
yearOfRegistration     50000 non-null int64
gearbox                47320 non-null object
powerPS                50000 non-null int64
model                  47242 non-null object
odometer               50000 non-null object
monthOfRegistration    50000 non-null int64
fuelType               45518 non-null object
brand                  50000 non-null object
notRepairedDamage      40171 non-null object
dateCreated            50000 non-null object
nrOfPictures           50000 non-null int64
postalCode             50000 non-null int64
lastSeen               50000 non-null object
dtypes: int64(5), object(15)
memory usage: 7.6+ MB
Out[3]:
dateCrawled name seller offerType price abtest vehicleType yearOfRegistration gearbox powerPS model odometer monthOfRegistration fuelType brand notRepairedDamage dateCreated nrOfPictures postalCode lastSeen
0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD privat Angebot $5,000 control bus 2004 manuell 158 andere 150,000km 3 lpg peugeot nein 2016-03-26 00:00:00 0 79588 2016-04-06 06:45:54
1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik privat Angebot $8,500 control limousine 1997 automatik 286 7er 150,000km 6 benzin bmw nein 2016-04-04 00:00:00 0 71034 2016-04-06 14:45:08
2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United privat Angebot $8,990 test limousine 2009 manuell 102 golf 70,000km 7 benzin volkswagen nein 2016-03-26 00:00:00 0 35394 2016-04-06 20:15:37
3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... privat Angebot $4,350 control kleinwagen 2007 automatik 71 fortwo 70,000km 6 benzin smart nein 2016-03-12 00:00:00 0 33729 2016-03-15 03:16:28
4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... privat Angebot $1,350 test kombi 2003 manuell 0 focus 150,000km 7 benzin ford nein 2016-04-01 00:00:00 0 39218 2016-04-01 14:38:50

Our dataset contains 20 columns, most of which are stored as strings. There are a few columns with null values, but no columns have more than ~20% null values. There are some columns that contain dates stored as strings.

We'll have to clean the column names to make the data easier to work with.

In [4]:
# let's have a detailed look at the column names
autos.columns
Out[4]:
Index(['dateCrawled', 'name', 'seller', 'offerType', 'price', 'abtest',
       'vehicleType', 'yearOfRegistration', 'gearbox', 'powerPS', 'model',
       'odometer', 'monthOfRegistration', 'fuelType', 'brand',
       'notRepairedDamage', 'dateCreated', 'nrOfPictures', 'postalCode',
       'lastSeen'],
      dtype='object')
In [5]:
autos.columns = ['date_crawled', 'name', 'seller', 'offer_type', 'price', 'ab_test',
       'vehicle_type', 'registration_year', 'gearbox', 'power_ps', 'model',
       'odometer', 'registration_month', 'fuel_type', 'brand',
       'unrepaired_damage', 'ad_created', 'num_pictures', 'postal_code',
       'last_seen']

We made a few changes here:

  • Converted the column names from camelCase to snake_case (a programmatic sketch of this conversion follows the list).
  • Reworded a few names to describe the columns more accurately.
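
The rename above was done by assigning a hand-written list of new names. As a hedged aside, the camelCase-to-snake_case part could also be generated with a small helper; the regex below is an illustrative sketch only and is not used elsewhere in this notebook (semantic renames such as yearOfRegistration to registration_year would still be mapped by hand).

import re

def camel_to_snake(name):
    # insert an underscore before each uppercase letter, then lowercase the result
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

[camel_to_snake(col) for col in ["dateCrawled", "offerType", "notRepairedDamage"]]
# ['date_crawled', 'offer_type', 'not_repaired_damage']
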
In [6]:
# let's check the top 5 rows
autos.head()
Out[6]:
date_crawled name seller offer_type price ab_test vehicle_type registration_year gearbox power_ps model odometer registration_month fuel_type brand unrepaired_damage ad_created num_pictures postal_code last_seen
0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD privat Angebot $5,000 control bus 2004 manuell 158 andere 150,000km 3 lpg peugeot nein 2016-03-26 00:00:00 0 79588 2016-04-06 06:45:54
1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik privat Angebot $8,500 control limousine 1997 automatik 286 7er 150,000km 6 benzin bmw nein 2016-04-04 00:00:00 0 71034 2016-04-06 14:45:08
2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United privat Angebot $8,990 test limousine 2009 manuell 102 golf 70,000km 7 benzin volkswagen nein 2016-03-26 00:00:00 0 35394 2016-04-06 20:15:37
3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... privat Angebot $4,350 control kleinwagen 2007 automatik 71 fortwo 70,000km 6 benzin smart nein 2016-03-12 00:00:00 0 33729 2016-03-15 03:16:28
4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... privat Angebot $1,350 test kombi 2003 manuell 0 focus 150,000km 7 benzin ford nein 2016-04-01 00:00:00 0 39218 2016-04-01 14:38:50

Time to Explore the Data

We will explore the data and narrow down what needs to be cleaned.

In [7]:
autos.describe(include='all')
Out[7]:
date_crawled name seller offer_type price ab_test vehicle_type registration_year gearbox power_ps model odometer registration_month fuel_type brand unrepaired_damage ad_created num_pictures postal_code last_seen
count 50000 50000 50000 50000 50000 50000 44905 50000.000000 47320 50000.000000 47242 50000 50000.000000 45518 50000 40171 50000 50000.0 50000.000000 50000
unique 48213 38754 2 2 2357 2 8 NaN 2 NaN 245 13 NaN 7 40 2 76 NaN NaN 39481
top 2016-03-12 16:06:22 Ford_Fiesta privat Angebot $0 test limousine NaN manuell NaN golf 150,000km NaN benzin volkswagen nein 2016-04-03 00:00:00 NaN NaN 2016-04-07 06:17:27
freq 3 78 49999 49999 1421 25756 12859 NaN 36993 NaN 4024 32424 NaN 30107 10687 35232 1946 NaN NaN 8
mean NaN NaN NaN NaN NaN NaN NaN 2005.073280 NaN 116.355920 NaN NaN 5.723360 NaN NaN NaN NaN 0.0 50813.627300 NaN
std NaN NaN NaN NaN NaN NaN NaN 105.712813 NaN 209.216627 NaN NaN 3.711984 NaN NaN NaN NaN 0.0 25779.747957 NaN
min NaN NaN NaN NaN NaN NaN NaN 1000.000000 NaN 0.000000 NaN NaN 0.000000 NaN NaN NaN NaN 0.0 1067.000000 NaN
25% NaN NaN NaN NaN NaN NaN NaN 1999.000000 NaN 70.000000 NaN NaN 3.000000 NaN NaN NaN NaN 0.0 30451.000000 NaN
50% NaN NaN NaN NaN NaN NaN NaN 2003.000000 NaN 105.000000 NaN NaN 6.000000 NaN NaN NaN NaN 0.0 49577.000000 NaN
75% NaN NaN NaN NaN NaN NaN NaN 2008.000000 NaN 150.000000 NaN NaN 9.000000 NaN NaN NaN NaN 0.0 71540.000000 NaN
max NaN NaN NaN NaN NaN NaN NaN 9999.000000 NaN 17700.000000 NaN NaN 12.000000 NaN NaN NaN NaN 0.0 99998.000000 NaN

From the summary above, we can see a few columns that could be dropped:

  • offer_type
  • seller
  • num_pictures

However, before we do that, let's examine each of these columns to make absolutely sure of what they contain.

In [8]:
autos["seller"].describe()
Out[8]:
count      50000
unique         2
top       privat
freq       49999
Name: seller, dtype: object
In [9]:
autos["offer_type"].describe()
Out[9]:
count       50000
unique          2
top       Angebot
freq        49999
Name: offer_type, dtype: object
In [10]:
autos["num_pictures"].describe()
Out[10]:
count    50000.0
mean         0.0
std          0.0
min          0.0
25%          0.0
50%          0.0
75%          0.0
max          0.0
Name: num_pictures, dtype: float64

Now we can be confident, with data to back it up, that these columns add no value to our project, so we will drop them from the dataset.

In [11]:
autos = autos.drop(["num_pictures", "seller", "offer_type"], axis=1)

On closer inspection, we should also explore the following columns:

  • ab_test
  • gearbox
  • unrepaired_damage
In [12]:
autos["ab_test"].describe()
Out[12]:
count     50000
unique        2
top        test
freq      25756
Name: ab_test, dtype: object
In [13]:
autos["gearbox"].value_counts()
Out[13]:
manuell      36993
automatik    10327
Name: gearbox, dtype: int64
In [14]:
autos["unrepaired_damage"].value_counts()
Out[14]:
nein    35232
ja       4939
Name: unrepaired_damage, dtype: int64

We can conclude that these columns hold key pieces of information, so we will keep them.

We also suspect that the price and odometer values are stored as text, so let's confirm whether that is the case.

In [15]:
autos["price"].describe()
Out[15]:
count     50000
unique     2357
top          $0
freq       1421
Name: price, dtype: object
In [16]:
autos["odometer"].describe()
Out[16]:
count         50000
unique           13
top       150,000km
freq          32424
Name: odometer, dtype: object

So our suspicion was spot on: both columns are stored as text.

We will now remove the non-numeric characters and convert each column to a numeric type. While doing so, we will also rename the odometer column to odometer_km.

In [17]:
autos["price"] = (autos["price"]
                          .str.replace("$","")
                          .str.replace(",","")
                          .astype(int)
                          )
autos["price"].head()
Out[17]:
0    5000
1    8500
2    8990
3    4350
4    1350
Name: price, dtype: int64
In [18]:
autos["odometer"] = (autos["odometer"]
                                .str.replace("km","") 
                                .str.replace(",","")
                                .astype(int) 
                                )
autos.rename({"odometer":"odometer_km"}, axis=1 , inplace=True)
autos["odometer_km"].head()
Out[18]:
0    150000
1    150000
2     70000
3     70000
4    150000
Name: odometer_km, dtype: int64

Further Exploration and Data Cleaning

In [19]:
autos["odometer_km"].value_counts()
Out[19]:
150000    32424
125000     5170
100000     2169
90000      1757
80000      1436
70000      1230
60000      1164
50000      1027
5000        967
40000       819
30000       789
20000       784
10000       264
Name: odometer_km, dtype: int64

The odometer values appear to be rounded to preset bins, and most of the cars have high odometer readings.
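
To back this up with a number, a quick sketch (not one of the original cells) computes the share of listings with at least 100,000 km; from the counts above this works out to roughly 80% of listings.

# share of listings with an odometer reading of 100,000 km or more
high_mileage_share = (autos["odometer_km"] >= 100000).mean()
round(high_mileage_share, 3)
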

In [20]:
print(autos["odometer_km"].describe())
count     50000.000000
mean     125732.700000
std       40042.211706
min        5000.000000
25%      125000.000000
50%      150000.000000
75%      150000.000000
max      150000.000000
Name: odometer_km, dtype: float64
In [21]:
print(autos["odometer_km"].unique().shape)
(13,)

As we can see, there are 13 unique values for odometer_km.

In [22]:
print(autos["price"].describe())
print(autos["price"].value_counts())
print(autos["price"].unique().shape)
count    5.000000e+04
mean     9.840044e+03
std      4.811044e+05
min      0.000000e+00
25%      1.100000e+03
50%      2.950000e+03
75%      7.200000e+03
max      1.000000e+08
Name: price, dtype: float64
0         1421
500        781
1500       734
2500       643
1000       639
1200       639
600        531
800        498
3500       498
2000       460
999        434
750        433
900        420
650        419
850        410
700        395
4500       394
300        384
2200       382
950        379
1100       376
1300       371
3000       365
550        356
1800       355
5500       340
1250       335
350        335
1600       327
1999       322
          ... 
46200        1
29600        1
13480        1
21700        1
7373         1
3279         1
4286         1
188          1
17830        1
9130         1
910          1
238          1
2671         1
69900        1
151990       1
2479         1
4510         1
86500        1
47499        1
16998        1
27299        1
41850        1
4780         1
686          1
6495         1
20790        1
8970         1
846          1
2895         1
33980        1
Name: price, Length: 2357, dtype: int64
(2357,)

There is a lot going on here. Our observations:

  • There are 2,357 unique prices.
  • The site appears to allow mostly rounded values.
  • 1,421 cars are listed for free ($0), which is about 2.8% of the dataset.
  • The maximum listed price is $99,999,999, roughly eight times the price of the most expensive car in the world (around $12.5 million), so such listings are clearly not genuine (go figure).

Let's take a closer look at the maximum prices.

In [23]:
autos["price"].value_counts().sort_index(ascending=False).head(20)
Out[23]:
99999999    1
27322222    1
12345678    3
11111111    2
10000000    1
3890000     1
1300000     1
1234566     1
999999      2
999990      1
350000      1
345000      1
299000      1
295000      1
265000      1
259000      1
250000      1
220000      1
198000      1
197000      1
Name: price, dtype: int64
In [24]:
autos["price"].value_counts().sort_index(ascending=True).head(20)
Out[24]:
0     1421
1      156
2        3
3        1
5        2
8        1
9        1
10       7
11       2
12       3
13       2
14       1
15       2
17       3
18       1
20       4
25       5
29       1
30       7
35       1
Name: price, dtype: int64
In [25]:
# how many listings fall inside the price range we intend to keep?
autos["price"].between(1,3260100).value_counts()
Out[25]:
True     48570
False     1430
Name: price, dtype: int64

So we will keep prices in the range from $1 to $3,260,100, the price of the most expensive used car ever sold on eBay (ref: https://www.ebay.com/motors/blog/most-expensive-cars-sold-ebay/).

In [26]:
autos = autos[autos["price"].between(1,3260100)]
autos["price"].describe()
Out[26]:
count    4.857000e+04
mean     6.002279e+03
std      1.444526e+04
min      1.000000e+00
25%      1.200000e+03
50%      3.000000e+03
75%      7.490000e+03
max      1.300000e+06
Name: price, dtype: float64

Date Exploration

There are five columns that represent date values. Some were created by the crawler and some come from the website itself:

  • date_crawled
  • registration_month
  • registration_year
  • ad_created
  • last_seen

Right now, the date_crawled, last_seen, and ad_created columns are all identified as string values by pandas.

Let's explore them and see what we discover.

In [27]:
autos[['date_crawled','ad_created','last_seen']][0:5]
Out[27]:
date_crawled ad_created last_seen
0 2016-03-26 17:47:46 2016-03-26 00:00:00 2016-04-06 06:45:54
1 2016-04-04 13:38:56 2016-04-04 00:00:00 2016-04-06 14:45:08
2 2016-03-26 18:57:24 2016-03-26 00:00:00 2016-04-06 20:15:37
3 2016-03-12 16:58:10 2016-03-12 00:00:00 2016-03-15 03:16:28
4 2016-04-01 14:38:50 2016-04-01 00:00:00 2016-04-01 14:38:50
In [28]:
(autos["date_crawled"]
        .str[:10]
        .value_counts(normalize=True, dropna=False)
        .sort_index()
        )
Out[28]:
2016-03-05    0.025324
2016-03-06    0.014042
2016-03-07    0.036010
2016-03-08    0.033292
2016-03-09    0.033086
2016-03-10    0.032180
2016-03-11    0.032572
2016-03-12    0.036916
2016-03-13    0.015668
2016-03-14    0.036545
2016-03-15    0.034280
2016-03-16    0.029607
2016-03-17    0.031645
2016-03-18    0.012909
2016-03-19    0.034775
2016-03-20    0.037883
2016-03-21    0.037389
2016-03-22    0.033004
2016-03-23    0.032222
2016-03-24    0.029339
2016-03-25    0.031604
2016-03-26    0.032201
2016-03-27    0.031089
2016-03-28    0.034857
2016-03-29    0.034116
2016-03-30    0.033683
2016-03-31    0.031830
2016-04-01    0.033683
2016-04-02    0.035475
2016-04-03    0.038604
2016-04-04    0.036504
2016-04-05    0.013095
2016-04-06    0.003171
2016-04-07    0.001400
Name: date_crawled, dtype: float64
In [29]:
(autos["date_crawled"]
        .str[:10]
        .value_counts(normalize=True, dropna=False)
        .sort_values()
        )
Out[29]:
2016-04-07    0.001400
2016-04-06    0.003171
2016-03-18    0.012909
2016-04-05    0.013095
2016-03-06    0.014042
2016-03-13    0.015668
2016-03-05    0.025324
2016-03-24    0.029339
2016-03-16    0.029607
2016-03-27    0.031089
2016-03-25    0.031604
2016-03-17    0.031645
2016-03-31    0.031830
2016-03-10    0.032180
2016-03-26    0.032201
2016-03-23    0.032222
2016-03-11    0.032572
2016-03-22    0.033004
2016-03-09    0.033086
2016-03-08    0.033292
2016-04-01    0.033683
2016-03-30    0.033683
2016-03-29    0.034116
2016-03-15    0.034280
2016-03-19    0.034775
2016-03-28    0.034857
2016-04-02    0.035475
2016-03-07    0.036010
2016-04-04    0.036504
2016-03-14    0.036545
2016-03-12    0.036916
2016-03-21    0.037389
2016-03-20    0.037883
2016-04-03    0.038604
Name: date_crawled, dtype: float64

We can see that the site was crawled daily between early March and early April 2016, with each day accounting for roughly the same share of listings (apart from the first and last few days).
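
As a hedged aside, the same distribution could be computed after parsing the strings into proper datetime values instead of slicing with .str[:10]; the sketch below does not modify the DataFrame and is not used by the later cells.

# parse the crawl timestamps (format "YYYY-MM-DD HH:MM:SS") into datetime64 values
crawled = pd.to_datetime(autos["date_crawled"], format="%Y-%m-%d %H:%M:%S")
crawled.dt.date.value_counts(normalize=True).sort_index().head()
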

In [34]:
(autos["last_seen"]
        .str[:10]
        .value_counts(normalize=True, dropna=False)
        .sort_index()
        )
Out[34]:
2016-03-05    0.001071
2016-03-06    0.004324
2016-03-07    0.005394
2016-03-08    0.007412
2016-03-09    0.009594
2016-03-10    0.010665
2016-03-11    0.012374
2016-03-12    0.023780
2016-03-13    0.008894
2016-03-14    0.012600
2016-03-15    0.015874
2016-03-16    0.016450
2016-03-17    0.028083
2016-03-18    0.007350
2016-03-19    0.015833
2016-03-20    0.020651
2016-03-21    0.020630
2016-03-22    0.021371
2016-03-23    0.018530
2016-03-24    0.019765
2016-03-25    0.019209
2016-03-26    0.016800
2016-03-27    0.015648
2016-03-28    0.020877
2016-03-29    0.022359
2016-03-30    0.024768
2016-03-31    0.023780
2016-04-01    0.022792
2016-04-02    0.024933
2016-04-03    0.025201
2016-04-04    0.024480
2016-04-05    0.124768
2016-04-06    0.221804
2016-04-07    0.131933
Name: last_seen, dtype: float64
In [35]:
autos["registration_year"].describe()
Out[35]:
count    48570.000000
mean      2004.754231
std         88.639460
min       1000.000000
25%       1999.000000
50%       2004.000000
75%       2008.000000
max       9999.000000
Name: registration_year, dtype: float64

We can immediately see that some values are impossible: the minimum is 1000 and the maximum is 9999. The car had not been invented by the year 1000, and since the listings were crawled in 2016, any registration year after 2016 cannot be accurate either.
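
Before filtering, a quick sanity check (a sketch, assuming 1900-2016 as a loose plausibility window given the 2016 crawl date) shows how many rows fall outside any realistic registration year; the cell that follows then applies a somewhat stricter cut of 1990-2016.

# count listings whose registration year is outside a plausible window
implausible = ~autos["registration_year"].between(1900, 2016)
print(implausible.sum(), "listings,", round(implausible.mean() * 100, 2), "% of rows")
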

In [46]:
autos = autos[autos["registration_year"].between(1990,2016)]
autos["registration_year"].value_counts(normalize=True)
Out[46]:
2000    0.069535
2005    0.064688
1999    0.063851
2004    0.059554
2003    0.059466
2006    0.058827
2001    0.058078
2002    0.054773
1998    0.052063
2007    0.050169
2008    0.048803
2009    0.045960
1997    0.042986
2011    0.035759
2010    0.035010
1996    0.030251
2012    0.028863
1995    0.027034
2016    0.026880
2013    0.017692
2014    0.014608
1994    0.013859
1993    0.009364
2015    0.008637
1992    0.008174
1990    0.007645
1991    0.007469
Name: registration_year, dtype: float64

Explore the Brands

In [50]:
autos["brand"].value_counts(normalize=True)
Out[50]:
volkswagen        0.210897
bmw               0.111199
opel              0.108798
mercedes_benz     0.093705
audi              0.087977
ford              0.069998
renault           0.048142
peugeot           0.030670
fiat              0.025646
seat              0.018772
skoda             0.016767
mazda             0.015511
nissan            0.015511
smart             0.014564
citroen           0.014035
toyota            0.012845
hyundai           0.010311
volvo             0.008945
mini              0.008923
mitsubishi        0.008417
honda             0.007844
kia               0.007271
sonstige_autos    0.007073
alfa_romeo        0.006389
suzuki            0.006059
porsche           0.005486
chevrolet         0.005156
chrysler          0.003503
dacia             0.002710
daihatsu          0.002556
jeep              0.002247
subaru            0.002115
land_rover        0.002071
saab              0.001608
daewoo            0.001542
jaguar            0.001542
rover             0.001366
lancia            0.001058
lada              0.000485
trabant           0.000286
Name: brand, dtype: float64

We can see that well over half of the listed cars (roughly 60%) were produced by German manufacturers; the sketch below backs this up.
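
A minimal sketch confirming the share, assuming the list below covers the German marques present in the data:

# sum the normalized listing shares of the major German brands
german_brands = ["volkswagen", "bmw", "opel", "mercedes_benz", "audi", "porsche", "smart"]
autos["brand"].value_counts(normalize=True)[german_brands].sum()
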

In [51]:
brand_counts = autos["brand"].value_counts(normalize=True)
common_brands = brand_counts[brand_counts > .05].index
print(common_brands)
Index(['volkswagen', 'bmw', 'opel', 'mercedes_benz', 'audi', 'ford'], dtype='object')
In [52]:
brand_mean_prices = {}

for brand in common_brands:
    brand_only = autos[autos["brand"] == brand]
    mean_price = brand_only["price"].mean()
    brand_mean_prices[brand] = int(mean_price)

brand_mean_prices
Out[52]:
{'audi': 9420,
 'bmw': 8604,
 'ford': 3761,
 'mercedes_benz': 8582,
 'opel': 2953,
 'volkswagen': 5398}

This helps explain why Volkswagen is so popular: it falls in the mid-range of prices. Audi, BMW, and Mercedes-Benz are higher-priced brands, while Opel and Ford sit at the lower end of the price range.

What About Mileage?

In [53]:
bmp_series = pd.Series(brand_mean_prices)
pd.DataFrame(bmp_series, columns=["mean_price"])
Out[53]:
mean_price
audi 9420
bmw 8604
ford 3761
mercedes_benz 8582
opel 2953
volkswagen 5398
In [54]:
brand_mean_mileage = {}

for brand in common_brands:
    brand_only = autos[autos["brand"] == brand]
    mean_mileage = brand_only["odometer_km"].mean()
    brand_mean_mileage[brand] = int(mean_mileage)

mean_mileage = pd.Series(brand_mean_mileage).sort_values(ascending=False)
mean_prices = pd.Series(brand_mean_prices).sort_values(ascending=False)

brand_info = pd.DataFrame(mean_mileage,columns=['mean_mileage'])
brand_info
Out[54]:
mean_mileage
bmw 132668
mercedes_benz 130848
opel 129864
volkswagen 129101
audi 128974
ford 125281
In [55]:
brand_info["mean_price"] = mean_prices
brand_info
Out[55]:
mean_mileage mean_price
bmw 132668 8604
mercedes_benz 130848 8582
opel 129864 2953
volkswagen 129101 5398
audi 128974 9420
ford 125281 3761
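
As a closing aside, the loop-based aggregation above could also be expressed with a single groupby; this sketch (not one of the original cells) produces the same per-brand means in one step.

# mean price and mileage for the common brands, via groupby
brand_summary = (autos[autos["brand"].isin(common_brands)]
                 .groupby("brand")[["price", "odometer_km"]]
                 .mean()
                 .round(0)
                 .sort_values("price", ascending=False))
brand_summary

From the table, the average mileage across the common brands is tightly clustered (roughly 125,000 to 133,000 km), so mileage alone does not explain the large differences in mean price between brands.
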