I am having a difficult time getting the correct data from a reference csv file to the one I am working on.
I have a csv file that has over 6 million rows and 19 columns. I looks something like this : enter image description here
For each row there is a brand and a model of a car amongst other information. I want to add to this file the fuel consumption per 100km traveled and the type of fuel that is used.
I have another csv file that has the fuel consumption of every model of car that looks something like this : enter image description here
What I want to ultimately do is add the matching values of G,H, I and J columns from the second file to the first one.
Because of the size of the file I was wondering if there is another way to do it other than with a "for" or a "while" loop?
EDIT :
For example... The first df would look something like this
ID | Brand | Model | Other_columns | Fuel_consu_1 | Fuel_consu_2 |
---|---|---|---|---|---|
1 | Toyota | Rav4 | a | NaN | NaN |
2 | Honda | Civic | b | NaN | NaN |
3 | GMC | Sierra | c | NaN | NaN |
4 | Toyota | Rav4 | d | NaN | NaN |
The second df would be something like this
ID | Brand | Model | Fuel_consu_1 | Fuel_consu_2 |
---|---|---|---|---|
1 | Toyota | Corrola | 100 | 120 |
2 | Toyota | Rav4 | 80 | 84 |
3 | GMC | Sierra | 91 | 105 |
4 | Honda | Civic | 112 | 125 |
The output should be :
ID | Brand | Model | Other_columns | Fuel_consu_1 | Fuel_consu_2 |
---|---|---|---|---|---|
1 | Toyota | Rav4 | a | 80 | 84 |
2 | Honda | Civic | b | 112 | 125 |
3 | GMC | Sierra | c | 91 | 105 |
4 | Toyota | Rav4 | d | 80 | 84 |
The first df may have many times the same brand and model for different ID's. The order is completely random.
CodePudding user response:
Thank you for providing updates I was able to put something together that should be able to help you
#You drop these two columns because you won't need them once you join them to df1 (which is your 2nd table provided)
df.drop(['Fuel_consu_1', 'Fuel_consu_2'], axis = 1 , inplace = True)
#This will join your first and second column to each other on the Brand and Model columns
df_merge = pd.merge(df, df1, on=['Brand', 'Model'])