Home > Net >  Merging two dataframes - conditional rows
Merging two dataframes - conditional rows

Time:04-19

I'm data wrangling at the moment and I've a csv file, from kaggle in Indian Food Prices which I've converted into a dataframe of which one of the columns 'Market' has some categorical location data. For an ML problem I need latitude and longitude data which I've managed to get via the open streetmap api - however as the original dataframe had 131000 rows I created a separate dataframe with only single instances of the location and this new dataframe has 165 rows.

I now need to merge the two dataframes, but figure out a loop that'll populate all the rows from the original dataframe with 131000 rows with the lat and long data from the smaller dataframe of 165 rows, but have the lat and long match the location in the 'Market' column.

Any advice would be gratefully received,

Here's an attempt at trying to achieve the above

def using_where(ndf):
ndf['Lat-Long'] = np.where(df['Market']='Delhi', '28.6517178, 77.2219388'

Here's the head of my large data frame 'ndf'

<bound method NDFrame.head of         Unnamed: 0        Date     Market               Category  

\
0                1  1994-01-15      Delhi     cereals and tubers   
1                2  1994-01-15      Delhi     cereals and tubers   
2                3  1994-01-15      Delhi     miscellaneous food   
3                4  1994-01-15      Delhi           oil and fats   
4                5  1994-01-15  Ahmedabad     cereals and tubers   
...            ...         ...        ...                    ...   
139529      139530  2021-09-15  Kharagpur        pulses and nuts   
139530      139531  2021-09-15  Kharagpur        pulses and nuts   
139531      139532  2021-09-15  Kharagpur        pulses and nuts   
139532      139533  2021-09-15  Kharagpur  vegetables and fruits   
139533      139534  2021-09-15  Kharagpur  vegetables and fruits   

              Commodity Unit PriceFlag PriceType Currency  Price  USD_Price  
0                  Rice   KG    actual    Retail      INR    8.0     0.2545  
1                 Wheat   KG    actual    Retail      INR    5.0     0.1590  
2                 Sugar   KG    actual    Retail      INR   13.5     0.4294  
3         Oil (mustard)   KG    actual    Retail      INR   31.0     0.9860  
4                  Rice   KG    actual    Retail      INR    6.8     0.2163  
...                 ...  ...       ...       ...      ...    ...        ...  
139529  Lentils (masur)   KG    actual    Retail      INR  110.0     1.4972  
139530  Lentils (moong)   KG    actual    Retail      INR  120.0     1.6333  
139531   Lentils (urad)   KG    actual    Retail      INR  115.0     1.5653  
139532           Onions   KG    actual    Retail      INR   30.0     0.4083  
139533         Tomatoes   KG    actual    Retail      INR   40.0     0.5444  

Here's the head of my small dataframe 'df'

    <bound method NDFrame.head of             Market                         geocoded
0            Delhi         (28.6517178, 77.2219388)
4        Ahmedabad         (23.0216238, 72.5797068)
8           Shimla         (31.1041526, 77.1709729)
11       Bengaluru          (12.9767936, 77.590082)
14          Bhopal          (23.2584857, 77.401989)
...            ...                              ...
136823   Dantewada  (18.8640648, 81.38339468738648)
136970     Selamba                               -1
137053      Bodeli         (22.2748105, 73.7166363)
137326     Dhanbad         (23.7952809, 86.4309638)
137389  Jamshedpur         (22.8015194, 86.2029579)

[165 rows x 2 columns]>

CodePudding user response:

I think you can just use merge(), unless I'm missing something:

ndf = pd.merge(ndf, df, how='inner', on='Market')

Here's a full code example with test case:

import pandas as pd
ndf = pd.DataFrame({'Date':['1994-01-15']*5   ['2021-09-15']*5, 'Market':'Delhi,Delhi,Delhi,Delhi,Ahmedabad,Kharagpur,Kharagpur,Kharagpur,Kharagpur,Kharagpur'.split(','), 
    'Category':'cereals and tubers,cereals and tubers,miscellaneous food,oil and fats,cereals and tubers,pulses and nuts,pulses and nuts,pulses and nuts,vegetables and fruits,vegetables and fruits'.split(',')})

df = pd.DataFrame({'Market':'Delhi,Ahmedabad,Shimla,Bengaluru,Bhopal,Kharagpur'.split(','), 
    'geocoded':[(28.6517178, 77.2219388),(23.0216238, 72.5797068),(31.1041526, 77.1709729),(12.9767936, 77.590082),(23.2584857, 77.401989),(22.22, 73.73)]})

ndf = pd.merge(ndf, df, how='inner', on='Market')
print(ndf)

Output:

         Date     Market               Category                  geocoded
0  1994-01-15      Delhi     cereals and tubers  (28.6517178, 77.2219388)
1  1994-01-15      Delhi     cereals and tubers  (28.6517178, 77.2219388)
2  1994-01-15      Delhi     miscellaneous food  (28.6517178, 77.2219388)
3  1994-01-15      Delhi           oil and fats  (28.6517178, 77.2219388)
4  1994-01-15  Ahmedabad     cereals and tubers  (23.0216238, 72.5797068)
5  2021-09-15  Kharagpur        pulses and nuts            (22.22, 73.73)
6  2021-09-15  Kharagpur        pulses and nuts            (22.22, 73.73)
7  2021-09-15  Kharagpur        pulses and nuts            (22.22, 73.73)
8  2021-09-15  Kharagpur  vegetables and fruits            (22.22, 73.73)
9  2021-09-15  Kharagpur  vegetables and fruits            (22.22, 73.73)
  • Related