Home > other >  panda dataframe search string in another datframe
panda dataframe search string in another datframe

Time:09-29

I am trying to figure out how to do this but not going any where.

import sqlite3
with sqlite3.connect("/content/drive/MyDrive/simple_db.sqlite") as con:
    tmp_df = pd.read_sql_query("SELECT * from simple_zipcode", con)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 42724 entries, 0 to 42723
Data columns (total 24 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   zipcode                  42724 non-null  object 
 1   zipcode_type             42724 non-null  object 
 2   major_city               42724 non-null  object 
 3   post_office_city         33104 non-null  object 
 4   common_city_list         41877 non-null  object 
 5   county                   42724 non-null  object 
 6   state                    42724 non-null  object 
 7   lat                      42724 non-null  float64
 8   lng                      42724 non-null  float64
 9   timezone                 42724 non-null  object 
 10  radius_in_miles          33104 non-null  float64
 11  area_code_list           42724 non-null  object 
 12  population               31448 non-null  float64
..

def zco(x):
    print(x)
    y=tmp_df[tmp_df.zipcode==x]["county"].item()
    print("hello",y)
    if y is None or y is "":
     county="NotFound"
    else:
     county = y ## or major_cit
    return county

data['County'] = data['ZIPCode'].apply(zco)

I want search "data['ZIPCode']" in tmp_df and get the "county" and put that in new column in "data["County"]

CodePudding user response:

Don't have time to work it all out but maybe start with this idea:

data.merge(tmp_df[['zipcode','county']], how='left', left_on='ZIPCode', right_on='zipcode')

Then do a fillna to change the non-matching entries to something like 'not found'.

CodePudding user response:

It's keep adding the new col every time i re-run this section of code. i am using google collab.

data=data.merge(tmp_df[['zipcode','county']], how='left', left_on='ZIPCode', right_on='zipcode')
data = data.drop(["zipcode"], axis=1)
data.info()
  • Related