Home > Enterprise >  Finding and assigning a right value between data frames in Python
Finding and assigning a right value between data frames in Python

Time:01-08

In general, I have a catalog and I have specific data. I need to assign the correct value from the catalog to the data.

I have table of cities and another table with users data. I need to compare the city ID and match its name. If I loop through the rows, the intersection doesn't work as I would like...

import pandas as pd

data = [["LN", "London"], ["NY", "New York"], ["LE", "Leeds"]]
cities = pd.DataFrame(data, columns=['ID','City'])
 
data = [['Tom', 10, 'LN'], ['Nick', 15, 'XX'], ['Juli', 14, 'YY']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Adress'])


result = []

for row in df:
    result = set(df['Adress']).intersection(cities['ID'].tolist())

Expected output:

   Name  Age Adress     Result
0   Tom   10     LN     London
1  Nick   15     XX  no record
2  Juli   14     YY  no record

UPDATE 1:

We have several cities for each user:

data = [['Tom', 10, 'LN, LE, XX'], ['Nick', 15, 'XX, LE'], ['Juli', 14, 'YY, LN']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Adress'])

Output:

   Name  Age      Adress                    Result
0   Tom   10  LN, LE, XX  London, Leeds, no record
1  Nick   15      XX, LE          no record, Leeds
2  Juli   14      YY, LN         no record, London

CodePudding user response:

Create a mapping dict from cities dataframe then use map. It's like a left outer join.

df['Result'] = df['Adress'].map(cities.set_index('ID')['City']).fillna('no record')
print(df)

# Output
   Name  Age Adress     Result
0   Tom   10     LN     London
1  Nick   15     XX  no record
2  Juli   14     YY  no record

So you can also use merge:

out = (df.merge(cities, left_on='Adress', right_on='ID', how='left')
         .drop(columns='ID').fillna('no record'))
print(out)

# Output
   Name  Age Adress       City
0   Tom   10     LN     London
1  Nick   15     XX  no record
2  Juli   14     YY  no record

Update 1: with multiple addresses:

df['Result'] = (df['Adress'].str.split(', ').explode()
                            .map(cities.set_index('ID')['City']).fillna('no record')
                            .groupby(level=0).apply(', '.join))
print(df)

# Output
   Name  Age      Adress                    Result
0   Tom   10  LN, LE, XX  London, Leeds, no record
1  Nick   15      XX, LE          no record, Leeds
2  Juli   14      YY, LN         no record, London

CodePudding user response:

You can convert the cities DataFrame to a dictionary, and use it to get values:

dct = dict(zip(cities.ID, cities.City))
df["Result"] = df["Adress"].apply(lambda x: dct.get(x, "no record"))
print(df)
   Name  Age Adress     Result
0   Tom   10     LN     London
1  Nick   15     XX  no record
2  Juli   14     YY  no record
  • Related