In general, I have a catalog and I have specific data. I need to assign the correct value from the catalog to the data.
I have table of cities and another table with users data. I need to compare the city ID and match its name. If I loop through the rows, the intersection doesn't work as I would like...
import pandas as pd
data = [["LN", "London"], ["NY", "New York"], ["LE", "Leeds"]]
cities = pd.DataFrame(data, columns=['ID','City'])
data = [['Tom', 10, 'LN'], ['Nick', 15, 'XX'], ['Juli', 14, 'YY']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Adress'])
result = []
for row in df:
result = set(df['Adress']).intersection(cities['ID'].tolist())
Expected output:
Name Age Adress Result
0 Tom 10 LN London
1 Nick 15 XX no record
2 Juli 14 YY no record
UPDATE 1:
We have several cities for each user:
data = [['Tom', 10, 'LN, LE, XX'], ['Nick', 15, 'XX, LE'], ['Juli', 14, 'YY, LN']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Adress'])
Output:
Name Age Adress Result
0 Tom 10 LN, LE, XX London, Leeds, no record
1 Nick 15 XX, LE no record, Leeds
2 Juli 14 YY, LN no record, London
CodePudding user response:
Create a mapping dict from cities
dataframe then use map
. It's like a left outer join.
df['Result'] = df['Adress'].map(cities.set_index('ID')['City']).fillna('no record')
print(df)
# Output
Name Age Adress Result
0 Tom 10 LN London
1 Nick 15 XX no record
2 Juli 14 YY no record
So you can also use merge
:
out = (df.merge(cities, left_on='Adress', right_on='ID', how='left')
.drop(columns='ID').fillna('no record'))
print(out)
# Output
Name Age Adress City
0 Tom 10 LN London
1 Nick 15 XX no record
2 Juli 14 YY no record
Update 1: with multiple addresses:
df['Result'] = (df['Adress'].str.split(', ').explode()
.map(cities.set_index('ID')['City']).fillna('no record')
.groupby(level=0).apply(', '.join))
print(df)
# Output
Name Age Adress Result
0 Tom 10 LN, LE, XX London, Leeds, no record
1 Nick 15 XX, LE no record, Leeds
2 Juli 14 YY, LN no record, London
CodePudding user response:
You can convert the cities
DataFrame to a dictionary, and use it to get values:
dct = dict(zip(cities.ID, cities.City))
df["Result"] = df["Adress"].apply(lambda x: dct.get(x, "no record"))
print(df)
Name Age Adress Result
0 Tom 10 LN London
1 Nick 15 XX no record
2 Juli 14 YY no record