Home > Enterprise >  How can I use a list of data to match data in a dataframe?
How can I use a list of data to match data in a dataframe?

Time:08-30

I got a list of coordinates, and I need to match the coordinates in a dataframe which contains a unique id and index for each of the coordinates. I want to match the coordinates and print the id and index of each coordinates in the list.

e.g.

List_coords = [[1,2],[3,4],[5,6]]
df = 
Index  ID  Coords
1      23  [1,2] 
2      34  [3,4]
3      45  [4,5]
4      56  [5,6] 

I expect to get something like 1-23, 2-34, 4-56 and save them to another list. How can I do this?

CodePudding user response:

Is this you are looking for?

match = df['Coords'].isin(List_coords)
(df.loc[match, 'Index'].astype(str)   '-'   df.loc[match, 'ID'].astype(str)).tolist()

The output is

['1-23', '2-34', '4-56']

CodePudding user response:

IIUC you want to get list from Index, ID columns by concatening them with '-' but only for those rows whose 'Coords' is in List_coords?

Then:

m = df['Coords'].isin(List_coords)
out = df.Index.astype(str).add('-').add(df.ID.astype(str))
out = out[m].tolist()

print(out):

['1-23', '2-34', '4-56']

CodePudding user response:

I think you need,

List_coords = [[1,2],[3,4],[5,6]]

df_matched = df[df['Coords'].isin(List_coords)]

output = df_matched[["Index", "ID"]].astype(str).apply(lambda row: row.str.cat(sep="-"), axis=1).values.tolist()

print(output)
>> ['1-23', '2-34', '4-56']

CodePudding user response:

You could use Pandas 'merge'. This solution is merging two DataFrames together: one with the ids coordinates and another which is made from a list of the coordiantes being looked up.


import pandas as pd

# Create the parent DF 
parent_df = pd.DataFrame([
    [23, [1,2]], 
    [45, [4,5]], 
    [56, [5,6]], 
    [34, [3,4]]
    ], columns=['id', 'coordinates']) 

# Set as string to perform merge
parent_df['coordinates'] = parent_df['coordinates'].astype(str)

# Take a list of input coords, set as a DF
input_coords = [[1,2],[3,4],[5,6],[99,99]]
list_of_list_of_input_coords = [[coord] for coord in input_coords]
input_coords_df = pd.DataFrame(list_of_list_of_input_coords, columns=['coordinates'])
input_coords_df['coordinates'] = input_coords_df['coordinates'].astype(str)

# Merge the DFs together
merged_df = input_coords_df.merge(parent_df, how='left', on=['coordinates'])

final_list = []

# Createa final list of the ID and coordinates
for index, row in merged_df.iterrows():
    final_list.append([row['id'], row['coordinates']])

This would five a final result in a list:

[[23.0, '[1, 2]'], [34.0, '[3, 4]'], [56.0, '[5, 6]'], [nan, '[99, 99]']]
  • Related