Home > Enterprise >  find column value based on another dataframe
find column value based on another dataframe

Time:12-14

I've a sample dataframe

value    id
  a       1
  b       2
  c       5
  d       8
  e       11

another dataframe:

entity     start_range       end_range
 ABC           1                 3
 DEF           4                 7
 XYZ           8                 15

How can I get the values of entities in dataframe1 based on range which would look like the below?

value    id       entity
  a       1        ABC
  b       2        ABC
  c       5        DEF
  d       8        XYZ
  e       11       XYZ

CodePudding user response:

it's not a clean answer and I don't know if there is a better way to do this but try this it should works:

data=pd.DataFrame({"value":["a","b","c","d","e"],"id":[1,2,5,8,11]})
df=pd.DataFrame({"entity":["ABC","DEF","XYZ"],"start_range":[1,4,8],"end_range":[3,7,15]})
df["explode"]=df.apply(lambda x:[i for i in range(x["start_range"],x["end_range"])],axis=1)
exploded=df.explode("explode")
exploded.index=exploded["explode"]
data["entity"]=data["id"].replace(exploded["entity"].to_dict())

CodePudding user response:

You can do:

ii = pd.IntervalIndex.from_arrays(df2['start_range'], df2['end_range'], closed='both')
df1['entity'] = df2.set_index(ii).loc[df1['id'], 'entity'].values

print(df1)

  value  id entity
0     a   1    ABC
1     b   2    ABC
2     c   5    DEF
3     d   8    XYZ
4     e  11    XYZ

So the problem to be solved in this case is how to search for 'id' in the range. Apparently loc can do that ! as long as your index is IntervalIndex. So create IntervalIndex from df2 and use that to loc the df1's id

  • Related