I've a sample dataframe
value id
a 1
b 2
c 5
d 8
e 11
another dataframe:
entity start_range end_range
ABC 1 3
DEF 4 7
XYZ 8 15
How can I get the values of entities in dataframe1 based on range which would look like the below?
value id entity
a 1 ABC
b 2 ABC
c 5 DEF
d 8 XYZ
e 11 XYZ
CodePudding user response:
it's not a clean answer and I don't know if there is a better way to do this but try this it should works:
data=pd.DataFrame({"value":["a","b","c","d","e"],"id":[1,2,5,8,11]})
df=pd.DataFrame({"entity":["ABC","DEF","XYZ"],"start_range":[1,4,8],"end_range":[3,7,15]})
df["explode"]=df.apply(lambda x:[i for i in range(x["start_range"],x["end_range"])],axis=1)
exploded=df.explode("explode")
exploded.index=exploded["explode"]
data["entity"]=data["id"].replace(exploded["entity"].to_dict())
CodePudding user response:
You can do:
ii = pd.IntervalIndex.from_arrays(df2['start_range'], df2['end_range'], closed='both')
df1['entity'] = df2.set_index(ii).loc[df1['id'], 'entity'].values
print(df1)
value id entity
0 a 1 ABC
1 b 2 ABC
2 c 5 DEF
3 d 8 XYZ
4 e 11 XYZ
So the problem to be solved in this case is how to search for 'id' in the range. Apparently loc
can do that ! as long as your index is IntervalIndex
.
So create IntervalIndex from df2 and use that to loc
the df1's id