Lookup value in a range in order to merge or join with

Time:09-29

I read this post, and this one.

I was unable to make use of the examples there. I think perhaps because I have NaN values at either end. Here's what my dataframe looks like:

DF1 = pd.DataFrame([{'Conversion Value': 0, 'start': 0.0, 'end': np.nan},
 {'Conversion Value': 1, 'start': 0.0, 'end': 1.7},
 {'Conversion Value': 2, 'start': 1.7, 'end': 2.28},
 {'Conversion Value': 3, 'start': 2.28, 'end': 3.12},
 {'Conversion Value': 4, 'start': 3.12, 'end': 4.84},
 {'Conversion Value': 5, 'start': 4.84, 'end': 5.14},
 {'Conversion Value': 6, 'start': 5.14, 'end': 8.82},
 {'Conversion Value': 7, 'start': 8.82, 'end': 11.16},
 {'Conversion Value': 8, 'start': 11.16, 'end': 12.8},
 {'Conversion Value': 9, 'start': 12.8, 'end': 23.2},
 {'Conversion Value': 10, 'start': 23.2, 'end': 76.78},
 {'Conversion Value': 11, 'start': 76.78, 'end': 123.2},
 {'Conversion Value': 12, 'start': 123.2, 'end': 276.76},
 {'Conversion Value': 13, 'start': 276.76, 'end': 823.24},
 {'Conversion Value': 14, 'start': 823.24, 'end': np.nan}])

And DF2

DF2 = pd.DataFrame({'Value': [0, 2, 13]})

I want to join these, or create a new field on DF2 called 'bucket', so that if Value = 0 then I get 0 from DF1, if Value = 2 then I get 3 from DF1, if Value = 13 then I get 9 from DF1, and so forth.

I tried to create an index on DF1 like so:

idx = pd.IntervalIndex.from_arrays(DF1['start'], DF1['end'])

"message": "missing values must be missing in the same location both left and right sides",

How can I create an interval index when the end column contains NaN in some rows?

CodePudding user response:

For a between join, pandasql is a good choice. It can also be solved with merge, but that is a longer solution.


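# https://pypi.org/project/pandasql/
from pandasql import sqldf

# Lower-case aliases for the question's frames, since the query refers to df1 and df2.
df1, df2 = DF1, DF2

# Helper that runs a SQL query against the DataFrames in the global namespace.
pysqldf = lambda q: sqldf(q, globals())

qry = """
select *
from df1, df2
where df2.Value between df1.start and df1.end
"""
# Rows whose end is NaN never satisfy BETWEEN, so they drop out of the result.
result = pysqldf(qry)
result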
   Conversion Value  start    end  Value
0                 1    0.0   1.70      0
1                 2    1.7   2.28      2
2                 9   12.8  23.20     13
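
The merge-based alternative mentioned above could look roughly like the sketch below: cross join the two frames, then keep only the rows where Value lies between start and end. This is an illustration rather than a tested recipe for large data, since the cross join builds every pairing before filtering. It assumes pandas 1.2 or later for how='cross', and the `edges` list is just a compact way to rebuild the question's frame.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frames compactly from the bucket edges.
edges = [0.0, 1.7, 2.28, 3.12, 4.84, 5.14, 8.82, 11.16, 12.8, 23.2,
         76.78, 123.2, 276.76, 823.24]
df1 = pd.DataFrame({
    'Conversion Value': range(15),
    'start': [0.0] + edges,
    'end': [np.nan] + edges[1:] + [np.nan],
})
df2 = pd.DataFrame({'Value': [0, 2, 13]})

# Pair every Value with every bucket (requires pandas >= 1.2).
cross = df2.merge(df1, how='cross')

# Keep pairs where start <= Value <= end; comparisons against NaN are False,
# so the two rows with a NaN end never match, just like the SQL BETWEEN.
mask = cross['Value'].between(cross['start'], cross['end'])
result = cross[mask].reset_index(drop=True)
print(result)
```

Like BETWEEN, this is inclusive on both sides, so a Value that sits exactly on a shared boundary would match two buckets.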

CodePudding user response:

You can use pd.merge_asof to perform this type of merge.

# Cast to float because merge_asof requires the key dtypes to match.
# df1 and df2 are the question's DF1 and DF2.
result = pd.merge_asof(df2.astype(float), df1, left_on='Value', right_on='start')
print(result)

Output:

   Value  Conversion Value  start    end
0    0.0                 1    0.0   1.70
1    2.0                 2    1.7   2.28
2   13.0                 9   12.8  23.20
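
If you specifically want the IntervalIndex from the question, one option is to replace the NaN ends with explicit bounds before building it. The sketch below rests on an assumption the question does not confirm: the first row is treated as a catch-all for values at or below 0, the last row as open-ended above 823.24, and every bucket as closed on the right. The `edges` list is only a compact way to rebuild the question's frame.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frames compactly from the bucket edges.
edges = [0.0, 1.7, 2.28, 3.12, 4.84, 5.14, 8.82, 11.16, 12.8, 23.2,
         76.78, 123.2, 276.76, 823.24]
df1 = pd.DataFrame({
    'Conversion Value': range(15),
    'start': [0.0] + edges,
    'end': [np.nan] + edges[1:] + [np.nan],
})
df2 = pd.DataFrame({'Value': [0, 2, 13]})

left = df1['start'].copy()
right = df1['end'].copy()

# ASSUMPTION: first row means "0 or below", last row means "above 823.24".
left.iloc[0] = -np.inf
right.iloc[0] = 0.0
right.iloc[-1] = np.inf

# With no NaN left, from_arrays no longer complains about missing values.
idx = pd.IntervalIndex.from_arrays(left, right, closed='right')

# get_indexer returns the position of the interval containing each value
# (-1 would mean no interval matched, which cannot happen with these bounds).
pos = idx.get_indexer(df2['Value'].astype(float))
df2['bucket'] = df1['Conversion Value'].to_numpy()[pos]
print(df2)
```

If your buckets should be closed on the left instead, change closed='right' to closed='left' and adjust the first row's bounds to match.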