I read this post, and this one.
I was unable to make use of the examples there. I think perhaps because I have NaN values at either end. Here's what my dataframe looks like:
import numpy as np
import pandas as pd

DF1 = pd.DataFrame([{'Conversion Value': 0, 'start': 0.0, 'end': np.nan},
{'Conversion Value': 1, 'start': 0.0, 'end': 1.7},
{'Conversion Value': 2, 'start': 1.7, 'end': 2.28},
{'Conversion Value': 3, 'start': 2.28, 'end': 3.12},
{'Conversion Value': 4, 'start': 3.12, 'end': 4.84},
{'Conversion Value': 5, 'start': 4.84, 'end': 5.14},
{'Conversion Value': 6, 'start': 5.14, 'end': 8.82},
{'Conversion Value': 7, 'start': 8.82, 'end': 11.16},
{'Conversion Value': 8, 'start': 11.16, 'end': 12.8},
{'Conversion Value': 9, 'start': 12.8, 'end': 23.2},
{'Conversion Value': 10, 'start': 23.2, 'end': 76.78},
{'Conversion Value': 11, 'start': 76.78, 'end': 123.2},
{'Conversion Value': 12, 'start': 123.2, 'end': 276.76},
{'Conversion Value': 13, 'start': 276.76, 'end': 823.24},
{'Conversion Value': 14, 'start': 823.24, 'end': np.nan}])
And DF2:
DF2 = pd.DataFrame({'Value': [0, 2, 13]})
I want to join these, or create a new field on DF2 called 'bucket', so that if Value = 0 I get 0 from DF1, if Value = 2 I get 3 from DF1, if Value = 13 I get 9 from DF1, and so forth.
I tried to create an index on DF1 like so:
idx = pd.IntervalIndex.from_arrays(DF1['start'], DF1['end'])
This fails with:
"message": "missing values must be missing in the same location both left and right sides",
How can I create an index where the start and end values contain NaN on the end field?
CodePudding user response:
For a "between" join, pandasql is a good choice. It can also be solved with merge, but that is a longer solution.
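# https://pypi.org/project/pandasql/
from pandasql import sqldf

# Helper that runs a query against the DataFrames in the global namespace
pysqldf = lambda q: sqldf(q, globals())

# Table names must match the DataFrame variable names (DF1, DF2)
qry = """
select *
from DF1, DF2
where DF2.Value between DF1.start and DF1.end
"""
result = pysqldf(qry)
result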
   Conversion Value  start    end  Value
0                 1    0.0   1.70      0
1                 2    1.7   2.28      2
2                 9   12.8  23.20     13
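For comparison, the longer merge-based route mentioned above might look something like the sketch below. It is only an illustration under a few assumptions: the DF1 here is a trimmed copy of the question's table, `cross_result` is a name made up for this example, and `how='cross'` needs pandas 1.2 or newer. Every value is paired with every bucket, then only the pairs where start <= Value <= end are kept.

```python
import numpy as np
import pandas as pd

# Trimmed copy of the question's lookup table (assumption: a few rows are enough to illustrate)
DF1 = pd.DataFrame({
    'Conversion Value': [0, 1, 2, 9],
    'start': [0.0, 0.0, 1.7, 12.8],
    'end': [np.nan, 1.7, 2.28, 23.2],
})
DF2 = pd.DataFrame({'Value': [0, 2, 13]})

# Pair every value with every bucket (cross join)
pairs = DF2.merge(DF1, how='cross')

# Keep pairs where start <= Value <= end; rows with a NaN end compare as False
in_bucket = pairs['Value'].between(pairs['start'], pairs['end'])
cross_result = pairs[in_bucket].reset_index(drop=True)
print(cross_result)
```

The cross join builds len(DF1) * len(DF2) rows before filtering, so this is probably only reasonable for small lookup tables.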
CodePudding user response:
You can use pd.merge_asof to perform this type of merge.
# DF2 is cast to float because merge_asof requires the key dtypes to match ('start' is float).
# Both frames must already be sorted by their keys; DF1 is sorted by 'start'.
result = pd.merge_asof(DF2.astype(float), DF1, left_on='Value', right_on='start')
print(result)
Output:
   Value  Conversion Value  start    end
0    0.0                 1    0.0   1.70
1    2.0                 2    1.7   2.28
2   13.0                 9   12.8  23.20
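If the IntervalIndex approach from the question is still preferred, one possible sketch is shown below. It rests on assumptions that are not confirmed anywhere in the thread: the last bucket is treated as open-ended (its missing end becomes infinity), the first row with start 0 and no end is dropped because it has no usable range, intervals are closed on the left, and -1 is used as a made-up marker for "no bucket found". The DF1 here is a trimmed copy of the question's table.

```python
import numpy as np
import pandas as pd

# Trimmed copy of the question's lookup table (assumption: sorted by 'start')
DF1 = pd.DataFrame({
    'Conversion Value': [0, 1, 2, 9, 14],
    'start': [0.0, 0.0, 1.7, 12.8, 823.24],
    'end': [np.nan, 1.7, 2.28, 23.2, np.nan],
})
DF2 = pd.DataFrame({'Value': [0, 2, 13]})

buckets = DF1.copy()
# Assumption: the final bucket has no upper limit, so its missing end becomes +inf
buckets.loc[buckets.index[-1], 'end'] = np.inf
# Assumption: any remaining row without an end (the first row here) is dropped
buckets = buckets.dropna(subset=['end'])

# With no NaN left on either side, the IntervalIndex can be built
idx = pd.IntervalIndex.from_arrays(buckets['start'], buckets['end'], closed='left')

# get_indexer returns the position of the matching interval, or -1 if none matches
pos = idx.get_indexer(DF2['Value'].to_numpy(dtype=float))
labels = buckets['Conversion Value'].to_numpy()
DF2['bucket'] = np.where(pos >= 0, labels[pos], -1)
print(DF2)
```

With closed='left' a value that sits exactly on a boundary, such as 1.7, falls into the bucket that starts there. The intervals must not overlap for get_indexer to work, which is why the first row is removed here.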