So I have a dataframe (Df1) of say 100 rows. I also have a dataframe (df2) which contains 1 row. For each data frame, a row represents a property. The columns in question are 'NIY' and 'Purchase Price'. I want to write a function that queries DF1 to show results where the NIY is within 0.1 of DF2 and the purchase price is within 2,000,000 of DF2.
DF1 contains multiple columns, of which contain 'NIY' & 'Purchase Price'.
Purchase_Price | NIY | Purchaser |
---|---|---|
230000000 | 0.15 | GS |
30000000 | 0.08 | JKK |
See code below
df1 = pd.read_csv('CopyTransactions.csv', header=3)
df2 = {'Purchase_Price': [25300000], 'NIY': [0.078]}
df2 = pd.DataFrame(data=df2)
df1.query('Purchase_Price > 1000000 & NIY > 0.09')
df1.query('NIY within range g.0['NIY'] or - 0.01
& Purchase_price within range g.0['Purchase_Price'] or - 2,000,000)
The code works until the bottom part, I want to work out how to correctly code the bottom section. (the 0 in the bottom code means index 0/first row)
Thanks!
CodePudding user response:
It's just a matter of filtering/slicing your dataframe.
I would store the values from df2 as a variable. Then could use pandas
.between()
import pandas as pd
df1 = {'Purchase_Price': [23000000, 30000000, 26300000],
'NIY': [0.15, 0.08, 0.069],
'Purchaser': ['GS','JKK','JJK']}
df1 = pd.DataFrame(data=df1)
df2 = {'Purchase_Price': [25300000], 'NIY': [0.078]}
df2 = pd.DataFrame(data=df2)
price_check = df2.iloc[0]['Purchase_Price']
niy_check = df2.iloc[0]['NIY']
price = 2000000
niy = 0.01
df = df1[(df1['Purchase_Price'].between(price_check - price, price_check price))
& (df1['NIY'].between(niy_check - niy, niy_check niy))]
Output:
print(df)
Purchase_Price NIY Purchaser
2 26300000 0.069 JJK
CodePudding user response:
With query
, you can try:
>>> df1.query("@df2.Purchase_Price.iat[0]-2000000 <= Purchase_Price <= @df2.Purchase_Price.iat[0] 2000000 and @df2.NIY.iat[0]-0.1 <= NIY <= @df2.NIY.iat[0] 0.1")
Purchase_Price NIY Purchaser
2 23500000 0.08 FOO