So my question here is how can I add data in new column to dataframe based on conditions from another dataframe. It is kinda difficult to say it so I am giving an example here
df1
columns a b c
0 10 1
10 15 3
15 20 5
df2
columns d e
3.3 10
5.5 20
14.5 11
17.2 5
What I want to do here is to add another column f to df2, and its value is from df1 such that if d[i] is between a[j] and b[j], then copy the value c[j] to the new column f[i] in df2. for example: d[1] = 5.5 so 0< 5.5< 10 hence, the value of f[1] = c[0] = 1
the final results should look like
df2
columns d e f
3.3 10 1
5.5 20 1
14.5 11 3
17.2 5 5
Any help is greatly appreciated!
Regards,
Steve
CodePudding user response:
Assuming non-overlapping intervals in df1
a and b, you can use pd.cut
with a pd.IntervalIndex
:
import pandas as pd
# Your dfs here
df1 = pd.read_clipboard()
df2 = pd.read_clipboard()
idx = pd.IntervalIndex.from_arrays(df1["a"], df1["b"])
mapping = df1["c"].set_axis(idx)
df2["f"] = pd.cut(df2["d"], idx).map(mapping)
df2:
d e f
0 3.3 10 1
1 5.5 20 1
2 14.5 11 3
3 17.2 5 5
CodePudding user response:
You could use:
result = []
for item in df2['d']:
for row in df1.iterrows():
if row[1]['a'] <= item <= row[1]['b']:
val = (row[1]['c'])
break
else:
val = None
result.append(val)
df2['f'] = result
print(df2)
CodePudding user response:
import pandas as pd
df1 = pd.DataFrame({'a':[0,10,15],'b':[10,15,20],'c':[1,3,5]})
df2 = pd.DataFrame({'d':[3.3,5.5,9.5,17.2],'e':[10,20,11,5]})
df2['f']=0
for i in range(df2.shape[0]):
for j in range(df1.shape[0]):
if df2.d[i]>=df1.a[j] and df2.d[i]<=df1.b[j]:
df2.f[i]=df1.c[j]
df2
CodePudding user response:
What about this option ?
# merge the two dfs
df = pd.merge(df2, df1, left_on='d', right_on='b', how='left')
df2['f'] = None
df2['f'] = df.apply(lambda x: x['c'] if x['a_x'] <= x['d'] <= x['b_x'] else None, axis=1)