Please help me to create another column based on other dataframe value.
Here is my df example:
data = {'Well':[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3],
'Depth':[50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210]}
data2 = {'Well': [1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3], 'Layer': ['A', 'Bot-A', 'B', 'Bot-B', 'C', 'Bot-C','A', 'Bot-A', 'B', 'Bot-B', 'C', 'Bot-C','A', 'Bot-A', 'B', 'Bot-B', 'C', 'Bot-C'], 'Depth': [75,100,125,150,175,200,61,78,89,141,152,189,50,68,98,135,155,189]}
df2 = pd.DataFrame(data2)
df = pd.DataFrame(data)
The condition is, I need to fill the Layer to df in every depth.
For example, in df2, we can found that Layer A at Well 1 is 75 and the Bottom-A is at Depth 100. So if the Depth in df1 is bigger than 75 but less than 100, I want to fill them with A.
Next condition is, If the Depth in df1 < Depth A in df2 for every Well, I need to fill them with GG. And if the Depth > Bot-C, I need to fill them with JJ.
This is my desired result.
data3 = {'Well':[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3],
'Depth':[50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210],
'Layer': ['GG','GG','GG','A','A','Bot-A','Bot-A','Bot-A','B','B','Bot-B','Bot-B','Bot-B','C','C','Bot-C','JJ','GG','GG','GG','A','Bot-A','B','B','B','B','B','B','Bot-B','C','C','C','JJ','JJ','A','A','Bot-A','Bot-A','Bot-A','B','B','B','B','Bot-B','Bot-B','C','C','C','JJ','JJ','JJ']}
df3 = pd.DataFrame(data3)
Thank you very much and Happy New Year!
CodePudding user response:
Here's one solution. You can create a lookup dictionary from df2
and with the help of a custom function get_layer
that maps depths to layers using the lookup dictionary, you can get the expected outcome.
def get_layer(data):
depths = lookup[data['Well']]
if data['Depth'] < list(depths.keys())[-1]:
return 'GG'
elif data['Depth'] > list(depths.keys())[0]:
return 'JJ'
else:
for lim, layer in depths.items():
if data['Depth'] >= lim:
return layer
lookup = {}
for (k1,k2),v in df2.set_index(['Well','Depth'])['Layer'][::-1].to_dict().items():
lookup.setdefault(k1, {}).update({k2:v})
df['Layer'] = df.apply(get_layer, axis=1)
Output:
Well Depth Layer
0 1 50 GG
1 1 60 GG
2 1 70 GG
3 1 80 A
4 1 90 A
5 1 100 Bot-A
6 1 110 Bot-A
7 1 120 Bot-A
8 1 130 B
9 1 140 B
10 1 150 Bot-B
11 1 160 Bot-B
12 1 170 Bot-B
13 1 180 C
14 1 190 C
15 1 200 Bot-C
16 1 210 JJ
17 2 40 GG
18 2 50 GG
19 2 60 GG
20 2 70 A
21 2 80 Bot-A
22 2 90 B
23 2 100 B
24 2 110 B
25 2 120 B
26 2 130 B
27 2 140 B
28 2 150 Bot-B
29 2 160 C
30 2 170 C
31 2 180 C
32 2 190 JJ
33 2 200 JJ
34 3 50 A
35 3 60 A
36 3 70 Bot-A
37 3 80 Bot-A
38 3 90 Bot-A
39 3 100 B
40 3 110 B
41 3 120 B
42 3 130 B
43 3 140 Bot-B
44 3 150 Bot-B
45 3 160 C
46 3 170 C
47 3 180 C
48 3 190 JJ
49 3 200 JJ
50 3 210 JJ
CodePudding user response:
use merge_asof,
df = df.sort_values("Depth")
df2 = df2.sort_values("Depth")
dfmerge = pd.merge_asof(df, df2, by="Well",on = "Depth").fillna("GG")
dfmerge = pd.merge_asof(dfmerge, df2, by = "Well", on="Depth", direction ="forward")
dfmerge.loc[dfmerge["Layer_y"].isnull(), "Layer_x"] = "JJ"
dfmerge = dfmerge.sort_values(["Well", "Depth"]).drop(columns = "Layer_y")
CodePudding user response:
Since you are working with well and geology layers you would probably need conditional joins in future. There is no direct way to do conditional join in pandas, however you can use pandasql library.
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
query = '''
with
q1 as (
select *,
lead(Depth) over(partition by Well order by Depth) as Depth_lead,
ifnull(lag(Depth) over(partition by Well order by Depth), 0) as Depth_lag
from df2 )
,q2 as (select Well, min(Depth) as min, max(Depth) as max from q1 group by Well )
, q3 as (select t1.*, (CASE WHEN t2.Layer is null and t1.Depth < min THEN 'GG'
WHEN t2.Layer is null and t1.Depth > max THEN 'JJ'
ELSE t2.Layer END) as Layer
from df as t1
left join q1 as t2
on t1.Well = t2.Well and t1.Depth>=t2.Depth and t1.Depth < t2.Depth_lead
left join q2 as t3 on t1.well = t3.well)
Select * from q3
'''
data = sqldf(query)