Home > Software engineering >  Create new column based on other column value in python
Create new column based on other column value in python

Time:12-31

Please help me to create another column based on other dataframe value.

Here is my df example:

data = {'Well':[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3], 
  'Depth':[50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210]}
data2 = {'Well': [1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3], 'Layer': ['A', 'Bot-A', 'B', 'Bot-B', 'C', 'Bot-C','A', 'Bot-A', 'B', 'Bot-B', 'C', 'Bot-C','A', 'Bot-A', 'B', 'Bot-B', 'C', 'Bot-C'], 'Depth': [75,100,125,150,175,200,61,78,89,141,152,189,50,68,98,135,155,189]}
df2 = pd.DataFrame(data2)
df = pd.DataFrame(data)

The condition is, I need to fill the Layer to df in every depth.

For example, in df2, we can found that Layer A at Well 1 is 75 and the Bottom-A is at Depth 100. So if the Depth in df1 is bigger than 75 but less than 100, I want to fill them with A.

Next condition is, If the Depth in df1 < Depth A in df2 for every Well, I need to fill them with GG. And if the Depth > Bot-C, I need to fill them with JJ.

This is my desired result.

data3 = {'Well':[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3], 
  'Depth':[50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210], 
  'Layer': ['GG','GG','GG','A','A','Bot-A','Bot-A','Bot-A','B','B','Bot-B','Bot-B','Bot-B','C','C','Bot-C','JJ','GG','GG','GG','A','Bot-A','B','B','B','B','B','B','Bot-B','C','C','C','JJ','JJ','A','A','Bot-A','Bot-A','Bot-A','B','B','B','B','Bot-B','Bot-B','C','C','C','JJ','JJ','JJ']}
df3 = pd.DataFrame(data3)

Thank you very much and Happy New Year!

CodePudding user response:

Here's one solution. You can create a lookup dictionary from df2 and with the help of a custom function get_layer that maps depths to layers using the lookup dictionary, you can get the expected outcome.

def get_layer(data):
    depths = lookup[data['Well']]
    if data['Depth'] < list(depths.keys())[-1]:
        return 'GG'
    elif data['Depth'] > list(depths.keys())[0]:
        return 'JJ'
    else:
        for lim, layer in depths.items():
            if data['Depth'] >= lim:
                return layer
            
lookup = {}
for (k1,k2),v in df2.set_index(['Well','Depth'])['Layer'][::-1].to_dict().items():
    lookup.setdefault(k1, {}).update({k2:v})

df['Layer'] = df.apply(get_layer, axis=1)

Output:

    Well  Depth  Layer
0      1     50     GG
1      1     60     GG
2      1     70     GG
3      1     80      A
4      1     90      A
5      1    100  Bot-A
6      1    110  Bot-A
7      1    120  Bot-A
8      1    130      B
9      1    140      B
10     1    150  Bot-B
11     1    160  Bot-B
12     1    170  Bot-B
13     1    180      C
14     1    190      C
15     1    200  Bot-C
16     1    210     JJ
17     2     40     GG
18     2     50     GG
19     2     60     GG
20     2     70      A
21     2     80  Bot-A
22     2     90      B
23     2    100      B
24     2    110      B
25     2    120      B
26     2    130      B
27     2    140      B
28     2    150  Bot-B
29     2    160      C
30     2    170      C
31     2    180      C
32     2    190     JJ
33     2    200     JJ
34     3     50      A
35     3     60      A
36     3     70  Bot-A
37     3     80  Bot-A
38     3     90  Bot-A
39     3    100      B
40     3    110      B
41     3    120      B
42     3    130      B
43     3    140  Bot-B
44     3    150  Bot-B
45     3    160      C
46     3    170      C
47     3    180      C
48     3    190     JJ
49     3    200     JJ
50     3    210     JJ

CodePudding user response:

use merge_asof,

df = df.sort_values("Depth")
df2 = df2.sort_values("Depth")
dfmerge = pd.merge_asof(df, df2,  by="Well",on = "Depth").fillna("GG")
dfmerge = pd.merge_asof(dfmerge, df2, by = "Well", on="Depth", direction ="forward")
dfmerge.loc[dfmerge["Layer_y"].isnull(), "Layer_x"] = "JJ"
dfmerge = dfmerge.sort_values(["Well", "Depth"]).drop(columns = "Layer_y")

CodePudding user response:

Since you are working with well and geology layers you would probably need conditional joins in future. There is no direct way to do conditional join in pandas, however you can use pandasql library.

from pandasql import sqldf

pysqldf = lambda q: sqldf(q, globals())
query = ''' 
with 
q1 as (
select *,
lead(Depth) over(partition by Well order by Depth) as Depth_lead,
ifnull(lag(Depth) over(partition by Well order by Depth), 0) as Depth_lag        
       from df2 )
,q2 as (select Well, min(Depth) as min, max(Depth) as max from q1 group by Well )

, q3 as (select t1.*, (CASE WHEN t2.Layer is null and t1.Depth < min THEN 'GG'
                       WHEN t2.Layer is null and t1.Depth  > max THEN 'JJ'
                       ELSE t2.Layer END) as Layer
    from df as t1
    left join q1 as t2 
    on t1.Well = t2.Well and t1.Depth>=t2.Depth and t1.Depth < t2.Depth_lead
    left join q2 as t3 on t1.well = t3.well)

   Select * from q3
   '''
   data = sqldf(query)
  • Related