Home > Back-end >  Python: Filtering a datastructure depended on columnvalue
Python: Filtering a datastructure depended on columnvalue

Time:08-05

I have two pandas dataframe structured like so:

DF1:
|'ID'|'Zone'|
|:---------:|
| 11 |  1   |
| 12 |  2   |
| 10 |  0   |


DF2:
|'ID'|'Time'|
|:---------:|
| 11 |   1  |
| 11 |   2  |
| 12 |   1  |
| 12 |   2  |

And I want to add a new column to DF2 named zone, that contain the correct value for which zone each ID belong to. See example below.

|'ID'|'Time'|'Zone'|
|:----------------:|
| 11 |   1  |   1  |
| 11 |   2  |   1  |
| 12 |   1  |   2  |
| 12 |   2  |   2  |

For this small example I have written some code that works fine, but I will like to use is on two large DF. So my qustion is, is there a more delicated (better) way to do this? My current code is:

df2 = np.empty([len(df2.index)]
for i in df2.index:
    for j in df.index:
        if df2['id'][i] == df1['id'][j]:
            df2.loc[i, 'zone'] = df1.loc[j, 'zone']

CodePudding user response:

df1.merge(df2, how='right')

or 

df2.merge(df1,how='left')

joining df2 with df1 using ID columns

CodePudding user response:

You need to use merge function to perform a join:

pd.merge(df1, df2, on="ID")

Where on="ID" indicates which is the reference column and must be present in both dataframes (if you want to join using columns with different names, it is also possible, just check the docs).

merge is also available as a dataframe's method so you can alternatively call it as:

df1.merge(df2, on="ID")

With exactly same result.

CodePudding user response:

Here's a nice one liner

df2["Zone"] = [df1.set_index("ID")["Zone"][df2["ID"][i]] for i in df2.index]

Setting the index of df1 to its 'ID' column allows the values of df2's 'ID' column to act as indices, making the call a little simpler.

  • Related