Home > Enterprise >  Match and store based on two columns in python dataframe
Match and store based on two columns in python dataframe

Time:08-06

I'm working on a project that requires me to match two data frames based on two separate columns, X and Y.

e.g.

df1 =

|  X  |  Y  | AGE |
|:--- |:---:|----:|
| 2.0 | 1.5 | 25  |
| 1.0 | 0.5 | 29  |
| 1.5 | 0.5 | 21  |
| 2.0 | 2.0 | 32  |
| 0.0 | 1.5 | 19  |

df2 =

|  X  |  Y  | AGE |
|:--- |:---:|----:|
| 0.0 | 0.0 | []  |
| 0.0 | 0.5 | []  |
| 0.0 | 1.0 | []  |
| 0.0 | 1.5 | []  |
| 0.0 | 2.0 | []  |
| 0.5 | 0.0 | []  |
| 0.5 | 0.5 | []  |
| 0.5 | 1.0 | []  |
| 0.5 | 1.5 | []  |
| 0.5 | 2.0 | []  |
| 1.0 | 0.0 | []  |
| 1.0 | 0.5 | []  |
| 1.0 | 1.0 | []  |
| 1.0 | 1.5 | []  |
| 1.0 | 2.0 | []  |
| 1.5 | 0.0 | []  |
| 1.5 | 0.5 | []  |
| 1.5 | 1.0 | []  |
| 1.5 | 1.5 | []  |
| 1.5 | 2.0 | []  |
| 2.0 | 0.0 | []  |
| 2.0 | 0.5 | []  |
| 2.0 | 1.0 | []  |
| 2.0 | 1.5 | []  |
| 2.0 | 2.0 | []  |

The goal is to sort through df1, find the row with its matching coordinates in df2, and then store the AGE value from df1 in the AGE list in df2. The expected output would be:

df2 =

|  X  |  Y  | AGE |
|:--- |:---:|----:|
| 0.0 | 0.0 | []  |
| 0.0 | 0.5 | []  |
| 0.0 | 1.0 | []  |
| 0.0 | 1.5 |[19] |
| 0.0 | 2.0 | []  |
| 0.5 | 0.0 | []  |
| 0.5 | 0.5 | []  |
| 0.5 | 1.0 | []  |
| 0.5 | 1.5 | []  |
| 0.5 | 2.0 | []  |
| 1.0 | 0.0 | []  |
| 1.0 | 0.5 |[29] |
| 1.0 | 1.0 | []  |
| 1.0 | 1.5 | []  |
| 1.0 | 2.0 | []  |
| 1.5 | 0.0 | []  |
| 1.5 | 0.5 |[21] |
| 1.5 | 1.0 | []  |
| 1.5 | 1.5 | []  |
| 1.5 | 2.0 | []  |
| 2.0 | 0.0 | []  |
| 2.0 | 0.5 | []  |
| 2.0 | 1.0 | []  |
| 2.0 | 1.5 |[25] |
| 2.0 | 2.0 |[32] |

The code I have so far is:

for n in df1:
    if df1["X"].values[n] == df2["X"].values[n]:
        for m in df1:
            if df1["Y"].values[m]) == df2["Y"].values[m]:
                df2['AGE'].push(df1['AGE'])

CodePudding user response:

This is a merge operation that can be solved by not considering the Age column from df2, and merging left on ['X','Y']:

df2 = df2[['X','Y']].merge(df1,on=['X','Y'],how='left')

CodePudding user response:

To store age in lists:

df2 = df2[['X','Y']].merge(df1, on=['X','Y'], how='left')
df2['AGE'] = df2.apply(lambda row: [row['AGE']] if not pd.isnull(row['AGE']) else [], axis=1)
  • Related