Home > Net >  get values for potentially multiple matches from an other dataframe
get values for potentially multiple matches from an other dataframe

Time:04-24

I want to fill the 'references' column in df_out with the 'ID' if the corresponding 'my_ID' in df_sp is contained in df_jira 'reference_ids'.

import pandas as pd

d_sp = {'ID': [1,2,3,4], 'my_ID': ["my_123", "my_234", "my_345", "my_456"], 'references':["","","2",""]}
df_sp = pd.DataFrame(data=d_sp)

d_jira = {'my_ID': ["my_124", "my_235", "my_346"], 'reference_ids': ["my_123, my_234", "", "my_345"]}
df_jira = pd.DataFrame(data=d_jira)

df_new = df_jira[~df_jira["my_ID"].isin(df_sp["my_ID"])].copy()

df_out = pd.DataFrame(columns=df_sp.columns)
needed_cols = list(set(df_sp.columns).intersection(df_new.columns))
for column in needed_cols:
    df_out[column] = df_new[column]
df_out['Related elements_my'] = df_jira['reference_ids']

Desired output df_out:

| ID | my_ID | references |
|----|-------|------------|
|    | my_124|       1, 2 |
|    | my_235|            |
|    | my_346|          3 |

What I tried so far is list comprehension, but I only managed to get the reference_ids "copied" from a helper column to my 'references' column with this:

for row, entry in df_out.iterrows():
    cpl_ids = [x for x in entry['Related elements_my'].split(', ') if any(vh_id == x for vh_id in df_cpl_list['my-ID'])]
    df_out.at[row, 'Related elements'] = ', '.join(cpl_ids)

I can not wrap my head around on how to get the specific 'ID's on the matches of 'any()' or if this actually the way to go as I need all the matches, not something if there is any match. Any hints are appreciated!

I work with python 3.9.4 on Windows (adding in case python 3.10 has any other solution)

Backstory: Moving data from Jira to MS SharePoint lists. (Therefore, the 'ID' does not equal the actual index in the dataframe, but is rather assigned by SharePoint upon insertion into the list. Hence, empty after running for the new entries.)

CodePudding user response:

ref_df = df_sp[["ID","my_ID"]].set_index("my_ID")
df_out.references = df_out["Related elements_my"].apply(lambda x: ",".join(list(map(lambda y: "" if y == "" else str(ref_df.loc[y.strip()].ID), x.split(",")))))
df_out[["ID","my_ID","references"]]

output:

    ID  my_ID   references
0   NaN my_124  1,2
1   NaN my_235  
2   NaN my_346  3

if I understood your mean, correctly :)

  • Related