Home > OS >  How do I get index of a specific value (in second dataframe) based on the same value in first datafr
How do I get index of a specific value (in second dataframe) based on the same value in first datafr

Time:12-28

I have 2 data frames, df_ts and df_cmexport. I am trying to get the index of placement id in df_cmexport for the placements in df_ts Refer to get an idea of the explanation : Click here to view excel file

Once I have the index of those placement id's as a list, I will iterate through them using for j in list_pe_ts_1: to get some value for 'j' index as such : df_cmexport['p_start_year'][j].

My code below returns an empty list for some reason print(list_pe_ts_1) returns []

I think something wrong with list_pe_ts_1 = df_cmexport.index[df_cmexport['Placement ID'] == pid_1].tolist() as this returens empty list of length 0

I even tried using list_pe_ts_1 = df_cmexport.loc[df_cmexport.isin([pid_1]).any(axis=1)].index but still gives a empty list

Help is always appreciated :) Cheers to you all @stackoverflow

for i in range(0, len(df_ts)):
    pid_1 = df_ts['PLACEMENT ID'][i]
    print('for pid ', pid_1)
    list_pe_ts_1 = df_cmexport.index[df_cmexport['Placement ID'] == pid_1].tolist()
    print('len of list',len(list_pe_ts_1))
    ts_p_start_year_for_pid = df_ts['p_start_year'][i]
    ts_p_start_month_for_pid = df_ts['p_start_month'][i]
    ts_p_start_day_for_pid = df_ts['p_start_date'][i]

    print('\np_start_full_date_ts for :', pid_1, 'y:', ts_p_start_year_for_pid, 'm:', ts_p_start_month_for_pid,
          'd:', ts_p_start_day_for_pid)
    # j=list_pe_ts
    print(list_pe_ts_1)
    for j in list_pe_ts_1:
        # print(j)

        export_p_start_year_for_pid = df_cmexport['p_start_year'][j]
        export_p_start_month_for_pid = df_cmexport['p_start_month'][j]
        export_p_start_day_for_pid = df_cmexport['p_start_date'][j]
        print('\np_start_full_date_export for ', pid, "at row(", j, ") :", export_p_start_year_for_pid,
              export_p_start_month_for_pid, export_p_start_day_for_pid)
    if (ts_p_start_year_for_pid == export_p_start_year_for_pid) and (
            ts_p_start_month_for_pid == export_p_start_month_for_pid) and (
            ts_p_start_day_for_pid == export_p_start_day_for_pid):
        pids_p_1.add(pid_1)
        # print('pass',pids_p_1)

        # print(export_p_end_year_for_pid)
    else:
        pids_f_1.add(pid_1)
        # print("mismatch in placement end date for pid ", pids)
        # print("pids list ",pids)
        # print('fail',pids_f_1)

CodePudding user response:

With below snippest you can get a list of the matching index field from seconds dataframe.

import pandas as pd
df_ts = pd.DataFrame(data = {'index in df':[0,1,2,3,4,5,6,7,8,9,10,11,12],
                                   "pid":[1,1,2,2,3,3,3,4,6,8,8,9,9],
                            })

df_cmexport = pd.DataFrame(data = {'index in df':[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
                                  "pid":[1,1,1,2,3,3,3,3,3,4,4,4,5,5,6,7,8,8,9,9,9],
                            })

Create new dataframe by mearging the two

result = pd.merge(df_ts, df_cmexport, left_on=["pid"], right_on=["pid"], how='left', indicator='True', sort=True)

Then identify unique values in "index in df_y" dataframe

index_list = result["index in df_y"].unique()

The result you get;

index_list
Out[9]: 
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 14, 16, 17, 18, 19,
       20], dtype=int64)
  • Related