I have 2 data frames, df_ts
and df_cmexport
. I am trying to get the index
of placement id
in df_cmexport
for the placements in df_ts
Refer to get an idea of the explanation : Click here to view excel file
Once I have the index of those placement id's as a list, I will iterate through them using for j in list_pe_ts_1:
to get some value for 'j' index as such : df_cmexport['p_start_year'][j]
.
My code below returns an empty list for some reason print(list_pe_ts_1)
returns []
I think something wrong with list_pe_ts_1 = df_cmexport.index[df_cmexport['Placement ID'] == pid_1].tolist()
as this returens empty list of length 0
I even tried using list_pe_ts_1 = df_cmexport.loc[df_cmexport.isin([pid_1]).any(axis=1)].index
but still gives a empty list
Help is always appreciated :) Cheers to you all @stackoverflow
for i in range(0, len(df_ts)):
pid_1 = df_ts['PLACEMENT ID'][i]
print('for pid ', pid_1)
list_pe_ts_1 = df_cmexport.index[df_cmexport['Placement ID'] == pid_1].tolist()
print('len of list',len(list_pe_ts_1))
ts_p_start_year_for_pid = df_ts['p_start_year'][i]
ts_p_start_month_for_pid = df_ts['p_start_month'][i]
ts_p_start_day_for_pid = df_ts['p_start_date'][i]
print('\np_start_full_date_ts for :', pid_1, 'y:', ts_p_start_year_for_pid, 'm:', ts_p_start_month_for_pid,
'd:', ts_p_start_day_for_pid)
# j=list_pe_ts
print(list_pe_ts_1)
for j in list_pe_ts_1:
# print(j)
export_p_start_year_for_pid = df_cmexport['p_start_year'][j]
export_p_start_month_for_pid = df_cmexport['p_start_month'][j]
export_p_start_day_for_pid = df_cmexport['p_start_date'][j]
print('\np_start_full_date_export for ', pid, "at row(", j, ") :", export_p_start_year_for_pid,
export_p_start_month_for_pid, export_p_start_day_for_pid)
if (ts_p_start_year_for_pid == export_p_start_year_for_pid) and (
ts_p_start_month_for_pid == export_p_start_month_for_pid) and (
ts_p_start_day_for_pid == export_p_start_day_for_pid):
pids_p_1.add(pid_1)
# print('pass',pids_p_1)
# print(export_p_end_year_for_pid)
else:
pids_f_1.add(pid_1)
# print("mismatch in placement end date for pid ", pids)
# print("pids list ",pids)
# print('fail',pids_f_1)
CodePudding user response:
With below snippest you can get a list of the matching index field from seconds dataframe.
import pandas as pd
df_ts = pd.DataFrame(data = {'index in df':[0,1,2,3,4,5,6,7,8,9,10,11,12],
"pid":[1,1,2,2,3,3,3,4,6,8,8,9,9],
})
df_cmexport = pd.DataFrame(data = {'index in df':[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
"pid":[1,1,1,2,3,3,3,3,3,4,4,4,5,5,6,7,8,8,9,9,9],
})
Create new dataframe by mearging the two
result = pd.merge(df_ts, df_cmexport, left_on=["pid"], right_on=["pid"], how='left', indicator='True', sort=True)
Then identify unique values in "index in df_y" dataframe
index_list = result["index in df_y"].unique()
The result you get;
index_list
Out[9]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 14, 16, 17, 18, 19,
20], dtype=int64)