I'll admit that this question is quite specific. I'm trying to write a function that reads two time columns (same label) in separate dataframes df1['gps']
and df2['gps']
. I want to look for elements in the first column which are close to those in the second column, not necessarily in same row. When the condition on time distance is met, I want to save the close elements in df1['gps']
and df1['gps']
in a new dataframe called coinc
in separate columns coinc['gps1']
and coinc['gps2']
in the fastest and most efficient way. This is my code:
def find_coinc(df1, df2=None, tdelta=.25, shift=0):
index_boolean = False
if df2 is None:
df2 = df1.copy()
coincs = pd.DataFrame()
for _, r1 in tqdm(df1.iterrows(), total=len(df1)):
ctrig = df2.loc[abs(r1.gps shift-df2.gps)<tdelta]
print(r1.gps)
coincs_single = pd.DataFrame()
if len(ctrig)>0:
coincs_single['gps1'] = r1.gps
coincs_single['gps2'] = ctrig.gps
coincs = pd.concat((coincs, coincs_single), axis = 0, ignore_index=index_boolean)
index_boolean=True
else:
pass
return coincs
The script runs fine, but when investigating the output, I find that one column of coinc
is all NaN and I don't understand why. Test case with generated data:
a = pd.DataFrame() #define dataframes and fill them
b = pd.DataFrame()
a['gps'] = [0.12, 0.13, 0.6, 0.7]
b['gps'] = [0.1, 0.3, 0.5, 0.81, 0.82, 0.83]
find_coinc(a, b, 0.16, 0)
The output yielded is:
gps1 gps2
0 NaN 0.10
1 NaN 0.10
2 NaN 0.50
3 NaN 0.81
4 NaN 0.82
5 NaN 0.83
How can I write coinc
so that both columns turn out fine?
CodePudding user response:
Well, here is another solution. Instead of concat two dataframes just add new rows to 'coincs' DataFrame. I will show you below.
def find_coinc(df1, df2=None, tdelta=.25, shift=0):
if df2 is None:
df2 = df1.copy()
coincs = pd.DataFrame(columns=['gps1', 'gps2'])
for _, r1 in tqdm(df1.iterrows(), total=len(df1)):
ctrig = df2.loc[abs(r1.gps shift-df2.gps) < tdelta]
if len(ctrig)>0:
for ctrig_value in ctrig['gps']:
# Add n rows based on 'ctrig' length.
coincs.loc[len(coincs)] = [r1.gps, ctrig_value]
else:
pass
return coincs
# -------------------
a = pd.DataFrame() # define dataframes and fill them
b = pd.DataFrame()
a['gps'] = [0.12, 0.13, 0.6, 0.7]
b['gps'] = [0.1, 0.3, 0.5, 0.81, 0.82, 0.83]
coins = find_coinc(a, b, 0.16, 0)
print('\n\n')
print(coins.to_string())
Result:
gps1 gps2
0 0.12 0.10
1 0.13 0.10
2 0.60 0.50
3 0.70 0.81
4 0.70 0.82
5 0.70 0.83
I hope I could help! :D
CodePudding user response:
So the issue is that there are multiple elements in df2['gps']
which satisfy the condition of being within a time window of df1['gps']
. I think I found a solution, but looking for a better one if possible. Highlighting the modified line in the original function as ### FIX UPDATE comment:
def find_coinc(df1, df2=None, tdelta=.25, shift=0):
index_boolean = False
if df2 is None:
df2 = df1.copy()
coincs = pd.DataFrame()
for _, r1 in tqdm(df1.iterrows(), total=len(df1)):
ctrig = df2.loc[abs(r1.gps shift-df2.gps)<tdelta]
ctrig.reset_index(drop=True, inplace=True)
coincs_single = pd.DataFrame()
if len(ctrig)>0:
coincs_single['gps1'] = [r1.gps]*len(ctrig) ### FIX UPDATE
coincs_single['gps2'] = ctrig.gps
print(ctrig.gps)
coincs = pd.concat((coincs, coincs_single), axis = 0, ignore_index=index_boolean)
index_boolean=True
else:
pass
return coincs
The solution I chose, since I want to have all the instances of the condition being met, was to write the same element in df1['gps']
into coinc['gps1']
the needed amount of times.