I have heard that nesting two for loops takes a long time, but I'm not sure how to get around this issue. Each of my for loops runs ~40,000 iterations, so the total run time is very long. I am basically trying to compare entries of two DataFrames. If three of the entries (frame, x, and y) in a certain row match between the two, then I want to save that row to a new array of data. My end result should be a dataframe with frame, x, y, particle #, and two different intensities. I believe what I am doing will work, but it takes over 5 hours to get a result. Is there any way I can make this code quicker and more efficient? Thank you so much; my code is posted below.
intensity = np.zeros((tracks.shape[0], 2))
location = np.zeros((tracks.shape[0], 2))
frame = np.zeros((tracks.shape[0], 1))
particle = np.zeros((tracks.shape[0], 1))
for r in range(red_masked_tracks.shape[0]):
    for g in range(green_masked_tracks.shape[0]):
        if red_masked_tracks['frame'][r] == green_masked_tracks['frame'][g]:
            if round(red_masked_tracks['x'][r]) == round(green_masked_tracks['x'][g]) and round(red_masked_tracks['y'][r]) == round(green_masked_tracks['y'][g]):
                intensity[g] = [red_masked_tracks['mass'][r], green_masked_tracks['mass'][g]]
                location[g] = [red_masked_tracks['x'][r], red_masked_tracks['y'][r]]
                frame[g] = red_masked_tracks['frame'][r]
                particle[g] = red_masked_tracks['particle'][r]
                break
CodePudding user response:
Use an inner merge to filter rows where x, y and frame match.
import pandas as pd

# example data
red = pd.DataFrame({"x": [1.2, 3.3, 5.4, 1], "y": [1.1, 10.7, 9.3, 1.1], "frame": ["a", "b", "c", "a"], "mass": [10, 20, 30, 40]})
green = pd.DataFrame({"x": [1.2, 2.3, 3.4, 1], "y": [1.1, 4.7, 6.3, 1.1], "frame": ["a", "b", "c", "a"], "mass": [50, 60, 70, 80]})
# Add rounded versions of x and y
for df in [red, green]:
    df["xr"] = df["x"].round()
    df["yr"] = df["y"].round()
matches = pd.merge(red, green, on=["xr", "yr", "frame"], suffixes=["_red", "_green"])
There can in principle be multiple matches in red for a particular row in green, and vice versa. If so, you will have to decide what to do with these, e.g. keep only the first of each group, using matches = matches.groupby(["xr", "yr", "frame"], as_index=False).first() (as_index=False keeps xr, yr and frame as regular columns instead of moving them into the index).
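To make the multiple-match case concrete, here is a minimal sketch that reuses the example data from this answer. In that data, two red rows and two green rows all round to the same (xr, yr, frame), so the merge returns every pairing. Keeping the first row of each group is only one possible policy, not necessarily the right one for real tracking data.

```python
import pandas as pd

# Same example data as in the answer above.
red = pd.DataFrame({"x": [1.2, 3.3, 5.4, 1], "y": [1.1, 10.7, 9.3, 1.1],
                    "frame": ["a", "b", "c", "a"], "mass": [10, 20, 30, 40]})
green = pd.DataFrame({"x": [1.2, 2.3, 3.4, 1], "y": [1.1, 4.7, 6.3, 1.1],
                      "frame": ["a", "b", "c", "a"], "mass": [50, 60, 70, 80]})

# Add rounded versions of x and y to use as merge keys.
for df in [red, green]:
    df["xr"] = df["x"].round()
    df["yr"] = df["y"].round()

keys = ["xr", "yr", "frame"]
matches = pd.merge(red, green, on=keys, suffixes=("_red", "_green"))

# Two red rows times two green rows share the same keys, giving four pairings.
n_pairs = len(matches)

# One possible policy: keep the first pairing per key group.
# as_index=False keeps the key columns as ordinary columns.
deduped = matches.groupby(keys, as_index=False).first()
```

If a different rule fits better (for example, keeping the pairing with the smallest unrounded distance), sort matches by that criterion before taking the first row of each group.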
You can work with the resulting dataframe directly, or convert it to numpy arrays:
intensity = matches[["mass_red", "mass_green"]].to_numpy()
location = matches[["x_red", "y_red"]].to_numpy()
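frame = matches["frame"].to_numpy()  # merge keys are not suffixed
particle = matches["particle_red"].to_numpy()  # assumes both DataFrames have a "particle" column; use "particle" if only red has it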
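If the goal is a single dataframe with frame, x, y, particle and the two intensities, as described in the question, the merged result can be trimmed and renamed directly. The sketch below is only illustrative: the small red and green frames are made-up stand-ins for red_masked_tracks and green_masked_tracks, both are assumed to carry a particle column, and add_rounded is a hypothetical helper name.

```python
import pandas as pd

# Made-up stand-ins for red_masked_tracks and green_masked_tracks.
red = pd.DataFrame({
    "frame": [0, 0, 1],
    "x": [1.2, 7.6, 3.3],
    "y": [1.1, 2.4, 10.7],
    "mass": [10.0, 20.0, 30.0],
    "particle": [0, 1, 0],
})
green = pd.DataFrame({
    "frame": [0, 1, 1],
    "x": [0.9, 3.1, 8.8],
    "y": [1.3, 11.2, 4.0],
    "mass": [50.0, 60.0, 70.0],
    "particle": [5, 6, 7],
})


def add_rounded(df):
    """Return a copy of df with rounded x and y columns to merge on."""
    return df.assign(xr=df["x"].round(), yr=df["y"].round())


# Inner merge keeps only rows whose frame and rounded position agree.
matches = pd.merge(
    add_rounded(red),
    add_rounded(green),
    on=["frame", "xr", "yr"],
    suffixes=("_red", "_green"),
)

# Keep the red position and particle id plus both intensities.
result = matches[
    ["frame", "x_red", "y_red", "particle_red", "mass_red", "mass_green"]
].rename(columns={"x_red": "x", "y_red": "y", "particle_red": "particle"})
```

Because the merge is implemented in pandas rather than in a Python-level double loop, it should avoid the 40,000 x 40,000 row-by-row comparisons, though the actual speedup depends on the data.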