Is there a more efficient way to use two for loops in Python?

I have heard that nesting one for loop inside another takes a long time, but I'm not sure how to get around this. Each of my loops runs about 40,000 iterations, so the run time is very long. I am basically trying to compare the entries of two DataFrames: if three of the entries in a row (frame, and x and y after rounding) match between the two, I want to save that row to a new array. My end result is a DataFrame with frame, x, y, particle #, and two different intensities. I believe what I am doing works, but it takes over 5 hours to get a result. Is there any way I can make this code faster and more efficient? Thank you so much; my code is posted below.

import numpy as np

# Output arrays, one row per row of `tracks`
intensity = np.zeros((tracks.shape[0], 2))
location = np.zeros((tracks.shape[0], 2))
frame = np.zeros((tracks.shape[0], 1))
particle = np.zeros((tracks.shape[0], 1))

# Compare every red row with every green row: n_red * n_green comparisons
for r in range(red_masked_tracks.shape[0]):
    for g in range(green_masked_tracks.shape[0]):
        # A match needs the same frame and the same rounded x and y
        if red_masked_tracks['frame'][r] == green_masked_tracks['frame'][g]:
            if round(red_masked_tracks['x'][r]) == round(green_masked_tracks['x'][g]) and round(red_masked_tracks['y'][r]) == round(green_masked_tracks['y'][g]):
                intensity[g] = [red_masked_tracks['mass'][r], green_masked_tracks['mass'][g]]
                location[g] = [red_masked_tracks['x'][r], red_masked_tracks['y'][r]]
                frame[g] = red_masked_tracks['frame'][r]
                particle[g] = red_masked_tracks['particle'][r]
                break  # stop at the first green match for this red row

Answer:

Instead of looping, use an inner merge (pd.merge, whose default is how="inner") to keep only the rows where frame and the rounded x and y match.

import pandas as pd

# example data (a "particle" column is included because it is used at the end)
red = pd.DataFrame({"x": [1.2, 3.3, 5.4, 1], "y": [1.1, 10.7, 9.3, 1.1], "frame": ["a", "b", "c", "a"], "mass": [10, 20, 30, 40], "particle": [0, 1, 2, 3]})
green = pd.DataFrame({"x": [1.2, 2.3, 3.4, 1], "y": [1.1, 4.7, 6.3, 1.1], "frame": ["a", "b", "c", "a"], "mass": [50, 60, 70, 80], "particle": [0, 1, 2, 3]})

# Add rounded versions of x and y
for df in [red, green]:
    df["xr"] = df["x"].round()
    df["yr"] = df["y"].round()

matches = pd.merge(red, green, on=["xr", "yr", "frame"], suffixes=["_red", "_green"])
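To show why this is so much faster than the nested loop, here is a rough plain-Python sketch of the underlying idea, a hash join. It is conceptually similar to what the merge does, not pandas' actual implementation, and `hash_match` is a hypothetical helper. One DataFrame is indexed in a dictionary keyed on (frame, rounded x, rounded y), then each row of the other is looked up. With about 40,000 rows on each side this is roughly 80,000 dictionary operations plus one per match, instead of up to 40,000 × 40,000 pairwise comparisons.

```python
from collections import defaultdict

import pandas as pd


def hash_match(red: pd.DataFrame, green: pd.DataFrame) -> list[tuple[int, int]]:
    """Hypothetical helper: return (red_pos, green_pos) pairs whose frame
    and rounded x, y are equal."""
    # One pass over green: bucket row positions by their composite key
    lookup = defaultdict(list)
    for g, (frame, x, y) in enumerate(zip(green["frame"], green["x"], green["y"])):
        lookup[(frame, round(x), round(y))].append(g)

    # One pass over red: each dictionary lookup is O(1) on average
    pairs = []
    for r, (frame, x, y) in enumerate(zip(red["frame"], red["x"], red["y"])):
        for g in lookup.get((frame, round(x), round(y)), []):
            pairs.append((r, g))
    return pairs


red = pd.DataFrame({"x": [1.2, 3.3, 5.4, 1], "y": [1.1, 10.7, 9.3, 1.1], "frame": ["a", "b", "c", "a"]})
green = pd.DataFrame({"x": [1.2, 2.3, 3.4, 1], "y": [1.1, 4.7, 6.3, 1.1], "frame": ["a", "b", "c", "a"]})
pairs = hash_match(red, green)
print(pairs)  # [(0, 0), (0, 3), (3, 0), (3, 3)]
```

Like the merge, this returns every pairing, so rows 0 and 3 of each frame (both at rounded (1, 1) in frame "a") produce four pairs.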

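There can, in principle, be multiple rows in red matching a particular row in green, and vice versa; the merge returns every pairing. If so, you will have to decide what to do with these, e.g. keep only the first match per key using matches = matches.groupby(["xr", "yr", "frame"]).agg("first").reset_index(). The reset_index() turns the group keys back into ordinary columns, so they can still be selected by name afterwards.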
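A sketch of how you might check for and then remove such multi-matches, using the same example data (without the particle column, which is not needed here). Counting rows per key with `groupby(...).size()` shows which keys matched more than once before you collapse them.

```python
import pandas as pd

red = pd.DataFrame({"x": [1.2, 3.3, 5.4, 1], "y": [1.1, 10.7, 9.3, 1.1],
                    "frame": ["a", "b", "c", "a"], "mass": [10, 20, 30, 40]})
green = pd.DataFrame({"x": [1.2, 2.3, 3.4, 1], "y": [1.1, 4.7, 6.3, 1.1],
                      "frame": ["a", "b", "c", "a"], "mass": [50, 60, 70, 80]})
for df in (red, green):
    df["xr"] = df["x"].round()
    df["yr"] = df["y"].round()

keys = ["xr", "yr", "frame"]
matches = pd.merge(red, green, on=keys, suffixes=["_red", "_green"])

# Number of merged rows per key; anything above 1 is a multi-match
counts = matches.groupby(keys).size()
multi = counts[counts > 1]
print(multi)

# Keep the first row per key and restore the keys as ordinary columns
first = matches.groupby(keys).agg("first").reset_index()
```

Here the key (1.0, 1.0, "a") appears twice in each DataFrame, so it yields four merged rows, which collapse to one. If you expect each key to occur at most once on each side, passing `validate="one_to_one"` to `pd.merge` raises a `MergeError` instead of silently multiplying rows. `matches.drop_duplicates(subset=keys)` is a close alternative; unlike `agg("first")`, it keeps the whole first row even when some of its values are missing.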

You can work with the resulting dataframe directly, or convert it to numpy arrays:

intensity = matches[["mass_red", "mass_green"]].to_numpy()
location = matches[["x_red", "y_red"]].to_numpy()
frame = matches["frame"].to_numpy()  # merge keys are not suffixed
particle = matches["particle_red"].to_numpy()
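Since the end goal in the question is a DataFrame with frame, x, y, particle and two intensities, you could also build that table directly instead of going through separate arrays. A minimal sketch, assuming both inputs have `frame`, `x`, `y`, `mass` and `particle` columns as in the question; `match_tracks` and the output column names are hypothetical, and location and particle are taken from the red channel as in the original loop.

```python
import pandas as pd


def match_tracks(red: pd.DataFrame, green: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: pair red and green rows that share a frame and
    the same rounded x and y, keeping the first match per key."""
    keys = ["frame", "xr", "yr"]
    # assign() returns new DataFrames, so the caller's data is not modified
    red = red.assign(xr=red["x"].round(), yr=red["y"].round())
    green = green.assign(xr=green["x"].round(), yr=green["y"].round())

    matches = pd.merge(red, green, on=keys, suffixes=("_red", "_green"))
    matches = matches.drop_duplicates(subset=keys)

    # Location and particle come from the red channel
    return pd.DataFrame({
        "frame": matches["frame"],
        "x": matches["x_red"],
        "y": matches["y_red"],
        "particle": matches["particle_red"],
        "intensity_red": matches["mass_red"],
        "intensity_green": matches["mass_green"],
    }).reset_index(drop=True)


red = pd.DataFrame({"x": [1.2, 3.3, 5.4, 1], "y": [1.1, 10.7, 9.3, 1.1], "frame": ["a", "b", "c", "a"],
                    "mass": [10, 20, 30, 40], "particle": [0, 1, 2, 3]})
green = pd.DataFrame({"x": [1.2, 2.3, 3.4, 1], "y": [1.1, 4.7, 6.3, 1.1], "frame": ["a", "b", "c", "a"],
                      "mass": [50, 60, 70, 80], "particle": [0, 1, 2, 3]})
result = match_tracks(red, green)
print(result)
```

Every step here is a vectorized pandas operation, so the work grows roughly with the number of rows rather than with the product of the two row counts.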