Get the result of a merge inside a for loop

Time:10-05

I have a huge 800k-row DataFrame, and I need to find the rows whose key matches a key in another DataFrame.

Initially, I was looping over both DataFrames with nested loops and comparing the key values with a condition.

I was told that using merge could save time. However, I can't get it to work :(

Here's the code I'm trying to adapt:

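import pandas as pd
from tqdm import tqdm

mergeTwo = pd.read_json('merge/mergeUpdate.json')
matches = pd.read_csv('archive/matches.csv')

# Approach 1: nested loops comparing the keys row by row (works, but slow)
for indexOne, value in tqdm(mergeTwo.iterrows()):
    for index, match in matches.iterrows():
        if value["gameid"] == match["gameid"]:
            print(match)

# Approach 2: merge each row with matches (fails: 'gameid' is not found as a key)
for index, value in mergeTwo.iterrows():
    test = value.to_frame().merge(matches, on='gameid')
    print(test)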

The first version works fine. The second one fails with an error saying that the key (gameid) is not found.

Does anyone have a solution?

Thanks in advance!

Answer:

When you iterate over rows, value is a Series. The to_frame method turns it into a one-column DataFrame whose index holds the original column names, so gameid ends up as an index label rather than a column, and merge cannot find it. You need to transpose the frame to make the second approach work:

for index, value in mergeTwo.iterrows():
    # .T turns the one-column frame into a one-row frame, so 'gameid' becomes a column
    test = value.to_frame().T.merge(matches, on='gameid')
    print(test)
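To see why the transpose matters, here is a small sketch with made-up data. The gameid values and the extra columns team and score are hypothetical stand-ins, not taken from the question. It checks where gameid lands before and after .T, then runs the corrected loop:

```python
import pandas as pd

# Hypothetical stand-ins for mergeTwo and matches
mergeTwo = pd.DataFrame({"gameid": [1, 2], "team": ["A", "B"]})
matches = pd.DataFrame({"gameid": [2, 3], "score": [10, 20]})

row = mergeTwo.iloc[0]        # a Series indexed by the column names
one_col = row.to_frame()      # shape (2, 1): 'gameid' is an index label
one_row = row.to_frame().T    # shape (1, 2): 'gameid' is now a column

print(one_col.shape, "gameid" in one_col.columns)  # (2, 1) False
print(one_row.shape, "gameid" in one_row.columns)  # (1, 2) True

# With the transpose, merge finds 'gameid' as a column
for _, value in mergeTwo.iterrows():
    test = value.to_frame().T.merge(matches, on="gameid")
    print(test)  # empty for gameid 1, one matching row for gameid 2
```

This still runs one merge per row, so it only fixes the error; it does not remove the per-row overhead.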

But iterating is unnecessary here: a single merge applied to the whole first frame should be enough:

mergeTwo.merge(matches, on='gameid', how='left')
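A minimal sketch of the whole-frame merge, again on hypothetical data (the columns team and score are made up for illustration). Note that how='left' keeps every row of mergeTwo, including rows without a match, while how='inner' keeps only the matching rows, which is what the nested loop in the question prints. The indicator=True option of merge adds a _merge column showing whether each row was found in both frames:

```python
import pandas as pd

# Hypothetical stand-ins for the question's frames
mergeTwo = pd.DataFrame({"gameid": [1, 2, 2, 4], "team": ["A", "B", "C", "D"]})
matches = pd.DataFrame({"gameid": [2, 3, 4], "score": [10, 20, 30]})

# how='inner': only rows whose gameid exists in both frames
inner = mergeTwo.merge(matches, on="gameid", how="inner")

# how='left': every row of mergeTwo; unmatched rows get NaN in matches' columns
left = mergeTwo.merge(matches, on="gameid", how="left")

# indicator=True adds '_merge' with values 'both', 'left_only' or 'right_only'
flagged = mergeTwo.merge(matches, on="gameid", how="left", indicator=True)

# The nested-loop approach from the question, collecting pairs for comparison
loop_pairs = [
    (a["team"], b["score"])
    for _, a in mergeTwo.iterrows()
    for _, b in matches.iterrows()
    if a["gameid"] == b["gameid"]
]
merge_pairs = list(zip(inner["team"], inner["score"]))

print(inner)
print(left)
print(flagged["_merge"].value_counts())
print(loop_pairs == merge_pairs)  # True
```

The single merge does the key matching inside pandas instead of in a Python loop over every pair of rows, so on an 800k-row frame it is typically much faster than the nested iterrows version.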