Home > Mobile >  Select and rearrange DataFrame rows using a Python list and keeping data order
Select and rearrange DataFrame rows using a Python list and keeping data order

Time:09-16

I have a large Pandas DataFrame made of 298k rows. One column is ID and it is unique for each row.
I want to perform two operations:

  1. Filter the dataframe rows such that the ID value for each row is within a Python list, called my_list that contains 13k elements. I tried using query() and it works fine:

    df= df.query(f'ID in {my_list}')

  2. Now, I want to rearrange the dataframe such that the order of the IDs is the same as the one in my_list, which is an ordered list that may contain duplicates.
    If there is a duplicate in the list, I want to add the duplicate row even in the dataframe, keeping the order.
    I tried using categories to order the dataframe but it does not work with duplicates and I have no idea how to add the duplicate row if there is one in the list:

    df['ID'] = pd.Categorical(df['ID'], categories=mylist, ordered=True)

Any efficient solutions? Thanks

Numerical example:
I have a list with these IDs: [1, 2, 3, 1, 4, 2].
And I have a dataframe with these rows:

ID Name ...
0 John ...
1 Paul ...
2 Rick ...
3 Jean ...
4 Mark ...
5 Neil ...
6 Martin ...

I want in output this dataframe, following the list order and adding duplicates in the df:

ID Name ...
1 Paul ...
2 Rick ...
3 Jean ...
1 Paul ...
4 Mark ...
2 Rick ...

CodePudding user response:

If the original index is not critical, you can reindex:

df.set_index('ID').reindex(my_list).reset_index()

CodePudding user response:

This should work:

id_list = [43849175, 13239461, 14461786, 14467946, 14461786]

df_concat = pd.DataFrame()
for id in id_list:
    row = df[df["ID"] == id]
    df_concat = pd.concat([df_concat, row], ignore_index=True)

Maybe a more efficient way to do it:

df_result = pd.DataFrame({"ID": id_list})
df_result = df_result.merge(df, left_on="ID", right_on="ID", how="left")
  • Related