How to order a dictionary of dataframes by length

Time:10-19

I am trying to merge the dataframes stored in a dictionary:

example_dict = {key1: df1, key2: df2, ....}

Each dataframe has a different number of rows, and all of them have a column called id.

My plan was to do this:

from functools import partial, reduce
import pandas as pd
merged_dfs = partial(pd.merge, on='id', how='inner')
master_merged_dfs = reduce(merged_dfs, example_dict.values())

The issue is that some dataframes have few ids and some have many. With an inner join, the merged dataframe keeps getting smaller with every merge, because only the ids present in all dataframes survive.

I think the best approach is to order the dataframes in the dictionary by length, then do a left or right join depending on whether the ordering is descending or ascending.

I just don't know how to do the ordering.

CodePudding user response:

You can get the length (number of rows) of a DataFrame from its shape, as shape[0], and use that as the sort key:

sorted_dict = dict(sorted(example_dict.items(), key=lambda x: x[1].shape[0]))

This sorts the dictionary of DataFrames by length in ascending order. If you prefer descending order:

sorted_dict = dict(sorted(example_dict.items(), key=lambda x: x[1].shape[0], reverse=True))
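
To see the ordering in action, here is a minimal sketch on made-up data. The keys "a", "b", "c" and the columns x, y, z are hypothetical and only serve the illustration; len(df) is used as the key, which gives the same row count as df.shape[0]. It relies on dicts preserving insertion order (Python 3.7+).

```python
import pandas as pd

# Hypothetical sample data: keys and non-id column names are invented.
example_dict = {
    "a": pd.DataFrame({"id": [1, 2, 3], "x": [10, 20, 30]}),  # 3 rows
    "b": pd.DataFrame({"id": [1], "y": [100]}),               # 1 row
    "c": pd.DataFrame({"id": [1, 2], "z": [7, 8]}),           # 2 rows
}

# Sort the (key, DataFrame) pairs by row count and rebuild a dict.
ascending = dict(sorted(example_dict.items(), key=lambda kv: len(kv[1])))
descending = dict(
    sorted(example_dict.items(), key=lambda kv: len(kv[1]), reverse=True)
)

print(list(ascending))   # ['b', 'c', 'a']
print(list(descending))  # ['a', 'c', 'b']
```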

CodePudding user response:

You can use Python's built-in sorted function with len as the key function to get a list of the dataframes in order of increasing length, and then do a right merge, so that each step keeps every id of the longer dataframe.

from functools import partial, reduce
import pandas as pd
merged_dfs = partial(pd.merge, on='id', how='right')
master_merged_dfs = reduce(merged_dfs, sorted(example_dict.values(), key=len))
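
As a rough end-to-end sketch of this approach, the example below runs the right merge on made-up data and also shows the mirrored variant (descending order with a left merge). The keys and the columns x, y, z are hypothetical. Two assumptions are worth stating: ids are unique within each dataframe, and the longest dataframe contains every id you care about. Row count is only a proxy, so an id missing from the longest dataframe would still be dropped.

```python
from functools import partial, reduce

import pandas as pd

# Hypothetical sample data: keys and non-id column names are invented.
example_dict = {
    "a": pd.DataFrame({"id": [1, 2, 3], "x": [10, 20, 30]}),
    "b": pd.DataFrame({"id": [1], "y": [100]}),
    "c": pd.DataFrame({"id": [1, 2], "z": [7, 8]}),
}

# Shortest first, right merge: each step keeps all ids of the next,
# longer dataframe and fills missing values with NaN.
merge_right = partial(pd.merge, on="id", how="right")
master_right = reduce(merge_right, sorted(example_dict.values(), key=len))

# Longest first, left merge: same set of ids, different column order.
merge_left = partial(pd.merge, on="id", how="left")
master_left = reduce(
    merge_left, sorted(example_dict.values(), key=len, reverse=True)
)

print(master_right)
print(master_left)
```

Both results have one row per id of the longest dataframe; the columns coming from shorter dataframes are NaN where an id has no match.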