I am trying to merge the DataFrames stored in a dictionary:
example_dict = {key1: df1, key2: df2, ....}
Each DataFrame has a different number of rows, and all of them have a column called id.
My plan was to do this:
from functools import partial, reduce
import pandas as pd
merged_dfs = partial(pd.merge, on='id', how='inner')
master_merged_dfs = reduce(merged_dfs, example_dict.values())
The issue is that some DataFrames have few ids and some have many. With an inner join, the result keeps only the ids common to every DataFrame merged so far, so the final merged DataFrame keeps getting smaller.
I think the best way to go about this is to order the DataFrames in the dictionary by length, then do a right join if the ordering is ascending or a left join if it is descending.
I just don't know how I would do the ordering.
CodePudding user response:
You can get the length of a DataFrame through its shape, which you can use as a key to sort on:
sorted_dict = dict(sorted(example_dict.items(), key=lambda x: x[1].shape[0]))
This sorts the dictionary of DataFrames by length, ascending. If you prefer descending order:
sorted_dict = dict(sorted(example_dict.items(), key=lambda x: x[1].shape[0], reverse=True))
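As a quick check of the ascending sort, here is a minimal sketch. The sample frames, their keys, and the column names a, b, c are made up for illustration; only the id column and the sorting expression come from the thread.

```python
import pandas as pd

# Hypothetical sample data: three frames of different lengths sharing an "id" column.
example_dict = {
    "key1": pd.DataFrame({"id": [1, 2, 3], "a": [10, 20, 30]}),
    "key2": pd.DataFrame({"id": [1, 2, 3, 4, 5], "b": [1.0, 2.0, 3.0, 4.0, 5.0]}),
    "key3": pd.DataFrame({"id": [2], "c": ["x"]}),
}

# Sort the (key, DataFrame) pairs by row count (shape[0]) and rebuild the dict.
# Dicts preserve insertion order in Python 3.7+, so the sorted order is kept.
sorted_dict = dict(sorted(example_dict.items(), key=lambda x: x[1].shape[0]))

# Collect the row count per key to confirm the ordering.
row_counts = {key: df.shape[0] for key, df in sorted_dict.items()}
print(row_counts)  # {'key3': 1, 'key1': 3, 'key2': 5}
```

The DataFrames themselves are not copied or changed; only the order of the dictionary entries differs.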
CodePudding user response:
You can use the Python built-in function sorted with len as the key function to get a list sorted by increasing length, and then do a right merge.
from functools import partial, reduce
import pandas as pd
merged_dfs = partial(pd.merge, on='id', how='right')
master_merged_dfs = reduce(merged_dfs, sorted(example_dict.values(), key=len))
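To see what the right merge over length-sorted frames produces, here is a small self-contained sketch. The sample data and the columns a, b, c are assumptions for illustration; the partial/reduce pattern is the one from the answer above.

```python
from functools import partial, reduce

import pandas as pd

# Hypothetical sample data: three frames of different lengths sharing an "id" column.
example_dict = {
    "key1": pd.DataFrame({"id": [1, 2, 3], "a": [10, 20, 30]}),
    "key2": pd.DataFrame({"id": [1, 2, 3, 4, 5], "b": [1.0, 2.0, 3.0, 4.0, 5.0]}),
    "key3": pd.DataFrame({"id": [2], "c": ["x"]}),
}

# reduce calls merged_dfs(accumulated, next_frame), so the next (longer) frame
# is always the right-hand side and how='right' keeps all of its ids.
merged_dfs = partial(pd.merge, on="id", how="right")

# len(df) is the row count, so this orders the frames shortest to longest.
frames = sorted(example_dict.values(), key=len)

master_merged_dfs = reduce(merged_dfs, frames)
print(master_merged_dfs)
```

In this sample the result has one row per id of the longest frame, with NaN wherever a shorter frame had no matching id. Note that an id present only in a shorter frame would be dropped; if those ids also need to survive, how='outer' may be a better fit, and with an outer join the ordering no longer matters.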