How to iterate over a specific column of multiple dataframes and then compare their values to find t-CodePudding

To elaborate, I have 3 dataframes which have the data of revenue for various cities. I have the data for 3 companies stored separately in those 3 dataframes so I just want to know which company has the highest revenue citywise.

df1 looks like:

{[city: A, B, C, D, E] , [revenue: 10,20,25,30,70]}

df2 looks like:

{[city: A, C, D, E, F] , [revenue: 13, 23, 33, 43, 53]}

df3 looks like:

{[city: A, B, C, D, E] , [revenue: 11,22,32,44,55]}

As we can see, not all companies are present in all the cities. I am thinking of making a nested loop to do this task and iterate over the columns, then if the cities match, compare the revenues and find company with the greatest one(also have to store that somewhere in a new df) but I am unable to think how actually.

CodePudding user response：

IIUC, put all DataFrames into a single one and find the city with the maximum value:

df1 = pd.DataFrame({"city": ["A", "B", "C", "D", "E"] , "revenue": [10,20,25,30,70]})
df2 = pd.DataFrame({"city": ["A", "C", "D", "E", "F"] , "revenue": [13, 23, 33, 43, 53]})
df3 = pd.DataFrame({"city": ["A", "B", "C", "D", "E"] , "revenue": [11,22,32,44,55]})


df_all = pd.concat([df.set_index("city").rename(columns={"revenue": i   1}) 
                    for i, df in enumerate([df1, df2, df3])], axis=1)

df_all["company_max"] = df_all.idxmax(axis=1)

      1     2     3  company_max
A  10.0  13.0  11.0         2
B  20.0   NaN  22.0         3
C  25.0  23.0  32.0         3
D  30.0  33.0  44.0         3
E  70.0  43.0  55.0         1
F   NaN  53.0   NaN         2