Comparing multiple dataframes to find differing rows

Time:10-10

I have a list in which each element is a pandas DataFrame. Every dataframe has the same number of rows and two columns, Parameter and Value. One element looks like this:

     Parameter              Value
0    Mode:                  GDR-Eth
1    Regtest:               //acds/main/
2    NaN                    NaN
3    NaN                    NaN
4    NaN                    NaN
...  ...                    ...
539  tx_ipg_size_gui        12
540  tx_max_frame_size_gui  1518
541  tx_vlan_detection_gui  1
542  txmac_saddr_gui        73588229205
543  xcvr_type              FGT

I want to go through the entire list and, for every parameter whose value is not the same across all the dataframes, display the parameter and each of its values. How can I do so?

CodePudding user response:

Given a list of dataframes of the same structure, with some information repeating:

import pandas as pd
import numpy as np

# Ten random dataframes with the same two columns, so many rows repeat
li = [
    pd.DataFrame(
        {
            "Parameter": np.random.choice(["foo", "bar", "baz"], size=100),
            "Value": np.random.randint(0, 20, size=100),
        }
    )
    for _ in range(10)
]

We can get the unique Parameter/Value pairs across all of them like this:

pd.concat(li).drop_duplicates().dropna()
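
The concatenation above loses track of which dataframe each pair came from. As a rough sketch (the `frames` list and its values are made up for illustration, not taken from the question), passing `keys` to `pd.concat` labels every row with its source dataframe, so each surviving unique pair shows where it first appeared:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the list of dataframes
frames = [
    pd.DataFrame({"Parameter": ["a", "b", np.nan], "Value": [1, 2, np.nan]}),
    pd.DataFrame({"Parameter": ["a", "b", np.nan], "Value": [1, 3, np.nan]}),
]

# keys= adds an outer index level holding the position of the source dataframe
combined = pd.concat(
    frames, keys=list(range(len(frames))), names=["frame", "row"]
)

# drop_duplicates compares column values only, so the first occurrence
# of each Parameter/Value pair is kept together with its (frame, row) label
unique_pairs = combined.dropna().drop_duplicates()
print(unique_pairs)
```

Here the pair for "a" is kept once (from frame 0), while "b" appears twice because its value differs between the two frames.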

CodePudding user response:

pd.concat(li).rename_axis("row").reset_index().groupby("row")[["Parameter", "Value"]].value_counts()

That gives, for each row position, how many times each distinct Parameter/Value pair occurs across all the dataframes. If a row has a single pair common to every dataframe, its count is len(li); any row that lists more than one pair differs between dataframes.
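
To display each differing parameter next to its value in every dataframe, one option is to put the Value columns side by side and keep only the rows where they disagree. This is a minimal sketch, assuming the same parameter sits on the same row in every dataframe (as the row-based grouping above also assumes). The `frames` list, the `df_0`, `df_1`, ... column names, and the differing value 16 are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the list of dataframes in the question
params = ["Mode:", np.nan, "tx_ipg_size_gui", "xcvr_type"]
frames = [
    pd.DataFrame({"Parameter": params, "Value": ["GDR-Eth", np.nan, 12, "FGT"]}),
    pd.DataFrame({"Parameter": params, "Value": ["GDR-Eth", np.nan, 16, "FGT"]}),
    pd.DataFrame({"Parameter": params, "Value": ["GDR-Eth", np.nan, 12, "FGT"]}),
]

# Parameter names from the first dataframe, then one Value column per dataframe,
# aligned on the row index
value_cols = [f"df_{i}" for i in range(len(frames))]
wide = pd.concat(
    [frames[0]["Parameter"]] + [df["Value"] for df in frames],
    axis=1,
    keys=["Parameter"] + value_cols,
)

# A row differs when it holds more than one distinct value;
# dropna=False counts NaN as a value, so all-NaN rows count as identical
differing = wide[wide[value_cols].nunique(axis=1, dropna=False) > 1]
print(differing)
```

Only the tx_ipg_size_gui row is printed here, with its value from each of the three dataframes. If the parameters are not on matching rows, aligning on the Parameter column instead (for example with set_index) would be needed, provided the names are unique within each dataframe.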
