Home > Back-end >  Combining multiple tables using Index in Python
Combining multiple tables using Index in Python

Time:05-04

I have two data frames :

  1. DF1
MS var1 var2 var3 var4
MS1 ab cd ef gh
MS2 ij ab kl -
MS3 cd gh ef -
  1. df2
Ani ab cd ef gh ij kl
Lion 0.1 NaN 0.5 0.6 1 0.4
Tiger 0.4 0.12 NaN 0.14 NaN 0.3
Dog 2 NaN 0.4 0.8 0.4 0.12

Using both of these the final table to be created is :

Ani MS Var Value
Lion MS1 ab 0.1
Lion MS1 cd 0
Lion MS1 ef 0.5
Lion MS1 gh 0.6
Lion MS2 ij 1
Lion MS2 ab 0.1
Lion MS2 kl 0.4
Lion MS3 cd 0
Lion MS3 gh 0.6
Lion MS3 ef 0.5
Tiger MS1 ab 0.4
Tiger MS1 cd 0
Tiger MS1 ef 0.5
Tiger MS1 gh 0.14
Tiger MS2 ij 0
Tiger MS2 ab 0.4
Tiger MS2 kl 0.3
Tiger MS3 cd 0.12
Tiger MS3 gh 0.14
Tiger MS3 ef 0

I am trying to use a bunch of if-else and for loops but I believe there has to be an efficient way to do this, any help would be highly appreciated.

CodePudding user response:

The logic is unclear, but if I guess correctly you want to melt and merge:

(df2.melt('Ani', var_name='Var').fillna({'value': 0})
    .merge(df1.melt('MS', value_name='Var')
              .drop(columns='variable')
              .query('Var != "-"'),
           on='Var'
          )
    .sort_values(by=['Ani', 'MS'], kind='stable')
)

output:

      Ani Var  value   MS
0     Dog  ab   2.00  MS1
1     Dog  cd   0.00  MS1
2     Dog  ef   0.40  MS1
3     Dog  gh   0.80  MS1
4     Dog  ab   2.00  MS2
5     Dog  ij   0.40  MS2
6     Dog  kl   0.12  MS2
7     Dog  cd   0.00  MS3
8     Dog  ef   0.40  MS3
9     Dog  gh   0.80  MS3
10   Lion  ab   0.10  MS1
11   Lion  cd   0.00  MS1
12   Lion  ef   0.50  MS1
13   Lion  gh   0.60  MS1
14   Lion  ab   0.10  MS2
15   Lion  ij   1.00  MS2
16   Lion  kl   0.40  MS2
17   Lion  cd   0.00  MS3
18   Lion  ef   0.50  MS3
19   Lion  gh   0.60  MS3
20  Tiger  ab   0.40  MS1
21  Tiger  cd   0.12  MS1
22  Tiger  ef   0.00  MS1
23  Tiger  gh   0.14  MS1
24  Tiger  ab   0.40  MS2
25  Tiger  ij   0.00  MS2
26  Tiger  kl   0.30  MS2
27  Tiger  cd   0.12  MS3
28  Tiger  ef   0.00  MS3
29  Tiger  gh   0.14  MS3

CodePudding user response:

IIUC, this is how to achieve your results. Firstly, .melt() both dataframes, as below:

DF1:

DF1_melt = DF1.melt(id_vars='MS', value_name='Variable')[['MS', 'Variable']]

Out:

     MS variable
0   MS1       ab
1   MS2       ij
2   MS3       cd
3   MS1       cd
4   MS2       ab
5   MS3       gh
6   MS1       ef
7   MS2       kl
8   MS3       ef
9   MS1       gh
10  MS2      NaN
11  MS3      NaN

DF2:

DF2_melt = DF2.melt(id_vars='Ani', var_name='Variable')

Out:

      Ani variable  value
0    Lion       ab   0.10
1   Tiger       ab   0.40
2     Dog       ab   2.00
3    Lion       cd    NaN
4   Tiger       cd   0.12
5     Dog       cd    NaN
6    Lion       ef   0.50
7   Tiger       ef    NaN
8     Dog       ef   0.40
9    Lion       gh   0.60
10  Tiger       gh   0.14
11    Dog       gh   0.80
12   Lion       ij   1.00
13  Tiger       ij    NaN
14    Dog       ij   0.40
15   Lion       kl   0.40
16  Tiger       kl   0.30
17    Dog       kl   0.12

Then you can merge the two and fill Nan with 0:

DF_new = DF1_melt.merge(DF2_melt, on='Variable', how='inner').fillna(0)[['Ani', 'MS', 'Variable', 'value']]

Out:

     Ani   MS Variable  value
0    Lion  MS1       ab   0.10
1   Tiger  MS1       ab   0.40
2     Dog  MS1       ab   2.00
3    Lion  MS2       ab   0.10
4   Tiger  MS2       ab   0.40
5     Dog  MS2       ab   2.00
6    Lion  MS2       ij   1.00
7   Tiger  MS2       ij   0.00
8     Dog  MS2       ij   0.40
9    Lion  MS3       cd   0.00
10  Tiger  MS3       cd   0.12
11    Dog  MS3       cd   0.00
12   Lion  MS1       cd   0.00
13  Tiger  MS1       cd   0.12
14    Dog  MS1       cd   0.00
15   Lion  MS3       gh   0.60
16  Tiger  MS3       gh   0.14
17    Dog  MS3       gh   0.80
18   Lion  MS1       gh   0.60
19  Tiger  MS1       gh   0.14
20    Dog  MS1       gh   0.80
21   Lion  MS1       ef   0.50
22  Tiger  MS1       ef   0.00
23    Dog  MS1       ef   0.40
24   Lion  MS3       ef   0.50
25  Tiger  MS3       ef   0.00
26    Dog  MS3       ef   0.40
27   Lion  MS2       kl   0.40
28  Tiger  MS2       kl   0.30
29    Dog  MS2       kl   0.12

You could then sort/rename/re-order columns to match your question. It's not clear why dog isn't in the output, but can edit if it shouldn't be there.

  • Related