Combining multiple tables using Index in Python-CodePudding

I have two data frames :

MS	var1	var2	var3	var4
MS1	ab	cd	ef	gh
MS2	ij	ab	kl	-
MS3	cd	gh	ef	-

Ani	ab	cd	ef	gh	ij	kl
Lion	0.1	NaN	0.5	0.6	1	0.4
Tiger	0.4	0.12	NaN	0.14	NaN	0.3
Dog	2	NaN	0.4	0.8	0.4	0.12

Using both of these the final table to be created is :

Ani	MS	Var	Value
Lion	MS1	ab	0.1
Lion	MS1	cd	0
Lion	MS1	ef	0.5
Lion	MS1	gh	0.6
Lion	MS2	ij	1
Lion	MS2	ab	0.1
Lion	MS2	kl	0.4
Lion	MS3	cd	0
Lion	MS3	gh	0.6
Lion	MS3	ef	0.5
Tiger	MS1	ab	0.4
Tiger	MS1	cd	0
Tiger	MS1	ef	0.5
Tiger	MS1	gh	0.14
Tiger	MS2	ij	0
Tiger	MS2	ab	0.4
Tiger	MS2	kl	0.3
Tiger	MS3	cd	0.12
Tiger	MS3	gh	0.14
Tiger	MS3	ef	0

I am trying to use a bunch of if-else and for loops but I believe there has to be an efficient way to do this, any help would be highly appreciated.

CodePudding user response：

The logic is unclear, but if I guess correctly you want to melt and merge:

(df2.melt('Ani', var_name='Var').fillna({'value': 0})
    .merge(df1.melt('MS', value_name='Var')
              .drop(columns='variable')
              .query('Var != "-"'),
           on='Var'
          )
    .sort_values(by=['Ani', 'MS'], kind='stable')
)

output:

      Ani Var  value   MS
0     Dog  ab   2.00  MS1
1     Dog  cd   0.00  MS1
2     Dog  ef   0.40  MS1
3     Dog  gh   0.80  MS1
4     Dog  ab   2.00  MS2
5     Dog  ij   0.40  MS2
6     Dog  kl   0.12  MS2
7     Dog  cd   0.00  MS3
8     Dog  ef   0.40  MS3
9     Dog  gh   0.80  MS3
10   Lion  ab   0.10  MS1
11   Lion  cd   0.00  MS1
12   Lion  ef   0.50  MS1
13   Lion  gh   0.60  MS1
14   Lion  ab   0.10  MS2
15   Lion  ij   1.00  MS2
16   Lion  kl   0.40  MS2
17   Lion  cd   0.00  MS3
18   Lion  ef   0.50  MS3
19   Lion  gh   0.60  MS3
20  Tiger  ab   0.40  MS1
21  Tiger  cd   0.12  MS1
22  Tiger  ef   0.00  MS1
23  Tiger  gh   0.14  MS1
24  Tiger  ab   0.40  MS2
25  Tiger  ij   0.00  MS2
26  Tiger  kl   0.30  MS2
27  Tiger  cd   0.12  MS3
28  Tiger  ef   0.00  MS3
29  Tiger  gh   0.14  MS3

CodePudding user response：

IIUC, this is how to achieve your results. Firstly, .melt() both dataframes, as below:

DF1:

DF1_melt = DF1.melt(id_vars='MS', value_name='Variable')[['MS', 'Variable']]

Out:

     MS variable
0   MS1       ab
1   MS2       ij
2   MS3       cd
3   MS1       cd
4   MS2       ab
5   MS3       gh
6   MS1       ef
7   MS2       kl
8   MS3       ef
9   MS1       gh
10  MS2      NaN
11  MS3      NaN

DF2:

DF2_melt = DF2.melt(id_vars='Ani', var_name='Variable')

Out:

      Ani variable  value
0    Lion       ab   0.10
1   Tiger       ab   0.40
2     Dog       ab   2.00
3    Lion       cd    NaN
4   Tiger       cd   0.12
5     Dog       cd    NaN
6    Lion       ef   0.50
7   Tiger       ef    NaN
8     Dog       ef   0.40
9    Lion       gh   0.60
10  Tiger       gh   0.14
11    Dog       gh   0.80
12   Lion       ij   1.00
13  Tiger       ij    NaN
14    Dog       ij   0.40
15   Lion       kl   0.40
16  Tiger       kl   0.30
17    Dog       kl   0.12

Then you can merge the two and fill Nan with 0:

DF_new = DF1_melt.merge(DF2_melt, on='Variable', how='inner').fillna(0)[['Ani', 'MS', 'Variable', 'value']]

Out:

     Ani   MS Variable  value
0    Lion  MS1       ab   0.10
1   Tiger  MS1       ab   0.40
2     Dog  MS1       ab   2.00
3    Lion  MS2       ab   0.10
4   Tiger  MS2       ab   0.40
5     Dog  MS2       ab   2.00
6    Lion  MS2       ij   1.00
7   Tiger  MS2       ij   0.00
8     Dog  MS2       ij   0.40
9    Lion  MS3       cd   0.00
10  Tiger  MS3       cd   0.12
11    Dog  MS3       cd   0.00
12   Lion  MS1       cd   0.00
13  Tiger  MS1       cd   0.12
14    Dog  MS1       cd   0.00
15   Lion  MS3       gh   0.60
16  Tiger  MS3       gh   0.14
17    Dog  MS3       gh   0.80
18   Lion  MS1       gh   0.60
19  Tiger  MS1       gh   0.14
20    Dog  MS1       gh   0.80
21   Lion  MS1       ef   0.50
22  Tiger  MS1       ef   0.00
23    Dog  MS1       ef   0.40
24   Lion  MS3       ef   0.50
25  Tiger  MS3       ef   0.00
26    Dog  MS3       ef   0.40
27   Lion  MS2       kl   0.40
28  Tiger  MS2       kl   0.30
29    Dog  MS2       kl   0.12

You could then sort/rename/re-order columns to match your question. It's not clear why dog isn't in the output, but can edit if it shouldn't be there.