I have two data frames :
- DF1
MS | var1 | var2 | var3 | var4 |
---|---|---|---|---|
MS1 | ab | cd | ef | gh |
MS2 | ij | ab | kl | - |
MS3 | cd | gh | ef | - |
- df2
Ani | ab | cd | ef | gh | ij | kl |
---|---|---|---|---|---|---|
Lion | 0.1 | NaN | 0.5 | 0.6 | 1 | 0.4 |
Tiger | 0.4 | 0.12 | NaN | 0.14 | NaN | 0.3 |
Dog | 2 | NaN | 0.4 | 0.8 | 0.4 | 0.12 |
Using both of these the final table to be created is :
Ani | MS | Var | Value |
---|---|---|---|
Lion | MS1 | ab | 0.1 |
Lion | MS1 | cd | 0 |
Lion | MS1 | ef | 0.5 |
Lion | MS1 | gh | 0.6 |
Lion | MS2 | ij | 1 |
Lion | MS2 | ab | 0.1 |
Lion | MS2 | kl | 0.4 |
Lion | MS3 | cd | 0 |
Lion | MS3 | gh | 0.6 |
Lion | MS3 | ef | 0.5 |
Tiger | MS1 | ab | 0.4 |
Tiger | MS1 | cd | 0 |
Tiger | MS1 | ef | 0.5 |
Tiger | MS1 | gh | 0.14 |
Tiger | MS2 | ij | 0 |
Tiger | MS2 | ab | 0.4 |
Tiger | MS2 | kl | 0.3 |
Tiger | MS3 | cd | 0.12 |
Tiger | MS3 | gh | 0.14 |
Tiger | MS3 | ef | 0 |
I am trying to use a bunch of if-else and for loops but I believe there has to be an efficient way to do this, any help would be highly appreciated.
CodePudding user response:
The logic is unclear, but if I guess correctly you want to melt
and merge
:
(df2.melt('Ani', var_name='Var').fillna({'value': 0})
.merge(df1.melt('MS', value_name='Var')
.drop(columns='variable')
.query('Var != "-"'),
on='Var'
)
.sort_values(by=['Ani', 'MS'], kind='stable')
)
output:
Ani Var value MS
0 Dog ab 2.00 MS1
1 Dog cd 0.00 MS1
2 Dog ef 0.40 MS1
3 Dog gh 0.80 MS1
4 Dog ab 2.00 MS2
5 Dog ij 0.40 MS2
6 Dog kl 0.12 MS2
7 Dog cd 0.00 MS3
8 Dog ef 0.40 MS3
9 Dog gh 0.80 MS3
10 Lion ab 0.10 MS1
11 Lion cd 0.00 MS1
12 Lion ef 0.50 MS1
13 Lion gh 0.60 MS1
14 Lion ab 0.10 MS2
15 Lion ij 1.00 MS2
16 Lion kl 0.40 MS2
17 Lion cd 0.00 MS3
18 Lion ef 0.50 MS3
19 Lion gh 0.60 MS3
20 Tiger ab 0.40 MS1
21 Tiger cd 0.12 MS1
22 Tiger ef 0.00 MS1
23 Tiger gh 0.14 MS1
24 Tiger ab 0.40 MS2
25 Tiger ij 0.00 MS2
26 Tiger kl 0.30 MS2
27 Tiger cd 0.12 MS3
28 Tiger ef 0.00 MS3
29 Tiger gh 0.14 MS3
CodePudding user response:
IIUC, this is how to achieve your results. Firstly, .melt()
both dataframes, as below:
DF1:
DF1_melt = DF1.melt(id_vars='MS', value_name='Variable')[['MS', 'Variable']]
Out:
MS variable
0 MS1 ab
1 MS2 ij
2 MS3 cd
3 MS1 cd
4 MS2 ab
5 MS3 gh
6 MS1 ef
7 MS2 kl
8 MS3 ef
9 MS1 gh
10 MS2 NaN
11 MS3 NaN
DF2:
DF2_melt = DF2.melt(id_vars='Ani', var_name='Variable')
Out:
Ani variable value
0 Lion ab 0.10
1 Tiger ab 0.40
2 Dog ab 2.00
3 Lion cd NaN
4 Tiger cd 0.12
5 Dog cd NaN
6 Lion ef 0.50
7 Tiger ef NaN
8 Dog ef 0.40
9 Lion gh 0.60
10 Tiger gh 0.14
11 Dog gh 0.80
12 Lion ij 1.00
13 Tiger ij NaN
14 Dog ij 0.40
15 Lion kl 0.40
16 Tiger kl 0.30
17 Dog kl 0.12
Then you can merge the two and fill Nan with 0:
DF_new = DF1_melt.merge(DF2_melt, on='Variable', how='inner').fillna(0)[['Ani', 'MS', 'Variable', 'value']]
Out:
Ani MS Variable value
0 Lion MS1 ab 0.10
1 Tiger MS1 ab 0.40
2 Dog MS1 ab 2.00
3 Lion MS2 ab 0.10
4 Tiger MS2 ab 0.40
5 Dog MS2 ab 2.00
6 Lion MS2 ij 1.00
7 Tiger MS2 ij 0.00
8 Dog MS2 ij 0.40
9 Lion MS3 cd 0.00
10 Tiger MS3 cd 0.12
11 Dog MS3 cd 0.00
12 Lion MS1 cd 0.00
13 Tiger MS1 cd 0.12
14 Dog MS1 cd 0.00
15 Lion MS3 gh 0.60
16 Tiger MS3 gh 0.14
17 Dog MS3 gh 0.80
18 Lion MS1 gh 0.60
19 Tiger MS1 gh 0.14
20 Dog MS1 gh 0.80
21 Lion MS1 ef 0.50
22 Tiger MS1 ef 0.00
23 Dog MS1 ef 0.40
24 Lion MS3 ef 0.50
25 Tiger MS3 ef 0.00
26 Dog MS3 ef 0.40
27 Lion MS2 kl 0.40
28 Tiger MS2 kl 0.30
29 Dog MS2 kl 0.12
You could then sort/rename/re-order columns to match your question. It's not clear why dog isn't in the output, but can edit if it shouldn't be there.