Mapping values from one dataframe to a target dataframe-CodePudding

Question sounds to have been asked before, but couldn't apply or understand solns for my case, hence asking...
I have a dataframe Main that looks like this with two columns topic, cat with new needed column Value that I want:

topic    |   cat   | Value(needed col)
---------------------------------
top1     | cat1    |   0
---------------------------------
top2     | cat3    |   4
---------------------------------
top4     | cat4    |   1

While the values for Value col from above are in another dataframe Values_df with some 80Rx30C in the form:

cat     | top1 | top2 | top3 | top4...top30
-------------------------------------------
cat1    | 0    |  1   | 2    | 2
-------------------------------------------
cat2    | 1    |  1   | 1    | 1
-------------------------------------------
cat3    | 2    |  4   | 3    | 3
-------------------------------------------
cat4    | 5    |  2   | 1    | 1
.
.
cat80   | 2    |  7   | 4    |

I tried of transposing the Values_df and left joining but couldn't figure out as I have multiple columns, Another attempt I made is creating a dictionary within dictionary that looks like: {'cat1': {'top1':0, 'top2':1, 'top3':2, 'top4':2}} and map it but stuck at mapping it.

Any way that will help me get the Value col added in Main?

CodePudding user response：

Something like the following?

import pandas as pd

#dummy main data
main_df = pd.DataFrame({
  'topic': ['top1', 'top2', 'top4'],
  'cat': ['cat1', 'cat3', 'cat4'],
})

#dummy values data
values_df = pd.DataFrame({
  'cat': ['cat1', 'cat2', 'cat3', 'cat4', 'cat5', 'cat6'],
  'top1': [10, 1, 2, 3, 4, 5],
  'top2': [5, 4, 3, 2, 1, 6],
  'top3': [2, 2, 1, 3, 5, 3],
  'top4': [3, 4, 1, 2, 3, 6],
  'top5': [1, 3, 2, 5, 4, 7],
})

melt the values_df, but, before that, we need to quickly create columns header for values_df:

cols_label = ['top'   str(x) for x in range(1, 6)]

In your case, range(1, 6) should be range(1, 31).

Melt the values_df:

values_df = values_df.melt(id_vars=['cat'], value_vars=cols_label)

This will give us the following:

Then, merge main_df and values_df:

merged = main_df.merge(values_df, how='left', left_on=['cat', 'topic'], right_on=['cat', 'variable']).drop('variable', axis=1)

And here is the final result:

CodePudding user response：

You can make a mapping from Values_df by melting Values_df by .melt() and then set index on columns topic and cat by .set_index() and specify the column.

Then, on the Main dataframe, get a tuple of topic and cat for each row as a key for matching the mapping created by .map(), as follows:

mapping = Values_df.melt(id_vars='cat', var_name='topic').set_index(['topic', 'cat'])['value']

Main['Value'] = Main[['topic', 'cat']].apply(tuple, axis=1).map(mapping)

Data Input

Main = pd.DataFrame({'topic': ['top1', 'top2', 'top4'], 'cat': ['cat1', 'cat3', 'cat4']})

  topic   cat
0  top1  cat1
1  top2  cat3
2  top4  cat4


data = {'cat': ['cat1', 'cat2', 'cat3', 'cat4', 'cat80'],
 'top1': [0, 1, 2, 5, 2],
 'top2': [1, 1, 4, 2, 7],
 'top3': [2, 1, 3, 1, 4],
 'top4': [2, 1, 3, 1, 9]}
Values_df = pd.DataFrame(data) 

     cat  top1  top2  top3  top4
0   cat1     0     1     2     2
1   cat2     1     1     1     1
2   cat3     2     4     3     3
3   cat4     5     2     1     1
4  cat80     2     7     4     9

Result:

print(Main)

  topic   cat  Value
0  top1  cat1      0
1  top2  cat3      4
2  top4  cat4      1