Home > Net >  Mapping values from one dataframe to a target dataframe
Mapping values from one dataframe to a target dataframe

Time:10-02

Question sounds to have been asked before, but couldn't apply or understand solns for my case, hence asking...
I have a dataframe Main that looks like this with two columns topic, cat with new needed column Value that I want:

topic    |   cat   | Value(needed col)
---------------------------------
top1     | cat1    |   0
---------------------------------
top2     | cat3    |   4
---------------------------------
top4     | cat4    |   1

While the values for Value col from above are in another dataframe Values_df with some 80Rx30C in the form:

cat     | top1 | top2 | top3 | top4...top30
-------------------------------------------
cat1    | 0    |  1   | 2    | 2
-------------------------------------------
cat2    | 1    |  1   | 1    | 1
-------------------------------------------
cat3    | 2    |  4   | 3    | 3
-------------------------------------------
cat4    | 5    |  2   | 1    | 1
.
.
cat80   | 2    |  7   | 4    |

I tried of transposing the Values_df and left joining but couldn't figure out as I have multiple columns, Another attempt I made is creating a dictionary within dictionary that looks like: {'cat1': {'top1':0, 'top2':1, 'top3':2, 'top4':2}} and map it but stuck at mapping it.

Any way that will help me get the Value col added in Main?

CodePudding user response:

Something like the following?

import pandas as pd

#dummy main data
main_df = pd.DataFrame({
  'topic': ['top1', 'top2', 'top4'],
  'cat': ['cat1', 'cat3', 'cat4'],
})

#dummy values data
values_df = pd.DataFrame({
  'cat': ['cat1', 'cat2', 'cat3', 'cat4', 'cat5', 'cat6'],
  'top1': [10, 1, 2, 3, 4, 5],
  'top2': [5, 4, 3, 2, 1, 6],
  'top3': [2, 2, 1, 3, 5, 3],
  'top4': [3, 4, 1, 2, 3, 6],
  'top5': [1, 3, 2, 5, 4, 7],
})

melt the values_df, but, before that, we need to quickly create columns header for values_df:

cols_label = ['top'   str(x) for x in range(1, 6)]

In your case, range(1, 6) should be range(1, 31).

Melt the values_df:

values_df = values_df.melt(id_vars=['cat'], value_vars=cols_label)

This will give us the following:

enter image description here

Then, merge main_df and values_df:

merged = main_df.merge(values_df, how='left', left_on=['cat', 'topic'], right_on=['cat', 'variable']).drop('variable', axis=1)

And here is the final result:

enter image description here

CodePudding user response:

You can make a mapping from Values_df by melting Values_df by .melt() and then set index on columns topic and cat by .set_index() and specify the column.

Then, on the Main dataframe, get a tuple of topic and cat for each row as a key for matching the mapping created by .map(), as follows:

mapping = Values_df.melt(id_vars='cat', var_name='topic').set_index(['topic', 'cat'])['value']

Main['Value'] = Main[['topic', 'cat']].apply(tuple, axis=1).map(mapping)

Data Input

Main = pd.DataFrame({'topic': ['top1', 'top2', 'top4'], 'cat': ['cat1', 'cat3', 'cat4']})

  topic   cat
0  top1  cat1
1  top2  cat3
2  top4  cat4


data = {'cat': ['cat1', 'cat2', 'cat3', 'cat4', 'cat80'],
 'top1': [0, 1, 2, 5, 2],
 'top2': [1, 1, 4, 2, 7],
 'top3': [2, 1, 3, 1, 4],
 'top4': [2, 1, 3, 1, 9]}
Values_df = pd.DataFrame(data) 

     cat  top1  top2  top3  top4
0   cat1     0     1     2     2
1   cat2     1     1     1     1
2   cat3     2     4     3     3
3   cat4     5     2     1     1
4  cat80     2     7     4     9

Result:

print(Main)

  topic   cat  Value
0  top1  cat1      0
1  top2  cat3      4
2  top4  cat4      1
  • Related