Question sounds to have been asked before, but couldn't apply or understand solns for my case, hence asking...
I have a dataframe Main
that looks like this with two columns topic, cat
with new needed column Value
that I want:
topic | cat | Value(needed col)
---------------------------------
top1 | cat1 | 0
---------------------------------
top2 | cat3 | 4
---------------------------------
top4 | cat4 | 1
While the values for Value
col from above are in another dataframe Values_df
with some 80Rx30C in the form:
cat | top1 | top2 | top3 | top4...top30
-------------------------------------------
cat1 | 0 | 1 | 2 | 2
-------------------------------------------
cat2 | 1 | 1 | 1 | 1
-------------------------------------------
cat3 | 2 | 4 | 3 | 3
-------------------------------------------
cat4 | 5 | 2 | 1 | 1
.
.
cat80 | 2 | 7 | 4 |
I tried of transposing the Values_df
and left joining but couldn't figure out as I have multiple columns,
Another attempt I made is creating a dictionary within dictionary that looks like:
{'cat1': {'top1':0, 'top2':1, 'top3':2, 'top4':2}}
and map it but stuck at mapping it.
Any way that will help me get the Value
col added in Main
?
CodePudding user response:
Something like the following?
import pandas as pd
#dummy main data
main_df = pd.DataFrame({
'topic': ['top1', 'top2', 'top4'],
'cat': ['cat1', 'cat3', 'cat4'],
})
#dummy values data
values_df = pd.DataFrame({
'cat': ['cat1', 'cat2', 'cat3', 'cat4', 'cat5', 'cat6'],
'top1': [10, 1, 2, 3, 4, 5],
'top2': [5, 4, 3, 2, 1, 6],
'top3': [2, 2, 1, 3, 5, 3],
'top4': [3, 4, 1, 2, 3, 6],
'top5': [1, 3, 2, 5, 4, 7],
})
melt
the values_df
, but, before that, we need to quickly create columns header for values_df
:
cols_label = ['top' str(x) for x in range(1, 6)]
In your case, range(1, 6)
should be range(1, 31)
.
Melt the values_df
:
values_df = values_df.melt(id_vars=['cat'], value_vars=cols_label)
This will give us the following:
Then, merge main_df and values_df:
merged = main_df.merge(values_df, how='left', left_on=['cat', 'topic'], right_on=['cat', 'variable']).drop('variable', axis=1)
And here is the final result:
CodePudding user response:
You can make a mapping from Values_df
by melting Values_df
by .melt()
and then set index on columns topic
and cat
by .set_index()
and specify the column.
Then, on the Main
dataframe, get a tuple of topic
and cat
for each row as a key for matching the mapping created by .map()
, as follows:
mapping = Values_df.melt(id_vars='cat', var_name='topic').set_index(['topic', 'cat'])['value']
Main['Value'] = Main[['topic', 'cat']].apply(tuple, axis=1).map(mapping)
Data Input
Main = pd.DataFrame({'topic': ['top1', 'top2', 'top4'], 'cat': ['cat1', 'cat3', 'cat4']})
topic cat
0 top1 cat1
1 top2 cat3
2 top4 cat4
data = {'cat': ['cat1', 'cat2', 'cat3', 'cat4', 'cat80'],
'top1': [0, 1, 2, 5, 2],
'top2': [1, 1, 4, 2, 7],
'top3': [2, 1, 3, 1, 4],
'top4': [2, 1, 3, 1, 9]}
Values_df = pd.DataFrame(data)
cat top1 top2 top3 top4
0 cat1 0 1 2 2
1 cat2 1 1 1 1
2 cat3 2 4 3 3
3 cat4 5 2 1 1
4 cat80 2 7 4 9
Result:
print(Main)
topic cat Value
0 top1 cat1 0
1 top2 cat3 4
2 top4 cat4 1