I have two data frames: first:
l1=[123,345,546,245]
l2=["a","a","b","b"]
l3=["j","k","l","pp"]
l4=["m","n","m","n"]
df1=pd.DataFrame(list(zip(l1,l2,l3,l4)),columns=['id','X','Y','Z'])
df1.head()
second:
l1=["X","X","Y","Y","Y","Y","Z","Z"]
l2=["a","b","j","k","l","pp","m","n"]
l3=["1","2","1","2","3","4","1","2"]
df2=pd.DataFrame(list(zip(l1,l2,l3)),columns=["labelnames","levels","labels"])
df2
I need to do label encoding to the first data frame values by using the second data frame. I tried converting the second data frame to a dictionary and then replacing the first data frame values.
such as;
dict= {"X" : { a:1 , b:2}
"Y": {j:1,k:2,l:3, pp:4},
"Z" : {m:1,n:2}}
I couldn't print the proper dictionary format.
I want to see the output like:
l1=[123,345,546,245]
l2=["1","1","2","2"]
l3=["1","2","3","4"]
l4=["1","2","1","2"]
df1=pd.DataFrame(list(zip(l1,l2,l3,l4)),columns=['id','X','Y','Z'])
df1.head()
So, How can I replace the first data frame values( X, Y, Z) with second data frame labels?
CodePudding user response:
here is one way to do it, using melt, and pivot
# melt the dataframe to make it columner
# merge with the reference dataframe
# finally pivot to get back to the original format
(df.melt(id_vars=['id'], var_name='key' ).merge(df2,
left_on=['key','value'],
right_on=['labelnames','levels'],
how='left'
)
.pivot(index='id', columns='key', values='labels')
.reset_index()
.rename_axis(columns=None)
)
id X Y Z
0 123 1 1 1
1 245 2 4 2
2 345 1 2 2
3 546 2 3 1