Home > Mobile >  How do I apply melt / wide_to_long onto a data frame with multiple columns?
How do I apply melt / wide_to_long onto a data frame with multiple columns?

Time:12-25

I have a one-hot encoded data frame that is similar to the one below:

review_score Action & Adventure Classic Movies TV Comedies TV Mysteries
4 0 0 1 0
10 1 0 0 0
5 1 0 0 0
7 0 1 0 0
8 0 0 1 0
7 0 0 0 1

it contains 40 columns and I do not think it would be feasible to put every column name into the functions.

How can I pivot or rearrange the table so it looks something like the one below:

Genre review_score
Action & Adventure 10
Action & Adventure 5
Classic Movies 7
TV Comedies 4
TV Comedies 8
TV Mysteries 7

I want each genre to have a row for each review score given in order to plot a boxplot

I have tried melt and wide_to_long functions but cannot get the desired data frame. Please advice!

CodePudding user response:

In pandas 1.5.0 there is pd.from_dummies

import pandas as pd

genre = pd.from_dummies(df.drop(columns="review_score"))
pd.concat([genre[""].rename("Genre"), df["review_score"]], axis=1)

alternatively, use pd.DataFrame.idxmax

genre = df.drop(columns="review_score").idxmax(axis=1)
pd.concat([genre.rename("Genre"), df["review_score"]], axis=1)

CodePudding user response:

If you don't have 1.5.0 version of pandas, you could do:

cols = ['review_score','Genre']
s = df.set_index('review_score').stack()
out = s[s==1].rename_axis(cols).reset_index()[cols]

print(out)

   review_score               Genre
0             4         TV Comedies
1            10  Action & Adventure
2             5  Action & Adventure
3             7      Classic Movies
4             8         TV Comedies
5             7        TV Mysteries

CodePudding user response:

You can use iloc and idxmax(axis=1) to find column_name then use pandas.Series.to_frame() and pandas.DataFrame.assign to add new column to old dataframe.

new_df = df['review_score'].to_frame().assign(Genre = df.iloc[:, 1:].idxmax(axis=1))
print(new_df)

Output:

   review_score               Genre
0             4         TV Comedies
1            10  Action & Adventure
2             5  Action & Adventure
3             7      Classic Movies
4             8         TV Comedies
5             7        TV Mysteries

CodePudding user response:

hello i hope that help you, i work it with for loop

l=[]
k=[]
for i in df.columns[1:]:
    for j in range(len(df)):
        if df[i][j]==1:
            l.append(df.review_score[j])
            k.append(i)
result= pd.DataFrame()
result['Genre']=k
result['review_score']=l

Out put :

              Genre  review_score
0  Action_Adventure            10
1  Action_Adventure             5
2     ClassicMovies             7
3        TVComedies             4
4        TVComedies             8
5       TVMysteries             7
  • Related