This is my dataset:
Dept | Cell culture | Bioinfo | Immunology | Trigonometry | Algebra | Microbio | Optics |
---|---|---|---|---|---|---|---|
Biotech | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
Biotech | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
Math | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
Biotech | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
Physics | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
This is how I want my result:
|Biotech|Cell culture|Bioinfo |Immunology|
|Biotech|Cell culture|Immunology| |
|Math |Trigonometry|Algebra | |
|Biotech|Microbio | | |
|Physics|Optics | | |
Answer:
Convert the Dept column to the index, then for each row collect the names of the columns whose value is 1 and build a new DataFrame from those arrays:
import pandas as pd

df2 = df.set_index('Dept')
cols = df2.columns.to_numpy()
# each boolean row mask picks the names of the columns equal to 1
df2 = pd.DataFrame([cols[m] for m in df2.eq(1).to_numpy()], index=df2.index)
print(df2)
                    0           1           2
Dept
Biotech  Cell culture     Bioinfo  Immunology
Biotech  Cell culture  Immunology        None
Math     Trigonometry     Algebra        None
Biotech      Microbio        None        None
Physics        Optics        None        None
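The snippet above assumes a DataFrame named df already exists. A minimal self-contained sketch, with the sample data copied from the question (the names flags, rows and result are only illustrative), might look like this. The missing cells may print as None or NaN depending on the pandas version:

```python
import pandas as pd

# Sample data copied from the question; the variable name `df` is assumed.
df = pd.DataFrame({
    "Dept": ["Biotech", "Biotech", "Math", "Biotech", "Physics"],
    "Cell culture": [1, 1, 0, 0, 0],
    "Bioinfo": [1, 0, 0, 0, 0],
    "Immunology": [1, 1, 0, 0, 0],
    "Trigonometry": [0, 0, 1, 0, 0],
    "Algebra": [0, 0, 1, 0, 0],
    "Microbio": [0, 0, 0, 1, 0],
    "Optics": [0, 0, 0, 0, 1],
})

flags = df.set_index("Dept")
cols = flags.columns.to_numpy()

# Each boolean row mask selects the names of the columns flagged with 1.
rows = [cols[mask].tolist() for mask in flags.eq(1).to_numpy()]

# Rows shorter than the longest one are padded with missing values.
result = pd.DataFrame(rows, index=flags.index)
print(result)
```

The number of output columns equals the largest number of 1 values found in any single row, here 3.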
If performance is not important or the data is small:
df2 = df.set_index('Dept')
cols = df2.columns.to_numpy()
# row-wise apply: slower, since the lambda runs once per row
df2 = pd.DataFrame(df2.eq(1).apply(lambda x: cols[x], axis=1).tolist(), index=df2.index)
print(df2)
                    0           1           2
Dept
Biotech  Cell culture     Bioinfo  Immunology
Biotech  Cell culture  Immunology        None
Math     Trigonometry     Algebra        None
Biotech      Microbio        None        None
Physics        Optics        None        None
Finally, if you need Dept as a regular column with all columns numbered from 0:
df2 = df2.reset_index()
# renumber the columns 0..n-1 so Dept becomes column 0
df2.columns = range(len(df2.columns))
print(df2)
         0             1           2           3
0  Biotech  Cell culture     Bioinfo  Immunology
1  Biotech  Cell culture  Immunology        None
2     Math  Trigonometry     Algebra        None
3  Biotech      Microbio        None        None
4  Physics        Optics        None        None
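The desired result in the question shows blank cells rather than None. One way to get there, sketched as a hypothetical helper (the name flags_to_names and its key and fill parameters are assumptions, not part of the answer above), is to wrap the steps in a function and replace the padding with empty strings via fillna:

```python
import pandas as pd

def flags_to_names(df, key="Dept", fill=""):
    """Hypothetical helper: for each row, list the names of the columns equal to 1."""
    flags = df.set_index(key)
    cols = flags.columns.to_numpy()
    rows = [cols[mask].tolist() for mask in flags.eq(1).to_numpy()]
    out = pd.DataFrame(rows, index=flags.index).reset_index()
    out.columns = range(len(out.columns))
    # Assumption: blanks are wanted in place of missing values, as in the question.
    return out.fillna(fill)

# Sample data copied from the question.
columns = ["Dept", "Cell culture", "Bioinfo", "Immunology",
           "Trigonometry", "Algebra", "Microbio", "Optics"]
data = [
    ["Biotech", 1, 1, 1, 0, 0, 0, 0],
    ["Biotech", 1, 0, 1, 0, 0, 0, 0],
    ["Math",    0, 0, 0, 1, 1, 0, 0],
    ["Biotech", 0, 0, 0, 0, 0, 1, 0],
    ["Physics", 0, 0, 0, 0, 0, 0, 1],
]
df = pd.DataFrame(data, columns=columns)

out = flags_to_names(df)
print(out)
```

Passing a different fill value, for example fill="-", would mark the padded cells explicitly instead of leaving them blank.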