I have something like this:
df =
col1 col2 col3
0 B C A
1 E D G
2 NaN F B
EDIT : I need to convert it into this:
result =
Name location
0 B col1,col2
1 C col1
2 A col1
3 E col2
4 D col2
5 G col2
6 F col3
Essentially getting a "location" telling me which column an "Name" is in. Thank you in advance.
CodePudding user response:
Try melt
and dropna
:
>>> df.melt(var_name='location').dropna()
location value
0 col1 B
1 col1 E
3 col2 C
4 col2 D
5 col2 F
6 col3 A
7 col3 G
>>>
CodePudding user response:
Try using melt
to convert columns
to rows
. And give the rows
a column
name.
Then dropna
to remove the NaN
values in rows
.
df = df.melt(var_name="location", value_name="Name").dropna()
CodePudding user response:
You can use pandas.melt
and pandas.groupby.agg
:
df = df.melt(var_name="location", value_name="Name").dropna()
new_df = df.groupby("Name", as_index=False).agg(",".join)
print(new_df)
Output:
Name location
0 A col3
1 B col1,col3
2 C col2
3 D col2
4 E col1
5 F col2
6 G col3
CodePudding user response:
Or an alternative with stack()
:
new = df.stack().reset_index().drop('level_0',axis=1).dropna()
new.columns = ['name','location']
prints:
name location
0 col1 B
1 col2 C
2 col3 A
3 col1 E
4 col2 D
5 col3 G
6 col2 F
EDIT:
To get your updated output you could use a groupby
along with join()
:
new.groupby('location').agg({'name':lambda x: ', '.join(list(x))}).reset_index()
Which gives you:
location name
0 A col3
1 B col1, col3
2 C col2
3 D col2
4 E col1
5 F col2
6 G col3