Pandas stack() if columns have a specific value-CodePudding

I am trying to stack this table based on ID column but only considering columns [A-D] where the value is 1 and not 0.

Current df:

ID	A	B	C	D
1	1	0	0	1
3	0	1	0	1
7	1	0	1	1
8	1	0	0	0

What I want:

ID	LETTER
1	A
1	D
3	B
3	D
7	A
7	C
7	D
8	A

The following code works but I need a more efficient solution as I have a df with 93434 rows x 12377 columns.

stacked_df = df.set_index('ID').stack().reset_index(name='has_letter').rename(columns={'level_1':'LETTER'})
stacked_df = stacked_df[stacked_df['has_letter']==1].reset_index(drop=True)
stacked_df.drop(['has_letter'], axis=1, inplace=True)

Thank you in advance!

CodePudding user response：

You can mask the non-1 values and stack to remove the NaNs:

df2 = df.rename_axis(columns='LETTERS').set_index('ID')
stacked_df = (df2.where(df2.eq(1)).stack()
                 .reset_index().iloc[:,:2]
              )

Output:

   ID LETTERS
0   1       A
1   1       D
2   3       B
3   3       D
4   7       A
5   7       C
6   7       D
7   8       A

CodePudding user response：

Try:

print(
    df.set_index("ID")
    .apply(lambda x: x.index[x == 1], axis=1)
    .reset_index()
    .explode(0)
    .rename(columns={0: "LETTERS"})
)

Prints:

   ID LETTERS
0   1       A
0   1       D
1   3       B
1   3       D
2   7       A
2   7       C
2   7       D
3   8       A

Or:

x = df.set_index("ID").stack()
print(
    x[x == 1]
    .reset_index()
    .drop(columns=0)
    .rename(columns={"level_1": "LETTER"})
)

Prints:

   ID LETTER
0   1      A
1   1      D
2   3      B
3   3      D
4   7      A
5   7      C
6   7      D
7   8      A

CodePudding user response：

Try this

(df.set_index('ID').dot(df.columns[1:]) # use inner product of column names and values
.apply(list) # separate each letter
.explode() # explode each list
.reset_index(name='LETTER') # reset index for df
)

   ID LETTER
0   1      A
1   1      D
2   3      B
3   3      D
4   7      A
5   7      C
6   7      D
7   8      A