Is there a method to save not missing values in another data frame?-CodePudding

I have a data frame by 20441 rows and 158 columns. in each row, there are a lot of "NA" values. so I want to convert it to sth like this:

If a value is not NA, I save it's row name , column name and value in another data frame. for example my first data frame is :

row and column name	c1	c2	c3	c4	c5
r1	NA	NA	NA	5	6
r2	1	3	NA	NA	NA

row name	c1	c2
r1	c4	5
r1	c5	6
r2	c1	1
r2	c2	3

CodePudding user response：

Use melt:

out = (df.melt('row and column name', var_name='C1', value_name='C2')
         .dropna().astype({'C2': int}))

Output:

row and column name	C1	C2
r2	c1	1
r2	c2	3
r1	c4	5
r1	c5	6

CodePudding user response：

The answer using df.melt works but is slower than the below code. I used %%timeit to measure the time of each code.

df.melt takes 2.47 ms ± 93.5 and below code takes 314 µs ± 10.4

if performance is not important then df.melt is better since it is a one line code.

row = []
cols = []
val = []
for col in data.columns[1:]: 
    for i,e in enumerate(data[col]): 
        if pd.isna(e) == False:
            row.append(data['row&col name'][i])
            cols.append(col)
            val.append(e)  
new_data = pd.DataFrame(list(zip(row,cols,val)),columns=['row','col','val'])