I have a data frame by 20441 rows and 158 columns. in each row, there are a lot of "NA" values. so I want to convert it to sth like this:
If a value is not NA, I save it's row name , column name and value in another data frame. for example my first data frame is :
row and column name | c1 | c2 | c3 | c4 | c5 |
---|---|---|---|---|---|
r1 | NA | NA | NA | 5 | 6 |
r2 | 1 | 3 | NA | NA | NA |
row name | c1 | c2 |
---|---|---|
r1 | c4 | 5 |
r1 | c5 | 6 |
r2 | c1 | 1 |
r2 | c2 | 3 |
CodePudding user response:
Use melt
:
out = (df.melt('row and column name', var_name='C1', value_name='C2')
.dropna().astype({'C2': int}))
Output:
row and column name | C1 | C2 |
---|---|---|
r2 | c1 | 1 |
r2 | c2 | 3 |
r1 | c4 | 5 |
r1 | c5 | 6 |
CodePudding user response:
The answer using df.melt works but is slower than the below code. I used %%timeit to measure the time of each code.
df.melt takes 2.47 ms ± 93.5 and below code takes 314 µs ± 10.4
if performance is not important then df.melt is better since it is a one line code.
row = []
cols = []
val = []
for col in data.columns[1:]:
for i,e in enumerate(data[col]):
if pd.isna(e) == False:
row.append(data['row&col name'][i])
cols.append(col)
val.append(e)
new_data = pd.DataFrame(list(zip(row,cols,val)),columns=['row','col','val'])