I want to use the data from three columns as one variable but I don't want to lose the information about which columns the data originally came from.
How it is:
a b c
9 nan nan
nan 8 nan
nan nan 4
4 nan nan
nan nan 3
What it should be:
d e
9 a
8 b
4 c
4 a
3 c
CodePudding user response:
#a melt-less solution that actually fixed my problem well enough
#a tiny dataframe:
df = pd.DataFrame({'A': {0: 9, 1: np.NaN, 2: np.NaN, 3: 4, 4: np.NaN},
'B': {0: np.NaN, 1: 8, 2: np.NaN, 3: np.NaN, 4: np.NaN},
'C': {0: np.NaN, 1: np.NaN, 2: 4, 3: np.NaN, 4: 3}})
print(f"original data: \n {df}")
#create new column with max value from others
df['D'] = df.max(1)
E = []
#go through dataframe
for index, row in df.iterrows():
E.append(df.columns[row.isna() == False][0]) #add the [0] to ignore column D
#add the new beautiful array as a new column
df['E'] = E
print(f"now we have: \n {df}")
CodePudding user response:
Just use melt
:
df.melt().dropna()
Output:
variable value
0 a 9.0
3 a 4.0
6 b 8.0
12 c 4.0
14 c 3.0