Home > Blockchain >  Python: combine data from multiple columns and add the columname as category
Python: combine data from multiple columns and add the columname as category

Time:06-06

I want to use the data from three columns as one variable but I don't want to lose the information about which columns the data originally came from.

How it is:

a   b   c
9   nan nan
nan 8   nan
nan nan 4
4   nan nan
nan nan 3

What it should be:

d   e
9   a
8   b
4   c
4   a
3   c

CodePudding user response:

#a melt-less solution that actually fixed my problem well enough
#a tiny dataframe:
df = pd.DataFrame({'A': {0: 9,      1: np.NaN, 2: np.NaN, 3: 4,      4: np.NaN},
                   'B': {0: np.NaN, 1: 8,      2: np.NaN, 3: np.NaN, 4: np.NaN},
                   'C': {0: np.NaN, 1: np.NaN, 2: 4,      3: np.NaN, 4: 3}})
print(f"original data: \n {df}")

#create new column with max value from others
df['D'] = df.max(1) 

E = []
#go through dataframe
for index, row in df.iterrows():
  E.append(df.columns[row.isna() == False][0]) #add the [0] to ignore column D

#add the new beautiful array as a new column
df['E'] = E

print(f"now we have: \n {df}")

CodePudding user response:

Just use melt:

df.melt().dropna()

Output:

    variable    value
0   a           9.0
3   a           4.0
6   b           8.0
12  c           4.0
14  c           3.0
  • Related