I have a daframe like the one below:
the columns str, mpr, cta, mpt
are all the same but for every record I have value in only 1 of these 4 columns.
How can I merge these 4 columns in only 1 like the output below:
id str mpr cta mpt
1 10 null null null
2 null 11 null null
3 null null 6 null
4 null null null 1
Output:
id final
1 10
2 11
3 6
4 1
as you can see the columns final
will always have a value coming from only one of each 4 columns.
Important: sometimes the 4 columns is null then the final
column will also be null
CodePudding user response:
# find the max along the row, starting from second columns (skipping over ID)
df['final']=df.iloc[:,1:].max(axis=1, numeric_only=True) #.astype(int)
df[['id','final']]
# if a row can have all Null than don't add astype(int)
id final
0 1 10
1 2 11
2 3 6
3 4 1
CodePudding user response:
Lets call your original DataFrame df and lets say you plan on scaling it on a million columns.
Create a list for the final column:
Final_List = []
Create a list of colums
my_list = df.columns.values.tolist()
for x in range(len(my_list)):
for y in range(len(df[my_list[x]]):
if df[my_list[x][y] != 'null'
final_value = df[my_list[x][y]]
Final_List.append(final_value)
break #breaks the inner loop
else:
final_value = 'nul'
Final_List.append(final_value)
df["final"] = Final_List