Updating values of a column from multiple columns if the values are present in those columns-CodePudding

I am trying to update Col1 with values from Col2,Col3... if values are found in any of them. A row would have only one value, but it can have "-" but that should be treated as NaN

df = pd.DataFrame(
    [
        ['A',np.nan,np.nan,np.nan,np.nan,np.nan],
        [np.nan,np.nan,np.nan,'C',np.nan,np.nan], 
        [np.nan,np.nan,"-",np.nan,'B',np.nan],
        [np.nan,np.nan,"-",np.nan,np.nan,np.nan]
    ],
    columns = ['Col1','Col2','Col3','Col4','Col5','Col6']
)

print(df)

   Col1 Col2 Col3   Col4    Col5     Col6
0    A  NaN  NaN     NaN     NaN     NaN
1  NaN  NaN  NaN       C     NaN     NaN
2  NaN  NaN  NaN     NaN       B     NaN 
3  NaN  NaN  NaN     NaN     NaN     NaN

I want the output to be:

   Col1 
0    A  
1    C  
2    B  
3  NaN

I tried to use the update function:

for col in df.columns[1:]:
    df[Col1].update(col)

It works on this small DataFrame but when I run it on a larger DataFrame with a lot more rows and columns, I am losing a lot of values in between. Is there any better function to do this preferably without a loop. Please help I tried with many other methods, including using .loc but no joy.

CodePudding user response：

Here is one way to go about it


# convert the values in the row to series, and sort, NaN moves to the end
df2=df.apply(lambda x: pd.Series(x).sort_values(ignore_index=True), axis=1)

# rename df2 column as df columns
df2.columns=df.columns

# drop where all values in the column as null
df2.dropna(axis=1, how='all', inplace=True) 

print(df2)

Col1
0   A
1   C
2   B
3   NaN

CodePudding user response：

You can use combine_first:

from functools import reduce

reduce(
    lambda x, y: x.combine_first(df[y]),
    df.columns[1:],
    df[df.columns[0]]
).to_frame()

The following DataFrame is the result of the previous code:

  Col1
0    A
1    C
2    B
3  NaN