assign one column value to another column based on condition in pandas-CodePudding

I want to how we can assign one column value to another column if it has null or 0 value I have a dataframe like this:

id      column1   column2

5263    5400        5400
4354    6567        Null
5656    5456        5456  
5565    6768        3489
4500    3490        Null

The Expected Output is

id      column1   column2

5263    5400        5400
4354    6567        6567        
5656    5456        5456  
5565    6768        3489
4500    3490        3490

that is, if df['column2'] = Null/0 then it has take df['column1'] value.

Can someone explain, how can I achieve my desired output?

CodePudding user response：

Here's my suggestion. Not sure whether it is the fastest, but it should work here ;)


#we start by creating an empty list 
column2 = []

#for each row in the dataframe 
for i in df.index:
    # if the value col2 is null or 0, then it takes the value of col1
    if df.loc[i, 'column2'] in ['null', 0]:
        column2.append(df.loc[i, 'column1'])
    #else it takes the value of column 2
    else: 
        column2.append(df.loc[i, 'column2'])

#we replace the current column 2 by the new one !
df['column2'] = column2```

CodePudding user response：

Update using only Native Pandas Functionality

#Creates boolean array conditionCheck, checking conditions for each row in df
#Where() will only update when conditionCheck == False, so inverted boolean values using "~"

conditionCheck = ~((df['column2'].isna()) | (df['column2']==0))
df["column2"].where(conditionCheck,df["column1"],inplace=True)

print(df)

Code to Generate Sample DataFrame

Changed row 3 of column2 to 0 to test all scenarios

import numpy as np
import pandas as pd

data = [
        [5263,5400,5400]
        ,[4354,6567,None]
        ,[5656,5456,0]  
        ,[5565,6768,3489]
        ,[4500,3490,None]
        ]
df = pd.DataFrame(data,columns=["id","column1","column2"],dtype=pd.Int64Dtype())

CodePudding user response：

Similar question was already solved here.

"Null" keyword does not exist in python. Empty cells in pandas have np.nan type. So assuming you mean np.nans, one good way to achieve your desired output would be:

Create a boolean mask to select rows with np.nan or 0 value and then copy when mask is True.

mask = (df['column2'].isna()) | (df['column2']==0)
df.loc[mask, "column2"] = df.loc[mask, "column1"]

CodePudding user response：

Based on the answers to this similar question, you can do the following:

Using np.where:

df['column2'] = np.where((df['column2'] == 'Null') | (df['column2'] == 0), df['column1'], df['column2'])

Instead, using only pandas and Python:

df['column2'][(df['column2'] == 0) | (df['column2'] == 'Null')] = df['column1']