Home > Enterprise >  How to create a new column based on conditions on other column in pandas
How to create a new column based on conditions on other column in pandas

Time:11-13

I want to take first five digits from column C if there is word "US" in column B else take all the digits from column C.Required outputs are in column "OUTPUT".

A          B          C         OUTPUT
hell       US         12234455  12234
mell       UK         12345666  12345666
shall      US         21248075  21248
pel      SPAIN        90056784  90056784
wel        SP         35455689  35455689
shel       US         12095678  12095

I am using the following codes but its not working.Please help.Dataset name=sf1

if sf1.B.all=="US":
   sf1['OUTPUT'] = sf1['C'].astype(str).str[:5]
else:
   sf1['OUTPUT'] = sf1['C'].astype(str) 

CodePudding user response:

You can use numpy where:

import numpy as np

sf1['OUTPUT'] = np.where(sf1['B'].eq('US'), sf1['C'].astype(str).str[:5], sf1['C'])

or pandas where:

sf1['OUTPUT'] = sf1['C'].astype(str).str[:5].where(sf1['B'].eq('US'), sf1['C'])

or pandas loc:

sf1.loc[sf1['B'].eq('US'), 'OUTPUT'] = sf1['C'].astype(str).str[:5]
sf1.loc[sf1['B'].ne('US'), 'OUTPUT'] = sf1['C']

Output:

        A       B          C      OUTPUT
0    hell      US   12234455       12234
1    mell      UK   12345666    12345666
2   shall      US   21248075       21248
3     pel   SPAIN   90056784    90056784
4     wel      SP   35455689    35455689
5    shel      US   12095678       12095

CodePudding user response:

To accomplish this, we’ll use numpy’s built-in where() function. This function takes three arguments in sequence: the condition we’re testing for, the value to assign to our new column if that condition is true, and the value to assign if it is false. source

stf1['OUTPUT'] = np.where(df['B'] == 'US', df['C'].astype(str).str[:5], df['C'])

CodePudding user response:

Let us try mask

df['OUTPUT'] = df.C.mask(df.B.eq('US'),lambda x : x.astype(str).str[:5])
df
Out[30]: 
       A      B         C    OUTPUT
0   hell     US  12234455     12234
1   mell     UK  12345666  12345666
2  shall     US  21248075     21248
3    pel  SPAIN  90056784  90056784
4    wel     SP  35455689  35455689
5   shel     US  12095678     12095

CodePudding user response:

You can also employ the apply function as follows:

df['Output'] = df.apply(lambda row: row.C[:5] if row.B == 'US' else row.C, axis = 1)  

this produces:

    A   B   C   Output
0   hell    US  12234455    12234
1   mell    UK  12345666    12345666
2   shall   US  21248075    21248
3   pel     SPAIN   90056784    90056784
4   wel     SP  35455689    35455689
5   shel    US  12095678    12095
  • Related