In the DataFrame
below
df = pd.DataFrame([('Ve_Paper', 'Buy', '-','Canada',np.NaN),
('Ve_Gasoline', 'Sell', 'Done','Britain',np.NaN),
('Ve_Water', 'Sell','-','Canada,np.NaN),
('Ve_Plant', 'Buy', 'Good','China',np.NaN),
('Ve_Soda', 'Sell', 'Process','Germany',np.NaN)], columns=['Name', 'Action','Status','Country','Value'])
I am trying to update the Value
column based on the condition if Action
is Sell
check if the Status
is not -
if that is true then the first two characters of the Country
needs to be updated as the Value
column else if Status
column is -
the Value
column needs to be updated with the Name
column without the characters Ve_
, if Action
is not Sell
leave the Value
column as np.NaN
What I have tried so far is
import numpy as np
df['Value'] = np.where(df['Action']== 'Sell',df['Country'].str[:2] if df['Status'].str != '-' else df['Name'].str[3:],df['Value'])
but I am getting the output as <pandas.core.strings.StringMethods object at 0x000001EDB8F662B0>
wherever Iam trying to extract substrings
so the output looks like this
Name Action Status Country Value
Ve_Paper Buy - Canada np.NaN
Ve_Gasoline Sell Done Britain <pandas.core.strings.StringMethods object at 662B0>
Ve_Water Sell - Canada <pandas.core.strings.StringMethods object at 0x000001EDB8F662B0>
Ve_Plant Buy Good China np.NaN
Ve_Soda Sell Process Germany <pandas.core.strings.StringMethods object at 0x000001EDB8F662B0>
But the output
I am expecting is
Name Action Status Country Value
Ve_Paper Buy - Canada np.NaN # Because Action is not Sell
Ve_Gasoline Sell Done Britain Br # The first two characters of Country Since Action is sell and Status is not "-"
Ve_Water Sell - Canada Water # The Name value without 'Ve_' since Action is Sell and the Status is '-'
Ve_Plant Buy Good China np.NaN
Ve_Soda Sell Process Germany Ge
As mentioned in the answer below the method I tried works for someone but doesn't work for me. Is there any alternative approaches or better approaches than what I have tried, since that is not working for me
CodePudding user response:
I do not see any problem in here. Could you please elaborate it more if possible?
I am sharing my piece of code below:
import pandas as pd
import numpy as np
df = pd.DataFrame([('Ve_Paper', 'Buy', '-','Canada',np.NaN),
('Ve_Gasoline', 'Sell', 'Done','Britain',np.NaN),
('Ve_Water', 'Sell','-','Canada',np.NaN),
('Ve_Plant', 'Buy', 'Good','China',np.NaN),
('Ve_Soda', 'Sell', 'Process','Germany',np.NaN)], columns=['Name', 'Action','Status','Country','Value'])
df['Value'] = np.where(df['Action']== 'Sell',df['Country'].str[:2] if df['Status'].str != '-' else df['Name'].str[3:],df['Value'])
print(df)
You can also try with the below code:
df['Value'] = np.where(df['Action']== 'Sell', df['Country'].astype(str).str[0:2], df['Value'])
CodePudding user response:
Using np.select was a great choice
conda = df.Action.eq('Sell')
condb = df.Status.eq('-')
df['value'] = np.select([conda&condb, conda&~condb],
[df.Name.str.split('_').str[1],df.Country.str[:2]],
default = np.nan)
df
Out[343]:
Name Action Status Country Value value
0 Ve_Paper Buy - Canada NaN NaN
1 Ve_Gasoline Sell Done Britain NaN Br
2 Ve_Water Sell - Canada NaN Water
3 Ve_Plant Buy Good China NaN NaN
4 Ve_Soda Sell Process Germany NaN Ge