I have a DataFrame and want to create a dummy variable that takes the value 1 when the Asset Class starts with a D. It should cover every variant that starts with a D (D.1, D.12, and so on). How would you do it?
The data looks like
import pandas as pd
dic = {'Asset Class': ['D.1', 'D.12', 'D.34', 'nan', 'F.3', 'G.12', 'D.2', 'nan']}
df = pd.DataFrame(dic)
What I want to have is
dic_want = {'Asset Class': ['D.1', 'D.12', 'D.34', 'nan', 'F.3', 'G.12', 'D.2', 'nan'],
'Asset Dummy': [1,1,1,0,0,0,1,0]}
df_want = pd.DataFrame(dic_want)
I tried
df_want["Asset Dummy"] = ((df["Asset Class"] == df.filter(like="D"))).astype(int)
where I get the following error message: ValueError: Columns must be same length as key
I also tried
CSDB["test"] = ((CSDB["PAC2"] == CSDB.str.startswith('D'))).astype(int)
where I get the error message AttributeError: 'DataFrame' object has no attribute 'str'. I tried to convert my object to a string with the standard methods (astype(str) and to_string()), but that does not work either. This is probably a separate problem; I have found only one post with the same question, and it does not have a satisfactory answer.
Any ideas how I can solve my problem?
CodePudding user response:
There are many ways to create a new column based on a condition; this is one of them:
import pandas as pd
import numpy as np
dic = {'Asset Class': ['D.1', 'D.12', 'D.34', 'F.3', 'G.12', 'D.2']}
df = pd.DataFrame(dic)
df['Dummy'] = np.where(df['Asset Class'].str.startswith("D"), 1, 0)
More approaches are described here: https://www.dataquest.io/blog/tutorial-add-column-pandas-dataframe-based-on-if-else-condition/
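One thing to watch for: a substring test and a prefix test are not the same, so they can disagree on values that contain a D later in the string. A minimal sketch of the difference, assuming a made-up value 'F.D1' that is not in the original data:

```python
import pandas as pd

# 'F.D1' is a hypothetical value, added only to show where the tests disagree.
s = pd.Series(['D.1', 'F.3', 'F.D1'])

contains_d = s.str.contains('D').astype(int)       # 'D' anywhere in the string
starts_with_d = s.str.startswith('D').astype(int)  # 'D' as the first character only
anchored = s.str.contains('^D').astype(int)        # regex anchored at the start

print(contains_d.tolist())
print(starts_with_d.tolist())
print(anchored.tolist())
```

Only the prefix test and the anchored regex leave 'F.D1' at 0, which is what "starts with a D" asks for.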
CodePudding user response:
You can use Series.str.startswith on df['Asset Class']:
>>> dic = {'Asset Class': ['D.1', 'D.12', 'D.34', 'nan', 'F.3', 'G.12', 'D.2', 'nan']}
>>> df = pd.DataFrame(dic)
>>> df['Asset Dummy'] = df['Asset Class'].str.startswith('D').astype(int)
>>> df
  Asset Class  Asset Dummy
0         D.1            1
1        D.12            1
2        D.34            1
3         nan            0
4         F.3            0
5        G.12            0
6         D.2            1
7         nan            0
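Two related points may help here. First, .str is an accessor on a Series (a single column), which would explain the AttributeError in the question, where it was called on the whole DataFrame. Second, the example data stores missing entries as the string 'nan'. If the real data holds actual NaN values instead (an assumption, since the question only shows strings), the result of startswith can contain missing values, and depending on the pandas version the cast to int may then fail. A hedged sketch using the na parameter of startswith, on a shortened, illustrative version of the data:

```python
import numpy as np
import pandas as pd

# Assumption: the missing entries are real NaN values, not the string 'nan'.
df = pd.DataFrame({'Asset Class': ['D.1', 'D.12', np.nan, 'F.3', 'D.2', np.nan]})

# .str is available on a single column (a Series), not on the DataFrame itself.
col = df['Asset Class']

# na=False marks missing values as False rather than leaving them missing,
# so the boolean result can be cast to int directly.
df['Asset Dummy'] = col.str.startswith('D', na=False).astype(int)

print(df['Asset Dummy'].tolist())
```

With string 'nan' values, as in the question, the plain startswith('D') call above is already enough.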