How to select all observations whose name starts with a specific element in Python

Time:07-21

I have a DataFrame and want to create a dummy variable that takes the value 1 when the Asset Class starts with a D. Every variant that starts with a D should be included. How would you do it?

The data looks like

import pandas as pd

dic = {'Asset Class':  ['D.1', 'D.12', 'D.34','nan', 'F.3', 'G.12', 'D.2', 'nan']}
df = pd.DataFrame(dic)

What I want to have is

dic_want = {'Asset Class':  ['D.1', 'D.12', 'D.34', 'nan', 'F.3', 'G.12', 'D.2', 'nan'],
            'Asset Dummy':  [1,1,1,0,0,0,1,0]}
df_want = pd.DataFrame(dic_want)

I tried

df_want["Asset Dummy"] = ((df["Asset Class"] == df.filter(like="D"))).astype(int)

where I get the following error message: ValueError: Columns must be same length as key

I also tried

CSDB["test"] = ((CSDB["PAC2"] == CSDB.str.startswith('D'))).astype(int)

where I get the error message AttributeError: 'DataFrame' object has no attribute 'str'. I tried to convert my column to strings with the standard methods (astype(str) and to_string()), but that did not work either. This is probably a separate problem. I have found only one post with the same question, and it does not have a satisfactory answer.

Any ideas how I can solve my problem?

CodePudding user response:

There are many ways to create a new column based on a condition. This is one of them:

import pandas as pd
import numpy as np

dic = {'Asset Class':  ['D.1', 'D.12', 'D.34', 'F.3', 'G.12', 'D.2']}
df = pd.DataFrame(dic)

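# "^D" anchors the match at the start of the string; a plain "D" would match a D anywhere
df['Dummy'] = np.where(df['Asset Class'].str.contains("^D", na=False), 1, 0)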

More ways to add a column based on a condition are described here: https://www.dataquest.io/blog/tutorial-add-column-pandas-dataframe-based-on-if-else-condition/
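
One caveat with str.contains is worth illustrating: an unanchored pattern such as "D" matches anywhere in the string, while the question asks for values that start with a D. The sketch below compares the two on a small hypothetical Series; the value 'AD.1' is not in the original data and is added only to make the difference visible.

```python
import pandas as pd

# Hypothetical sample; 'AD.1' and None are added for illustration only.
s = pd.Series(['D.1', 'AD.1', 'F.3', None])

# Literal substring search: True wherever a "D" appears anywhere.
anywhere = s.str.contains('D', regex=False, na=False)

# Prefix test: True only when the value begins with "D".
at_start = s.str.startswith('D', na=False)

print(anywhere.tolist())  # [True, True, False, False]
print(at_start.tolist())  # [True, False, False, False]
```

Passing na=False makes missing values count as "no match" in both calls.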

CodePudding user response:

You can use Series.str.startswith on df['Asset Class']:

>>> dic = {'Asset Class':  ['D.1', 'D.12', 'D.34', 'nan', 'F.3', 'G.12', 'D.2', 'nan']}
>>> df = pd.DataFrame(dic)
>>> df['Asset Dummy'] = df['Asset Class'].str.startswith('D').astype(int)
>>> df
  Asset Class  Asset Dummy
0         D.1            1
1        D.12            1
2        D.34            1
3         nan            0
4         F.3            0
5        G.12            0
6         D.2            1
7         nan            0
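
This also explains the AttributeError in the question: .str is an accessor on a Series (a single column), not on the whole DataFrame, so the column has to be selected first. In the sample data the missing entries are the literal string 'nan'. If the real column holds actual missing values instead, which is an assumption about the real data, startswith returns a missing value for them and astype(int) can fail. A minimal sketch of one way to handle that with the na parameter:

```python
import numpy as np
import pandas as pd

# Assumed variant of the sample data with real missing values (np.nan).
df = pd.DataFrame({'Asset Class': ['D.1', 'D.12', np.nan, 'F.3', 'D.2']})

# Select the column first, then use the Series .str accessor.
# na=False treats missing values as "does not start with D".
df['Asset Dummy'] = df['Asset Class'].str.startswith('D', na=False).astype(int)

print(df['Asset Dummy'].tolist())  # [1, 1, 0, 0, 1]
```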