I am trying to manipulate my dataframe. I have searched for an answer for a while without finding a solution. If this question is a duplicate, I sincerely apologize.
I have a dataframe(df) like this:
import pandas as pd
data = {'A': ['Emo/3', 'Emo/4', 'Emo/1', '', 'Neu/5', 'Neu/2'],
        'Height': [5.1, 6.2, 5.1, '', 5.2, 5.2],
        }
df = pd.DataFrame(data)
df
       A Height
0  Emo/3    5.1
1  Emo/4    6.2
2  Emo/1    5.1
3
4  Neu/5    5.2
5  Neu/2    5.2
I want to add another column "B" whose value is based on column "A". If a row of column A contains a certain string, the same row in column B should be a number: e.g. if column A contains "Emo/", column B is 0, and if it contains "Neu/", column B is 1. If the row in column A is empty, column B in the same row is also empty. The output should look like this:
       A Height  B
0  Emo/3    5.1  0
1  Emo/4    6.2  0
2  Emo/1    5.1  0
3
4  Neu/5    5.2  1
5  Neu/2    5.2  1
Currently, I have the code below, but it gives me an error message: "TypeError: argument of type 'float' is not iterable"
df["B"] = ""
for index, row in df.iterrows:
    if "Emo/" in row["A"]:
        row["B"] = 0
    elif "Neu/" in row["A"]:
        row["B"] = 1
    elif row["A"] == "":
        row["B"] = ""
Any suggestions would help! Thanks!
CodePudding user response:
As zlipp commented, parentheses are missing in the method call:
for index, row in df.iterrows():
#                            ^^
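With the parentheses added the loop runs, but assigning to row["B"] may still leave df unchanged, because iterrows can yield a copy of each row rather than a view. A minimal sketch that collects the results in a list and assigns them afterwards (the list name values is my own choice, and the data is the sample from the question):

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['Emo/3', 'Emo/4', 'Emo/1', '', 'Neu/5', 'Neu/2'],
    'Height': [5.1, 6.2, 5.1, '', 5.2, 5.2],
})

values = []  # one entry per row, in row order
for index, row in df.iterrows():
    if 'Emo/' in row['A']:
        values.append(0)
    elif 'Neu/' in row['A']:
        values.append(1)
    else:
        values.append('')  # empty A -> empty B

# Assign the whole column at once instead of writing to each row copy
df['B'] = values
```

This keeps the original loop structure, but it is still slow on large frames, which is why a vectorized approach is usually preferable.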
However, note that iterrows should be avoided whenever possible.
I suggest using np.select. Build the conditions with str.contains (or str.startswith) and set default='':
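import numpy as np

# One boolean mask per prefix; the empty row matches neither
conditions = [
    df['A'].str.contains('Emo/'),
    df['A'].str.contains('Neu/'),
]
# String choices share a dtype with default=''; some NumPy versions
# refuse to mix integer choices with a string default
choices = [
    '0',
    '1',
]
df['B'] = np.select(conditions, choices, default='')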
#        A Height  B
# 0  Emo/3    5.1  0
# 1  Emo/4    6.2  0
# 2  Emo/1    5.1  0
# 3
# 4  Neu/5    5.2  1
# 5  Neu/2    5.2  1
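Because default='' is a string, the B column produced this way holds text rather than numbers. If B should stay numeric, one possible alternative (not part of the answer above, just a sketch under the assumption that a missing value is acceptable in place of the empty string) is to map the prefix and use pandas' nullable integer dtype:

```python
import pandas as pd

df = pd.DataFrame({'A': ['Emo/3', 'Emo/4', 'Emo/1', '', 'Neu/5', 'Neu/2']})

# Text before the first '/' ('' for the empty row)
prefix = df['A'].str.split('/').str[0]

# Prefixes missing from the dict become NaN; Int64 keeps 0/1 as
# integers and shows the gap as <NA>
df['B'] = prefix.map({'Emo': 0, 'Neu': 1}).astype('Int64')
```

The dict makes it easy to add further prefixes later, at the cost of B showing <NA> instead of a blank cell.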