How to create new column using loop with condition

Time:09-23

This is my DataFrame, and I want to create a new column using a loop with a condition.

import pandas as pd
student_card = pd.DataFrame({'ID':[20190103, 20190222, 20190531],
                             'name':['Kim', 'Yang', 'Park'],
                             'class':['H', 'W', 'S']})


student_card['new'] = pd.Series() #1.create new column
for i, v in student_card['name'].items(): #2.set index and values
    if "Yang" in v: #3.if there's "Yang" in value
        student_card['new'].append(v) #4. append the value of name column in new column

So I tried this method and got stuck with the following error:

TypeError: cannot concatenate object of type '<class 'str'>'; only Series and DataFrame objs are valid

That does not seem right to me, since the type of this column is Series.

CodePudding user response:

append concatenates Series objects, which is not what happens in your code: v is a string, and i is the index label of that string. You can run print(type(v)) and see for yourself. The documentation is here: https://pandas.pydata.org/docs/reference/api/pandas.Series.append.html
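
The check suggested above can be sketched as follows. The Series `names` and the flag `raised` are illustrative names, not part of the question, and pd.concat stands in for append here on the assumption that it rejects a bare string with the same kind of TypeError:

```python
import pandas as pd

names = pd.Series(['Kim', 'Yang', 'Park'])

# items() yields (index label, value) pairs; each value here is a plain str
value_types = [type(v) for _, v in names.items()]
print(value_types)

# Concatenating a Series with a bare string is rejected with a TypeError
try:
    pd.concat([names, 'Yang'])
    raised = False
except TypeError as exc:
    raised = True
    print(exc)
```

So the error message refers to the argument passed in (a str), not to the column itself.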

What you are looking for is to set a value at a pre-existing index label of a column (a Series, as it is called in pandas). Something like this:

df.loc[index, 'column'] = value

So in your code, this should do the trick:

import pandas as pd
student_card = pd.DataFrame({'ID':[20190103, 20190222, 20190531],
                             'name':['Kim', 'Yang', 'Park'],
                             'class':['H', 'W', 'S']})


student_card['new'] = pd.Series(dtype='object') #1. create new, empty column (all NaN)
for i, v in student_card['name'].items(): #2. iterate over index labels and values
    if "Yang" in v: #3. if there's "Yang" in the value
        student_card.loc[i, 'new'] = v #4. write the name into the new column at row i

CodePudding user response:

append concatenates two Series. What you want is to access a row. Use an indexer such as iloc or iat to do so:

import pandas as pd
student_card = pd.DataFrame({'ID':[20190103, 20190222, 20190531],
                             'name':['Kim', 'Yang', 'Park'],
                             'class':['H', 'W', 'S']})


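student_card['new'] = pd.Series(dtype='object') #1. create new, empty column (all NaN)
col = student_card.columns.get_loc('new') # integer position of the new column
for i, v in student_card['name'].items(): #2. iterate over index labels and values
    if "Yang" in v: #3. if there's "Yang" in the value
        student_card.iat[i, col] = v #4. i doubles as a row position only with the default RangeIndex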

Output:

         ID  name class   new
0  20190103   Kim     H   NaN
1  20190222  Yang     W  Yang
2  20190531  Park     S   NaN

CodePudding user response:

You should really not use a loop to manipulate a pandas DataFrame; this is an anti-pattern.

Also, append is now deprecated (Series.append was deprecated in pandas 1.4 and removed in pandas 2.0).
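
For cases where joining Series really is the goal, pd.concat covers what append used to do. A minimal sketch, using two illustrative Series (`first` and `second`) that are not part of the question:

```python
import pandas as pd

first = pd.Series(['Kim', 'Yang'])
second = pd.Series(['Park'])

# Roughly what first.append(second, ignore_index=True) used to do
combined = pd.concat([first, second], ignore_index=True)
print(combined.tolist())  # ['Kim', 'Yang', 'Park']
```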

Use a vectorized approach with boolean indexing:

# select the rows for which name==Yang and add the same name in the new column
student_card.loc[student_card['name'].eq('Yang'), 'new'] = student_card['name']

Or, using where:

# mask all non matching values (name!=Yang) and copy the column
student_card['new'] = student_card['name'].where(student_card['name'].eq('Yang'))

Output:

         ID  name class   new
0  20190103   Kim     H   NaN
1  20190222  Yang     W  Yang
2  20190531  Park     S   NaN
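
Note that the loop in the question tests "Yang" in v, which is a substring match, while eq('Yang') matches the whole string. If substring matching is what is wanted, one possible sketch uses str.contains to build the mask. The name `mask` is illustrative, and regex=False is chosen here so that the pattern is treated as a literal string:

```python
import pandas as pd

student_card = pd.DataFrame({'ID': [20190103, 20190222, 20190531],
                             'name': ['Kim', 'Yang', 'Park'],
                             'class': ['H', 'W', 'S']})

# True where the name contains "Yang" anywhere in the string
mask = student_card['name'].str.contains('Yang', regex=False)

# Keep the name where the mask is True, NaN elsewhere
student_card['new'] = student_card['name'].where(mask)
print(student_card)
```

For this data the result is the same as with eq('Yang'); the two differ only when a name merely contains "Yang" as part of a longer string.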