Home > Software design >  Creating new column based on existing column in pandas
Creating new column based on existing column in pandas

Time:04-29

I have a data-frame given below like this:

ID1    ID2  Yr-Month  Class                   
1      p1   Feb-19   PE5->SC
1      p2   Feb-19   SC
1      p3   Feb-19   EA->PE5
1      p4   Feb-19   EA->PE5->SC
1      p5   Feb-19   PC->PE5->SC

I need to create a new column called Final in existing data where if transition is happening from PE5 to SC i.e. PE5->SC in following ways PE5->SC,EA->PE5->SC,PC->PE5->SC Final column value needs to be one else 0 given below is the expected output:

ID1    ID2  Yr-Month  Class         Final   
1      p1   Feb-19   PE5->SC         1
1      p2   Feb-19   SC              0
1      p3   Feb-19   EA->PE5         0
1      p4   Feb-19   EA->PE5->SC     1
1      p5   Feb-19   PC->PE5->SC     1

please help

CodePudding user response:

Test substrings by Series.str.contains and for convert True/False to 1/0 convert values to integers:

df['Final'] = df['Class'].str.contains('PE5.*SC').astype(int)

Alternative with numpy.where:

df['Final'] = np.where(df['Class'].str.contains('PE5.*SC'), 1, 0)
print (df)
   ID1 ID2 Yr-Month        Class  Final
0    1  p1   Feb-19      PE5->SC      1
1    1  p2   Feb-19           SC      0
2    1  p3   Feb-19      EA->PE5      0
3    1  p4   Feb-19  EA->PE5->SC      1
4    1  p5   Feb-19  PC->PE5->SC      1

CodePudding user response:

You can create a 'Final' column of 0s, then use .loc to find the rows where Class contains 'PE5->' using df['Class'].str.contains('PE5->'), and then set the corresponding values in 'Final' to 1

df['Final'] = 0
df.loc[df['Class'].str.contains('PE5->'), 'Final'] = 1

Output:

ID1    ID2  Yr-Month  Class         Final   
1      p1   Feb-19   PE5->SC         1
1      p2   Feb-19   SC              0
1      p3   Feb-19   EA->PE5         0
1      p4   Feb-19   EA->PE5->SC     1
1      p5   Feb-19   PC->PE5->SC     1

EDIT: Having seen OP's comment on Jezrael's answer, it appears there was a missing case from the original question. The code to cover the updated case would be:

df['Final'] = 0
df.loc[df['Class'].str.contains('PE5->.*SC', regex=True), 'Final'] = 1

Output:

ID1    ID2  Yr-Month  Class         Final   
1      p1   Feb-19   PE5->SC         1
1      p2   Feb-19   SC              0
1      p3   Feb-19   EA->PE5         0
1      p4   Feb-19   EA->PE5->SC     1
1      p5   Feb-19   PE5->PC->SC     1

CodePudding user response:

Another interesting way would be to use replace

df['Final'] = df['Class'].replace(['PE5.*SC', '.'], [1, 0], regex=True)

Output:

ID1 ID2 Yr-Month Class Final
1 p1 Feb-19 PE5->SC 1
1 p2 Feb-19 SC 0
1 p3 Feb-19 EA->PE5 0
1 p4 Feb-19 EA->PE5->SC 1
1 p5 Feb-19 PC->PE5->SC 1
  • Related