In the df_input['Visit' ] column, there are three different timepoints that I would like to extract and have print into a new dataframe (df_output). The time points are Pre, Post, and Screening.
I essentially would like to make a for loop (or just a single code strand) stating:
if data_input['Visit '] contains the word "Pre", print "Pre" in df_output['VISIT'] elif data_input['Visit '] contains the word "Post", print "Post" in df_output['VISIT'] else data_input['Visit '] contains the word "Screening", print "Screening" in df_output['VISIT']
I am just not sure of the proper way to do that.
So far, the only thing I have is this line of code:
df_output['VISIT'] = df_input[df_input['Visit '].str.contains('Pr|Po|Sc'))
that gives the error message "Columns must be the same length as key"
I've also tried: df_output['VISIT'] = df_input['Visit '].str.contains('Pr|Po|Sc'), which prints True or False into my output dataframe.
CodePudding user response:
you can use np.select
import numpy
df_output['VISIT'] = np.select( [df_input['Visit'].str.contain('Pr'), # condition #1
df_input['Visit'].str.contain('Po'), # condition #2
df_input['Visit'].str.contain('Sc')], # condition #3
['Pre','Post','Screening'], # corresponding value when true
'')# default value
CodePudding user response:
df_output['VISIT'] = df_input['Visit'].apply(lambda x: 'Pre' if x.str.contains('Pre') else 'Post' if x.str.contains('Post') else 'Screening' if x.str.contains('Screening') else "")
You must create df_ouput first, tho.