I have a data frame that looks like the following one:
I need to add columns that will include the following:
- Number of job{i}_start_year columns which are not empty at the end of each row
- Number of edu{i}_start_year columns which are not empty at the end of each row
Any help would be greatly appreciated.
CodePudding user response:
Use DataFrame.assign with the columns selected by name via DataFrame.filter. If necessary, replace empty strings with missing values so that they can be forward-filled along each row with ffill, then select the last column by position with DataFrame.iloc:
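import numpy as np
# Per group: keep matching columns, treat '' as missing, forward-fill along the row, take the last column
df = df.assign(job=df.filter(regex=r'job\d').replace('', np.nan).ffill(axis=1).iloc[:, -1],
               edu=df.filter(regex=r'edu\d').replace('', np.nan).ffill(axis=1).iloc[:, -1])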
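A minimal, self-contained sketch of this approach follows. The sample frame is made up for illustration: only the job{i}_start_year / edu{i}_start_year naming comes from the question, and the helper name last_non_empty is hypothetical. Note that this produces the last non-empty value in each row for each group, not a count of non-empty columns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; empty strings mark missing years.
df = pd.DataFrame({
    "job1_start_year": ["2010", "2012", ""],
    "job2_start_year": ["2015", "", ""],
    "edu1_start_year": ["2005", "2008", "2001"],
    "edu2_start_year": ["", "2010", ""],
})

def last_non_empty(frame, pattern):
    """Return, per row, the last non-empty value among columns matching `pattern`."""
    matching = frame.filter(regex=pattern)        # keep only the matching columns
    as_missing = matching.replace("", np.nan)     # treat '' as a missing value
    return as_missing.ffill(axis=1).iloc[:, -1]   # carry values rightwards, take last column

out = df.assign(
    job=last_non_empty(df, r"job\d"),
    edu=last_non_empty(df, r"edu\d"),
)
print(out[["job", "edu"]])
```

A row whose matching columns are all empty ends up with NaN, since there is nothing to forward-fill.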
CodePudding user response:
As I understand it, you want extra columns at the end of each row holding the number of non-empty years in the job and edu categories. Try:
job_columns = ['job1_start_year', 'job2_start_year', 'job3_start_year']
edu_columns = ['edu1_start_year', 'edu2_start_year', 'edu3_start_year']
df['job_count'] = df[job_columns].ne('').sum(axis=1)
df['edu_count'] = df[edu_columns].ne('').sum(axis=1)
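One caveat: ne('') treats NaN as non-empty, so if the data mixes empty strings and real missing values the counts may come out too high. A hedged variant that handles both cases and picks the columns by pattern instead of listing them by hand might look like this (the sample data is invented, and the regex assumes the job{i}_start_year / edu{i}_start_year naming from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mixing empty strings and NaN as "empty".
df = pd.DataFrame({
    "job1_start_year": ["2010", "2012", ""],
    "job2_start_year": ["2015", "", np.nan],
    "edu1_start_year": ["2005", "2008", "2001"],
    "edu2_start_year": ["", "2010", np.nan],
})

for prefix in ("job", "edu"):
    # Select e.g. job1_start_year, job2_start_year, ... by name pattern.
    cols = df.filter(regex=rf"^{prefix}\d+_start_year$")
    # A cell counts only if it is neither '' nor NaN.
    df[f"{prefix}_count"] = (cols.ne("") & cols.notna()).sum(axis=1)

print(df[["job_count", "edu_count"]])
```

For the three sample rows this gives job counts of 2, 1 and 0 and edu counts of 1, 2 and 1.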