How to move a character to another column in the same row in a pandas dataframe-CodePudding

I got stuck trying to clean a dataframe similar to this one:

code	course name	EOS	Mid	test
AA101	Course 1	350	420	NaN
AA102	Course 2	400	470	NaN
AB101	Course 3	#560	570	NaN
AB102	Course 4	410	465	NaN
AC101	Course 5	#	522	NaN

I need to keep only numerical values in the column EOS and move # characters that appear in it to the column test, to indicate that an additional test is required for that course. This is because some of the values have a # before the actual number, such as Course 3, and some have only the # as the value, such as Course 5.

The dataframe was created using Camelot to extract those values from a PDF table.

What I need is to take this # out of this column and add it to the test column instead.

Is there an easy way to do that?

CodePudding user response：

There is no builtin function to do just this, but it can be done using two lines:

df.loc[df['EOS'].str.contains('#', na=False) 'test'] = '#'
df['EOS'] = df['EOS'].str.replace('#', '')

Output:

>>> df
    code  course name  EOS  Mid test
0  AA101     Course 1  350  420  NaN
1  AA102     Course 2  400  470  NaN
2  AB101     Course 3  560  570    #
3  AB102     Course 4  410  465  NaN
4  AC101     Course 5       522

CodePudding user response：

You can do this

data = pd.DataFrame(
    [  ["AA101", "Course 1", "350",  420,   None],
       ["AA102", "Course 2", "400",  470,   None],
       ["AB101", "Course 3", "#560", 570,   None],
       ["AB102", "Course 4", "410",  465,   None],
       ["AC101", "Course 5", "#",    522,   None]  ],
    columns = ["code", "course name", "EOS", "Mid", "test"]
)

transformed_data = (
    data
    .assign(test=lambda d: d["EOS"].apply(lambda d_: "#" if "#" in d_ else None))
    .assign(EOS=lambda d: d["EOS"].replace({"#": ""}, regex=True))
)