Adding column by substring from another column in Pandas

Time:09-05

I have a DataFrame with one column:

DF = pd.DataFrame({'files': ["S18-000344PAS", "S18-001850HE1", "S18-00344HE1"]})

I want to add another column containing a substring of files. The final DataFrame should look like this:

DF = pd.DataFrame({'files': ["S18-000344PAS", "S18-001850HE1", "S18-00344HE1"], 'stain': ["PAS", "HE1", "HE1"]})

I tried:

DF["Stain"] = DF.apply(lambda row: row.files[re.search(r'[a-zA-Z]{2,}', row.files).start():], axis=1)

But it raised:

AttributeError: 'NoneType' object has no attribute 'start'

What should I do?

CodePudding user response:

If you want to extract the last 3 characters from the files column, you can do:

DF["stain"] = DF["files"].str[-3:]
print(DF)

Prints:

           files stain
0  S18-000344PAS   PAS
1  S18-001850HE1   HE1
2   S18-00344HE1   HE1

EDIT: Using regular expression to extract the stain:

DF["stain"] = DF["files"].str.extract(r"^(?:.{2,})-\d*(.+)")
print(DF)
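
As a minimal sketch of how this behaves on a row that does not match: str.extract yields NaN for that row instead of raising, which sidesteps the AttributeError from the question. The value "unlabeled" below is a made-up entry added only to show the no-match case, and expand=False is used so the single capture group comes back as a Series.

```python
import pandas as pd

# "unlabeled" is a hypothetical entry (not from the question) with no hyphen,
# so the pattern cannot match it.
DF = pd.DataFrame(
    {"files": ["S18-000344PAS", "S18-001850HE1", "S18-00344HE1", "unlabeled"]}
)

# With one capture group and expand=False, str.extract returns a Series.
# Rows where the pattern does not match become NaN rather than raising.
DF["stain"] = DF["files"].str.extract(r"^(?:.{2,})-\d*(.+)", expand=False)
print(DF)

# Non-matching rows can then be found with isna().
print(DF[DF["stain"].isna()])
```

Note that a name with digits but no trailing letters would still match this pattern, because \d* can give back its last digit to the capture group, so it is worth checking the result against real data.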

CodePudding user response:

Here's one approach using the str accessor. Note that it splits files into two parts, so the files column is overwritten with the prefix:

DF[["files", "stain"]] = DF["files"].str.extract(pat=r"(.+\d)(\D.+)")
print(DF)

Prints:

        files stain
0  S18-000344   PAS
1  S18-001850   HE1
2   S18-00344   HE1

If you need to keep the files column unchanged, assign only the second capture group:

DF["stain"] = DF["files"].str.extract(pat=r"(.+\d)(\D.+)")[1]
print(DF)

Prints:

           files stain
0  S18-000344PAS   PAS
1  S18-001850HE1   HE1
2   S18-00344HE1   HE1
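
For completeness, the AttributeError in the question comes from re.search returning None when a value contains no run of two or more letters. If you prefer to keep the re-based approach, a rough sketch is to guard against that case. The helper name stain_suffix and the entry "S18-000344" (which has no stain suffix) are made up for illustration.

```python
import re

import pandas as pd

# "S18-000344" is a hypothetical entry with no run of two or more letters.
DF = pd.DataFrame({"files": ["S18-000344PAS", "S18-001850HE1", "S18-000344"]})


def stain_suffix(name):
    """Return the text from the first run of 2+ letters onward, or None if there is none."""
    match = re.search(r"[a-zA-Z]{2,}", name)
    # re.search returns None on no match, so check before calling .start().
    return name[match.start():] if match else None


# Series.map applies the helper to each value; no axis=1 apply is needed.
DF["stain"] = DF["files"].map(stain_suffix)
print(DF)
```

The vectorized str.extract versions above are usually the simpler choice, since they handle missing matches without an explicit check.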
