How to get a specific part of a string from a pandas column value?

What I want to do is delete certain parts of a string and insert the rest into a new column.

Example:

df = pd.read_excel("sdAll.xlsx")
print(df)

Output:

0      asin="ASF23KJSA"
1      asin="SAFSAF3324S"
2      asin="ASFAS213434"
3      asin="1SF23AF2342S"
4      asin="ASF23KJSA"
             ...
424    asin="ASF23KJSA"
425    asin="1SF23AF2342S"
426    asin="ASF23KJSA"
427    asin="BSAFSAF3324S"
428    asin="B095437HDM"

I want to delete the asin="" part and insert the remaining part into another column.

df.head()

 Timeframe Ad Type Start Date   End Date                           Portfolio name Currency  ...    Spend 14 Day Total Sales Total Advertising Cost of Sales (ACOS)  Total Return on Advertising Spend (ROAS)  14 Day Total Orders (#)  14 Day Total Units (#)
0      L30D      SD 2022-11-08 2022-11-08                                        -      USD  ...  0.00000                  0                                    NaN                                       NaN                        0                       0
1      L30D      SD 2022-11-11 2022-12-03                                        -      USD  ...  0.00530                  0                                    NaN                                       0.0                        0                       0
2      L30D      SD 2022-11-09 2022-11-22                                        -      USD  ...  0.00000                  0                                    NaN                                       NaN                        0                       0
3      L30D      SD 2022-11-25 2022-12-04                                        -      USD  ...  0.09434                  0                                    NaN                                       0.0                        0                       0
4      L30D      SD 2022-11-09 2022-11-23                                        -      USD  ...  0.00000                  0                                    NaN                                       NaN                        0                       0

CodePudding user response：

You can use str.replace with a regex that contains a capturing group.

import pandas as pd
df = pd.DataFrame({'old_column' : ['asin="ASF23KJSA"' , 'asin="SAFSAF3324S"', 'asin="ASFAS213434"' , 'asin="1SF23AF2342S"' , 'asin="ASF23KJSA"']})
df['new_column'] = df['old_column'].str.replace(r'asin=\"(.*)\"', r'\1', regex=True)
print(df)

Output:

            old_column    new_column
0     asin="ASF23KJSA"     ASF23KJSA
1   asin="SAFSAF3324S"   SAFSAF3324S
2   asin="ASFAS213434"   ASFAS213434
3  asin="1SF23AF2342S"  1SF23AF2342S
4     asin="ASF23KJSA"     ASF23KJSA

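Explanation:

asin=\" : matches the literal text asin=" (the backslash before the quote is optional)

( : opens the capturing group

.* : means "0 or more of any character"

) : closes the capturing group

\1 : in the replacement, refers to whatever the group captured, so the whole match is replaced by the value between the quotes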
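To make the capturing group concrete, here is a minimal sketch using Python's standard re module on two values from the question. It only illustrates what the pattern from the answer captures; the variable names are made up for the example.

```python
import re

# Same pattern as in the answer: literal asin=", a capturing group, literal ".
pattern = re.compile(r'asin="(.*)"')

# Two sample values copied from the question.
samples = ['asin="ASF23KJSA"', 'asin="B095437HDM"']

# group(0) is the whole match, group(1) is only what the parentheses captured.
match = pattern.fullmatch(samples[0])
whole, captured = match.group(0), match.group(1)

# Substituting \1 keeps just the captured text; str.replace does the same per row.
cleaned = [pattern.sub(r'\1', s) for s in samples]
print(whole, captured, cleaned)
```

Because .* is greedy, it runs up to the last double quote in the string, which is fine here since each value contains exactly one quoted part.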

CodePudding user response：

Why don't you try this:

df.insert_your_col_name.str.split('=').str[-1].str.replace('"', '').str.strip()

This returns the string Series you want. I usually also add a strip at the end for good measure.

You can also try str.extract with the following capture group:

df.your_col.str.extract(r'\"(.*)\"')
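One detail worth knowing about str.extract: with a single capture group it returns a DataFrame by default, and passing expand=False returns a Series instead. A small sketch, assuming a column named your_col and a new column named new_column (both placeholder names):

```python
import pandas as pd

# Two sample values copied from the question; the column name is a placeholder.
df = pd.DataFrame({"your_col": ['asin="ASF23KJSA"', 'asin="SAFSAF3324S"']})

# Default (expand=True): a DataFrame with one column, labelled 0.
as_frame = df["your_col"].str.extract(r'"(.*)"')

# expand=False: a plain Series, convenient for assigning to a new column.
as_series = df["your_col"].str.extract(r'"(.*)"', expand=False)

df["new_column"] = as_series
print(df)
```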

CodePudding user response：

You can replace the asin= part with an empty string, strip leading and trailing whitespace and the surrounding double quotes, and write the result to a new column.

df["new_column_name"] = df["asin_column_name"].str.replace("asin=", "", regex=False).str.strip().str.strip('"')
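Removing only the asin= prefix keeps the double quotes around the value, so they have to be stripped as well. A minimal sketch of the regex-free route; the second row with extra whitespace is a hypothetical value added to show why the whitespace strip helps:

```python
import pandas as pd

# First value is from the question; the padded second value is hypothetical.
df = pd.DataFrame({"asin_column_name": ['asin="ASF23KJSA"', ' asin="B095437HDM" ']})

# Strip whitespace, then remove the literal prefix: the quotes are still there.
prefix_only = df["asin_column_name"].str.strip().str.replace("asin=", "", regex=False)

# Stripping the double quotes from both ends leaves the bare value.
df["new_column_name"] = prefix_only.str.strip('"')
print(df)
```

Passing regex=False makes it explicit that asin= is treated as literal text rather than a pattern.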

CodePudding user response：

You can use pandas.Series.str.extract:

df["new_col"] = df["original_col"].str.extract('"([A-Z0-9]+)"', expand=False) #or pat = '"(.+)"'

# Output :

print(df)
            original_col       new_col
0       asin="ASF23KJSA"     ASF23KJSA
1     asin="SAFSAF3324S"   SAFSAF3324S
2     asin="ASFAS213434"   ASFAS213434
3    asin="1SF23AF2342S"  1SF23AF2342S
4       asin="ASF23KJSA"     ASF23KJSA
424     asin="ASF23KJSA"     ASF23KJSA
425  asin="1SF23AF2342S"  1SF23AF2342S
426     asin="ASF23KJSA"     ASF23KJSA
427  asin="BSAFSAF3324S"  BSAFSAF3324S
428    asin="B095437HDM"    B095437HDM
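If some rows might not follow the asin="..." format, the two approaches behave differently: str.extract yields NaN where the pattern does not match, while str.replace leaves the original text unchanged. A short sketch, assuming the pattern '"([A-Z0-9]+)"'; the second and third rows are hypothetical and not taken from the question's data:

```python
import pandas as pd

# Row 0 is from the question; rows 1 and 2 are hypothetical edge cases.
df = pd.DataFrame({"original_col": ['asin="ASF23KJSA"', 'B095437HDM', None]})

# extract: no match (or a missing value) becomes NaN.
extracted = df["original_col"].str.extract(r'"([A-Z0-9]+)"', expand=False)

# replace: a non-matching string passes through as-is; missing stays missing.
replaced = df["original_col"].str.replace(r'asin="(.*)"', r'\1', regex=True)

# Rows where extract found nothing, useful for a quick sanity check.
unmatched = df[extracted.isna()]
print(extracted, replaced, unmatched, sep="\n\n")
```

Checking the unmatched rows before overwriting a column is a cheap way to spot values that need a different pattern.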