What I want to do is delete certain parts of a string and insert the rest into a new column.
Example:
df = pd.read_excel("sdAll.xlsx")
print(df)
output =
0 asin="ASF23KJSA"
1 asin="SAFSAF3324S"
2 asin="ASFAS213434"
3 asin="1SF23AF2342S"
4 asin="ASF23KJSA"
...
424 asin="ASF23KJSA"
425 asin="1SF23AF2342S"
426 asin="ASF23KJSA"
427 asin="BSAFSAF3324S"
428 asin="B095437HDM"
I want to delete the asin="" part and insert the remaining part into another column.
df.head()
Timeframe Ad Type Start Date End Date Portfolio name Currency ... Spend 14 Day Total Sales Total Advertising Cost of Sales (ACOS) Total Return on Advertising Spend (ROAS) 14 Day Total Orders (#) 14 Day Total Units (#)
0 L30D SD 2022-11-08 2022-11-08 - USD ... 0.00000 0 NaN NaN 0 0
1 L30D SD 2022-11-11 2022-12-03 - USD ... 0.00530 0 NaN 0.0 0 0
2 L30D SD 2022-11-09 2022-11-22 - USD ... 0.00000 0 NaN NaN 0 0
3 L30D SD 2022-11-25 2022-12-04 - USD ... 0.09434 0 NaN 0.0 0 0
4 L30D SD 2022-11-09 2022-11-23 - USD ... 0.00000 0 NaN NaN 0 0
CodePudding user response:
You can use str.replace with a regex that contains a capturing group.
import pandas as pd
df = pd.DataFrame({'old_column' : ['asin="ASF23KJSA"' , 'asin="SAFSAF3324S"', 'asin="ASFAS213434"' , 'asin="1SF23AF2342S"' , 'asin="ASF23KJSA"']})
df['new_column'] = df['old_column'].str.replace(r'asin=\"(.*)\"', r'\1', regex=True)
print(df)
Output:
old_column new_column
0 asin="ASF23KJSA" ASF23KJSA
1 asin="SAFSAF3324S" SAFSAF3324S
2 asin="ASFAS213434" ASFAS213434
3 asin="1SF23AF2342S" 1SF23AF2342S
4 asin="ASF23KJSA" ASF23KJSA
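Explanation:
asin=\" : matches the literal prefix asin="
( : opens the capturing group
.* : means "0 or more of any character"
) : closes the capturing group
\1 : in the replacement, refers to the text captured by the group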
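To see the capturing group in isolation, here is a minimal sketch that uses Python's built-in re module rather than pandas. The sample string comes from the question; the variable names and the second sample string are only for illustration.

```python
import re

# Sample value in the same shape as the question's data.
raw = 'asin="ASF23KJSA"'

# (.*) captures everything between the first and the last double quote.
pattern = r'asin="(.*)"'

# \1 refers back to the first capturing group, so the whole
# asin="..." wrapper is replaced by just the captured text.
cleaned = re.sub(pattern, r'\1', raw)
print(cleaned)

# A negated class ([^"]*) stops at the first closing quote, which only
# matters if a value could contain several quoted parts.
first_only = re.search(r'asin="([^"]*)"', 'asin="A1" other="B2"').group(1)
print(first_only)
```

For the data shown in the question, the greedy .* and the negated class give the same result, since each value holds exactly one quoted part.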
CodePudding user response:
Why don't you try this:
df.insert_your_col_name.str.split('=').str[-1].str.replace('"', '').str.strip()
This returns the string Series you want. I usually also call strip() at the end for good measure.
You can also try str.extract with the following capture group:
df.your_col.str.extract(r'\"(.*)\"')
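One detail worth knowing about the str.extract variant: with its default expand=True it returns a DataFrame (one column per capturing group), not a Series. The sketch below compares the split-based chain with str.extract(..., expand=False). The column name your_col is a placeholder and the values are copied from the question.

```python
import pandas as pd

# Placeholder column name; values copied from the question.
df = pd.DataFrame({"your_col": ['asin="ASF23KJSA"', 'asin="SAFSAF3324S"']})

# Split on '=', keep the last piece, then drop the quotes and stray whitespace.
via_split = (
    df["your_col"]
    .str.split("=")
    .str[-1]
    .str.replace('"', "", regex=False)
    .str.strip()
)

# Default expand=True yields a DataFrame with one column per group.
as_frame = df["your_col"].str.extract(r'"(.*)"')

# expand=False with a single group yields a Series, ready to assign to a column.
via_extract = df["your_col"].str.extract(r'"(.*)"', expand=False)

df["new_col"] = via_extract
print(type(as_frame).__name__, type(via_extract).__name__)
print(df)
```

Both approaches produce the same values here; the split-based one would break if a value itself contained an equals sign, since only the last piece is kept.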
CodePudding user response:
You replace the asin= part with an empty string, strip leading/trailing whitespace and the surrounding double quotes, and write the result to a new column.
df["new_column_name"] = df["asin_column_name"].str.replace("asin=", "", regex=False).str.strip().str.strip('"')
CodePudding user response:
You can use pandas.Series.str.extract:
df["new_col"] = df["original_col"].str.extract('"([A-Z0-9]+)"', expand=False) # or pat = '"(.+)"'
# Output :
print(df)
original_col new_col
0 asin="ASF23KJSA" ASF23KJSA
1 asin="SAFSAF3324S" SAFSAF3324S
2 asin="ASFAS213434" ASFAS213434
3 asin="1SF23AF2342S" 1SF23AF2342S
4 asin="ASF23KJSA" ASF23KJSA
424 asin="ASF23KJSA" ASF23KJSA
425 asin="1SF23AF2342S" 1SF23AF2342S
426 asin="ASF23KJSA" ASF23KJSA
427 asin="BSAFSAF3324S" BSAFSAF3324S
428 asin="B095437HDM" B095437HDM
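If some rows might not follow the asin="..." pattern, the two regex approaches behave differently: str.extract gives NaN for rows that do not match, while str.replace leaves them unchanged. A small sketch with one made-up malformed row (the value "missing" is an assumption for illustration and does not come from the question's data):

```python
import pandas as pd

# Second row is a made-up value that does not match the pattern.
df = pd.DataFrame({"original_col": ['asin="B095437HDM"', "missing"]})

# Non-matching rows become NaN.
extracted = df["original_col"].str.extract(r'"([A-Z0-9]+)"', expand=False)

# Non-matching rows are passed through untouched.
replaced = df["original_col"].str.replace(r'asin="(.*)"', r"\1", regex=True)

print(extracted.tolist())
print(replaced.tolist())
```

Which behaviour is preferable depends on whether unmatched rows should be flagged as missing or kept as they are.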