I have a column that contains data like the following dummy data:
import pandas as pd

df = pd.DataFrame(["Lyreco A-Type small 2i",
                   "Lyreco C-Type small 4i",
                   "Lyreco N-Part medium",
                   "Lyreco AKG MT small 4i",
                   "Lyreco AKG/ N-Type medium 4i",
                   "Lyreco C-Type medium 2i",
                   "Lyreco C-Type/ SNU medium 2i",
                   "Lyreco K-part small 4i",
                   "Lyreco K-Part medium",
                   "Lyreco SNU small 2i",
                   "Lyreco C-Part large 2i",
                   "Lyreco N-Type large 4i"],
                  columns=["Column_1"])
I want to create an extra column that contains only the required part of the string in each row (see below). The result should look like this:
Column_1                      Column_2
Lyreco A-Type small 2i        A-Type
Lyreco C-Type small 4i        C-Type
Lyreco N-Part medium          N-Part
Lyreco AKG MT small 4i        AKG MT
Lyreco AKG/ N-Type medium 4i  AKG/ N-Type
Lyreco C-Type medium 2i       C-Type
Lyreco C-Type/ SNU medium 2i  C-Type/ SNU
Lyreco K-part small 4i        K-part
Lyreco K-Part medium          K-Part
Lyreco SNU small 2i           SNU
Lyreco C-Part large 2i        C-Part
Lyreco N-Type large 4i        N-Type
How can I extract Column_2 from the first column? Any leads would be helpful.
CodePudding user response:
You might find that the following logic works with your data:
df["Column_2"] = df["Column_1"].str.extract(r'\w+ (\S+(?: \S+)*) \b(?:small|medium|large)\b')
The above pattern skips the first word, then captures everything from the second term up to (but not including) one of the keywords small, medium, or large.
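To check the regex approach against the sample data from the question, here is a minimal, self-contained sketch. It assumes the column is named Column_1 and that every row contains one of the three size keywords; rows without a keyword would come back as NaN. Passing expand=False is a choice made here so that str.extract returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Sample data copied from the question; the column name is an assumption.
df = pd.DataFrame({"Column_1": [
    "Lyreco A-Type small 2i",
    "Lyreco C-Type small 4i",
    "Lyreco N-Part medium",
    "Lyreco AKG MT small 4i",
    "Lyreco AKG/ N-Type medium 4i",
    "Lyreco C-Type medium 2i",
    "Lyreco C-Type/ SNU medium 2i",
    "Lyreco K-part small 4i",
    "Lyreco K-Part medium",
    "Lyreco SNU small 2i",
    "Lyreco C-Part large 2i",
    "Lyreco N-Type large 4i",
]})

# First word, a space, then one or more space-separated terms (captured),
# followed by a space and one of the size keywords.
pattern = r"\w+ (\S+(?: \S+)*) \b(?:small|medium|large)\b"

# expand=False returns a Series because the pattern has exactly one group.
df["Column_2"] = df["Column_1"].str.extract(pattern, expand=False)
print(df)
```

With this data, the multi-word rows ("AKG MT", "AKG/ N-Type", "C-Type/ SNU") are captured in full, which a plain split on the second word would not do.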
CodePudding user response:
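Looking at the example you posted, it may be enough to split the column values on whitespace and return the second item. You can make a simple function and apply it to the dataframe as shown below. Note that this returns only a single word, so a multi-word value such as "AKG MT" comes back as "AKG":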
import pandas as pd

df = pd.DataFrame(
    {'Columns_1':
        ["Lyreco A-Type small 2i",
         "Lyreco C-Type small 4i",
         "Lyreco N-Part medium",
         "Lyreco AKG MT small 4i",
         "Lyreco N-Type medium 4i",
         "Lyreco C-Type medium 2i",
         "Lyreco SNU medium 2i",
         "Lyreco K-part small 4i",
         "Lyreco K-Part medium",
         "Lyreco SNU small 2i",
         "Lyreco C-Part large 2i",
         "Lyreco N-Type large 4i"]
    }
)

def f(row):
    # Split on whitespace and return the second token of the row's value.
    return row['Columns_1'].split()[1]

df['Columns_2'] = df.apply(f, axis=1)
print(df)
                  Columns_1  Columns_2
0    Lyreco A-Type small 2i     A-Type
1    Lyreco C-Type small 4i     C-Type
2      Lyreco N-Part medium     N-Part
3    Lyreco AKG MT small 4i        AKG
4   Lyreco N-Type medium 4i     N-Type
5   Lyreco C-Type medium 2i     C-Type
6      Lyreco SNU medium 2i        SNU
7    Lyreco K-part small 4i     K-part
8      Lyreco K-Part medium     K-Part
9       Lyreco SNU small 2i        SNU
10   Lyreco C-Part large 2i     C-Part
11   Lyreco N-Type large 4i     N-Type
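As a possible alternative to the row-wise apply, the same second-token extraction can be sketched with pandas' vectorized string accessor. This is only an illustration on a shortened sample (the column name Columns_1 follows this answer), and it shares the same limitation: only one word is returned.

```python
import pandas as pd

# Shortened sample, for illustration only.
df = pd.DataFrame({"Columns_1": [
    "Lyreco A-Type small 2i",
    "Lyreco N-Part medium",
    "Lyreco AKG MT small 4i",
]})

# .str.split() splits each value on whitespace into a list;
# .str[1] then picks the second element of each list.
df["Columns_2"] = df["Columns_1"].str.split().str[1]
print(df)
```

This avoids calling a Python function once per row, but "AKG MT" is still reduced to "AKG".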
CodePudding user response:
# Name the single column, then take the second space-separated token of each value.
df.columns = ['column_1']
df["column_2"] = [col.split(" ")[1] for col in df.column_1]
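If the multi-word values from the question (for example "AKG/ N-Type") need to survive, a split-based variant could collect every token after the first word until a size keyword is reached. The sketch below is an assumption-laden illustration, not part of the answer above: the helper name extract_part and the SIZES set are hypothetical, matching the keywords case-insensitively is a choice made here, and a value with no size keyword simply returns everything after the first word.

```python
import pandas as pd

# Hypothetical set of size keywords, taken from the sample data in the question.
SIZES = {"small", "medium", "large"}

def extract_part(text):
    """Return the tokens between the first word and the first size keyword."""
    part = []
    for token in text.split()[1:]:   # skip the leading word ("Lyreco")
        if token.lower() in SIZES:   # stop at small / medium / large
            break
        part.append(token)
    return " ".join(part)

df = pd.DataFrame({"column_1": [
    "Lyreco A-Type small 2i",
    "Lyreco AKG MT small 4i",
    "Lyreco AKG/ N-Type medium 4i",
    "Lyreco K-Part medium",
]})
df["column_2"] = [extract_part(value) for value in df.column_1]
print(df)
```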