I am trying to merge two columns in a pandas dataframe with 1000 rows. The difficulty I am having is merging only a specific part of the text. For example:
This is what I have, two columns:
Col1 Col2
11.50.199.1 12121.0
12.55.222.1 12121.0
What I want is to merge the two columns so that the example above becomes this:
ColNew
11.50.199.12121
12.55.222.12121
I want to get rid of the last 1 in Col1 and of the .0 in Col2.
I could not manage to do it. I appreciate any help. :)
CodePudding user response:
Split the values on . and select the first element of each resulting list with str[0] indexing:
# Drop everything after the last dot in Col1 and after the dot in Col2, then join with "."
df['ColNew'] = (df['Col1'].str.rsplit('.', n=1).str[0] + '.' +
                df['Col2'].astype(str).str.split('.').str[0])
print(df)
Col1 Col2 ColNew
0 11.50.199.1 12121.0 11.50.199.12121
1 12.55.222.1 12121.0 12.55.222.12121
Or use Series.str.replace to remove the last . and the digits after it in both columns:
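# regex=True is required in pandas 2.0+, where str.replace treats the pattern as a literal by default
df['ColNew'] = (df['Col1'].str.replace(r'\.\d+$', '', regex=True) + '.' +
                df['Col2'].astype(str).str.replace(r'\.\d+$', '', regex=True))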
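For anyone who wants to try both variants side by side, here is a minimal self-contained sketch. It assumes the sample data from the question, with Col1 stored as strings and Col2 as floats; the names left, right, by_split and by_regex are only illustrative.

```python
import pandas as pd

# Sample data from the question (assumed dtypes: Col1 is str, Col2 is float)
df = pd.DataFrame({
    "Col1": ["11.50.199.1", "12.55.222.1"],
    "Col2": [12121.0, 12121.0],
})

# Variant 1: split on the dot and keep the first part
left = df["Col1"].str.rsplit(".", n=1).str[0]          # "11.50.199"
right = df["Col2"].astype(str).str.split(".").str[0]   # "12121"
by_split = left + "." + right

# Variant 2: strip a trailing ".<digits>" with a regular expression
by_regex = (
    df["Col1"].str.replace(r"\.\d+$", "", regex=True)
    + "."
    + df["Col2"].astype(str).str.replace(r"\.\d+$", "", regex=True)
)

# Both variants should agree on this data
assert by_split.equals(by_regex)

df["ColNew"] = by_split
print(df["ColNew"].tolist())
```

Both variants remove whatever follows the last dot, so they should also work when the last part of Col1 is something other than 1.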
CodePudding user response:
Using apply:
df["ColNew"] = df.apply(lambda s:
                        s.Col1.removesuffix("1")
                        + s.Col2.removesuffix(".0"),
                        axis=1)
We use apply to iterate over the DataFrame. Here we want to iterate over rows because we have to retrieve a value from two different columns, so we specify axis=1.
Since apply passes each row to the function as a Series, we access the corresponding column's value using s.col_name (or s["col_name"]).
Each of these values is a single scalar rather than a Series, so the .str accessor is not available here. Instead we call Python's own str.removesuffix (Python 3.9+) directly, which returns the string minus the provided suffix.
Note that removesuffix only exists on strings, so this fails if a value is not already a string (for example, if Col2 holds floats such as 12121.0). In that case we need to convert the value explicitly with str(...) first:
df["ColNew"] = df.apply(lambda s:
                        str(s.Col1).removesuffix("1")
                        + str(s.Col2).removesuffix(".0"),
                        axis=1)
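As a runnable sketch of the row-wise approach, assuming the sample data from the question and Python 3.9 or newer (for str.removesuffix); merge_row is a hypothetical helper name, not something from pandas:

```python
import pandas as pd

# Sample data from the question (assumed dtypes: Col1 is str, Col2 is float)
df = pd.DataFrame({
    "Col1": ["11.50.199.1", "12.55.222.1"],
    "Col2": [12121.0, 12121.0],
})

def merge_row(row: pd.Series) -> str:
    """Hypothetical helper: build the merged value for a single row."""
    # str(...) guards against non-string values such as the float 12121.0
    prefix = str(row["Col1"]).removesuffix("1")   # "11.50.199."
    suffix = str(row["Col2"]).removesuffix(".0")  # "12121"
    return prefix + suffix

# axis=1 calls merge_row once per row
df["ColNew"] = df.apply(merge_row, axis=1)
print(df["ColNew"].tolist())
```

One caveat: removesuffix("1") strips only a literal trailing 1, so a value such as 11.50.199.21 would become 11.50.199.2 rather than 11.50.199. If the last part of Col1 can vary, the split-based approach is probably the safer choice, and it is also likely to be faster than a row-wise apply on larger frames.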