How to merge only specific text parts of two columns into one column?

Time:04-13

I am trying to merge two columns in a pandas dataframe with 1000 rows. The difficulty I am having is merging only a specific part of the text. For example:

This is what I have, two columns:

Col1            Col2
11.50.199.1     12121.0
12.55.222.1     12121.0

What I want is to merge the two columns so that the example above becomes this:

ColNew
11.50.199.12121
12.55.222.12121

I want to get rid of the last 1 in Col1 and of the .0 in Col2.

I could not manage to do it. I appreciate any help. :)

CodePudding user response:

Split each value on . and keep the first piece of each resulting list with str[0] indexing:

df['ColNew'] = (df['Col1'].str.rsplit('.', n=1).str[0] + '.' +
                df['Col2'].astype(str).str.split('.').str[0])
print(df)
          Col1     Col2           ColNew
0  11.50.199.1  12121.0  11.50.199.12121
1  12.55.222.1  12121.0  12.55.222.12121
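
To make the intermediate steps visible, here is a minimal sketch on a standalone Series (the name col1 is only for illustration) showing what rsplit and str[0] each return:

import pandas as pd

col1 = pd.Series(["11.50.199.1", "12.55.222.1"])

# rsplit with n=1 splits once, starting from the right,
# so each value becomes a two-element list.
parts = col1.str.rsplit(".", n=1)
print(parts.tolist())   # [['11.50.199', '1'], ['12.55.222', '1']]

# .str[0] picks the first element of each list.
heads = parts.str[0]
print(heads.tolist())   # ['11.50.199', '12.55.222']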

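Or use Series.str.replace to strip the last dot and the digits after it in both columns (pass regex=True, since recent pandas versions treat the pattern as a literal string by default):

df['ColNew'] = (df['Col1'].str.replace(r'\.\d+$', '', regex=True) + '.' +
                df['Col2'].astype(str).str.replace(r'\.\d+$', '', regex=True))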
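
As a self-contained sketch of the regex variant, assuming the sample data from the question and that Col2 is stored as float (the names left and right are only for illustration):

import pandas as pd

# Sample data mirroring the question; Col2 is assumed to be float.
df = pd.DataFrame({
    "Col1": ["11.50.199.1", "12.55.222.1"],
    "Col2": [12121.0, 12121.0],
})

# r"\.\d+$" matches the final dot and the digits that follow it.
left = df["Col1"].str.replace(r"\.\d+$", "", regex=True)
right = df["Col2"].astype(str).str.replace(r"\.\d+$", "", regex=True)

df["ColNew"] = left + "." + right
print(df["ColNew"].tolist())   # ['11.50.199.12121', '12.55.222.12121']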

CodePudding user response:

Using apply:

df["ColNew"] = df.apply(lambda s:
                        s.Col1.removesuffix("1") +
                        s.Col2.removesuffix(".0"),
                        axis=1)

We use apply to iterate over the DataFrame. Here we want to iterate over rows because we need a value from two different columns, so we specify axis=1.

Since apply passes each row to the function as a Series, we access the value of a given column using s.col_name (or s["col_name"]).

Each of those values is a plain Python string, so we call removesuffix on it directly (available since Python 3.9). It returns the string minus the provided suffix, or the string unchanged if it does not end with that suffix.

Note that removesuffix will fail if the value is not already a string, for example if Col2 holds floats. In that case we need to convert it explicitly with str() first:

df["ColNew"] = df.apply(lambda s:
                        str(s.Col1).removesuffix("1") +
                        str(s.Col2).removesuffix(".0"),
                        axis=1)
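
The suffix-based version relies on every Col1 value ending in 1. A slightly more general sketch splits at the last dot instead. Here merge_parts is a hypothetical helper name, and Col2 is assumed to hold floats:

import pandas as pd

def merge_parts(row):
    # Hypothetical helper: keep Col1 up to and including its last dot,
    # then append the part of Col2 before its decimal point.
    prefix, dot, _ = str(row["Col1"]).rpartition(".")
    whole, _, _ = str(row["Col2"]).partition(".")
    return prefix + dot + whole

df = pd.DataFrame({
    "Col1": ["11.50.199.1", "12.55.222.1"],
    "Col2": [12121.0, 12121.0],
})

df["ColNew"] = df.apply(merge_parts, axis=1)
print(df["ColNew"].tolist())   # ['11.50.199.12121', '12.55.222.12121']

Row-wise apply is convenient for logic like this, but for 1000 rows or more the vectorized .str versions are usually faster.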