Replacing a substring with a value from another column in pandas - CodePudding

Got this DataFrame:

Type	String	ext_id	int_id
1	UKidBC	2393	2820
1	UKidBC	4816	1068
0	UKidBC	4166	3625
0	UKidBC	2803	1006
1	UKidBC	1189	2697
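
For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a sketch: it assumes the two id columns hold integers, and it names the variable df as the answers below do (the attempt further down calls it new_df).

```python
import pandas as pd

# Rebuild the sample frame from the question.
# Assumption: ext_id and int_id are integer columns.
df = pd.DataFrame({
    "Type": [1, 1, 0, 0, 1],
    "String": ["UKidBC"] * 5,
    "ext_id": [2393, 4816, 4166, 2803, 1189],
    "int_id": [2820, 1068, 3625, 1006, 2697],
})
print(df)
```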

For each value in the String column, I need to replace the substring 'id' (as in UKidBC) according to the following rule:

If df['Type'] == 1, replace the substring 'id' with the corresponding df['int_id'] value; otherwise, replace it with the corresponding df['ext_id'] value.

I tried this:

new_df.apply(lambda x: x['string'].replace(pat=['id'], 
   repl=x['int_id']) if x['Type'] == 1
   else x['string'].replace(pat=['id'],repl=x['ext_id']),axis=1)

I keep getting this error:

str.replace() takes no keyword arguments

What am I doing wrong here?

CodePudding user response：

Instead of apply, we could use str.split and np.where to replace values according to the "Type" value:

tmp = df['String'].str.split('id', expand=True)
df['String'] = tmp[0] + np.where(df['Type'].astype(bool), df['int_id'].astype(str), df['ext_id'].astype(str)) + tmp[1]

Output:

   Type    String  ext_id  int_id
0     1  UK2820BC    2393    2820
1     1  UK1068BC    4816    1068
2     0  UK4166BC    4166    3625
3     0  UK2803BC    2803    1006
4     1  UK2697BC    1189    2697

CodePudding user response：

Assuming your string is fixed, use numpy.where and vector string concatenation:

df['String'] = df['String'].str[:2] + np.where(df['Type'].eq(1), df['int_id'].astype(str), df['ext_id'].astype(str)) + df['String'].str[4:]
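
A self-contained sketch of this slicing approach on the sample data may make the assumption clearer. It assumes the id columns are integers (hence the astype(str) calls) and that 'id' always sits at character positions 2 and 3 of String; if the prefix length varies, the slices would cut in the wrong place.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; integer ids are an assumption.
df = pd.DataFrame({
    "Type": [1, 1, 0, 0, 1],
    "String": ["UKidBC"] * 5,
    "ext_id": [2393, 4816, 4166, 2803, 1189],
    "int_id": [2820, 1068, 3625, 1006, 2697],
})

# Pick int_id where Type is 1, otherwise ext_id, as strings.
ids = np.where(df["Type"].eq(1), df["int_id"].astype(str), df["ext_id"].astype(str))

# Keep the two characters before 'id' and everything after it.
df["String"] = df["String"].str[:2] + ids + df["String"].str[4:]

print(df["String"].tolist())
# ['UK2820BC', 'UK1068BC', 'UK4166BC', 'UK2803BC', 'UK2697BC']
```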

CodePudding user response：

You can use .str.extract and np.where:

df['String'] = df['String'].str.extract(r'(?P<g0>.+)id(?P<g2>.+)').assign(g1=np.where(df['Type'] == 1, df['int_id'], df['ext_id']).astype(str)).sort_index(axis=1).agg(list, axis=1).str.join('')

Output:

>>> df
   Type    String  ext_id  int_id
0     1  UK2820BC    2393    2820
1     1  UK1068BC    4816    1068
2     0  UK4166BC    4166    3625
3     0  UK2803BC    2803    1006
4     1  UK2697BC    1189    2697

CodePudding user response：

You can keep your original approach (apply() with replace()); just pass the arguments to replace() positionally and convert the id to a string first:

new_df["String"] = new_df.apply(
    lambda row: row["String"].replace("id", str(row["int_id"])) if row["Type"] == 1 else row["String"].replace("id", str(row["ext_id"])),
    axis=1
)

Output:

   Type    String  ext_id  int_id
0     1  UK2820BC    2393    2820
1     1  UK1068BC    4816    1068
2     0  UK4166BC    4166    3625
3     0  UK2803BC    2803    1006
4     1  UK2697BC    1189    2697

CodePudding user response：

This question honestly looks like one of those coding challenges you see.

Assuming that your dataframe variable is new_df:

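for idx, row in new_df.iterrows():
    # Pick int_id when Type is 1, otherwise ext_id, and write the result back.
    new_id = row["int_id"] if row["Type"] else row["ext_id"]
    new_df.at[idx, "String"] = row["String"].replace("id", str(new_id))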

What you did wrong is (as the error says) pass keyword arguments to str.replace, which does not accept them. Pass both positionally instead: the first argument is the substring to replace, and the second is the replacement, which must itself be a string.
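
To see the difference in isolation, here is a small sketch in plain Python with no pandas involved. The values are taken from the first row of the sample data, and the variable names are just for illustration. Note that pat and repl are the parameter names of pandas' Series.str.replace; inside apply(..., axis=1) each row["String"] is an ordinary Python str, so the built-in str.replace is what gets called.

```python
# Values from the first row of the sample frame (illustrative names).
s = "UKidBC"
int_id = 2820

# Built-in str.replace takes its arguments positionally: (old, new).
# The replacement must be a str, so the integer id is converted first.
result = s.replace("id", str(int_id))
print(result)  # UK2820BC

# Passing pat=/repl= keywords, as in the question, raises TypeError.
raised = False
try:
    s.replace(pat="id", repl=str(int_id))
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```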

CodePudding user response：

A list comprehension combined with np.where can be fast:

strings = np.where(df['Type'].eq(1),df['int_id'],df['ext_id']).astype(str)
df['String'] = [a.replace("id",b) for a,b in zip(df['String'],strings)]

print(df)

   Type    String  ext_id  int_id
0     1  UK2820BC    2393    2820
1     1  UK1068BC    4816    1068
2     0  UK4166BC    4166    3625
3     0  UK2803BC    2803    1006
4     1  UK2697BC    1189    2697