Replacing Substring with another string from column Pandas

Time:03-14

Got this DataFrame:

Type  String  ext_id  int_id
   1  UKidBC    2393    2820
   1  UKidBC    4816    1068
   0  UKidBC    4166    3625
   0  UKidBC    2803    1006
   1  UKidBC    1189    2697

For each value in the String column, I need to replace the substring 'id' (in UKidBC) according to the following rule:

If df['Type'] == 1, replace the substring 'id' with the corresponding df['int_id'] value; otherwise, replace it with the corresponding df['ext_id'] value.
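To make the rule concrete, here is a small sketch that rebuilds the sample frame and computes which id each row should receive. It uses `Series.where` (keep values where the condition holds, take them from `other` elsewhere); the variable name `chosen` is just an illustrative assumption, not part of the question.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Type": [1, 1, 0, 0, 1],
    "String": ["UKidBC"] * 5,
    "ext_id": [2393, 4816, 4166, 2803, 1189],
    "int_id": [2820, 1068, 3625, 1006, 2697],
})

# Rule: int_id where Type == 1, otherwise ext_id.
# Series.where keeps int_id where the condition is True and
# substitutes the matching ext_id value everywhere else.
chosen = df["int_id"].where(df["Type"].eq(1), df["ext_id"])
print(chosen.tolist())  # [2820, 1068, 4166, 2803, 2697]
```

These are exactly the numbers that should end up in place of 'id' in each row.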

I tried this:

new_df.apply(lambda x: x['string'].replace(pat=['id'], 
   repl=x['int_id']) if x['Type'] == 1
   else x['string'].replace(pat=['id'],repl=x['ext_id']),axis=1)

I keep getting this error:

str.replace() takes no keyword arguments

What am I doing wrong here?

Answer:

Instead of apply, we could use str.split and np.where to insert the id that matches each row's "Type" value:

import numpy as np

tmp = df['String'].str.split('id', expand=True)
df['String'] = tmp[0] + np.where(df['Type'].astype(bool), df['int_id'].astype(str), df['ext_id'].astype(str)) + tmp[1]

Output:

   Type    String  ext_id  int_id
0     1  UK2820BC    2393    2820
1     1  UK1068BC    4816    1068
2     0  UK4166BC    4166    3625
3     0  UK2803BC    2803    1006
4     1  UK2697BC    1189    2697
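A self-contained sketch of the same split-and-concatenate idea, with two small assumptions of my own: `n=1` is passed so only the first 'id' is split on (without it, a value like 'UKididBC' yields three columns and `tmp[1]` alone drops the tail), and `Type.eq(1)` replaces `astype(bool)` so the condition matches the "Type == 1" rule literally. It still assumes every value contains 'id'.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Type": [1, 1, 0, 0, 1],
    "String": ["UKidBC"] * 5,
    "ext_id": [2393, 4816, 4166, 2803, 1189],
    "int_id": [2820, 1068, 3625, 1006, 2697],
})

# n=1 splits only at the first 'id', so exactly two columns come back
# even if 'id' occurs again later (assumes every value contains 'id')
parts = df["String"].str.split("id", n=1, expand=True)

# Per-row id as a string: int_id where Type == 1, else ext_id
ids = np.where(df["Type"].eq(1), df["int_id"].astype(str), df["ext_id"].astype(str))

# Glue the chosen id between the text before and after 'id'
df["String"] = parts[0] + ids + parts[1]
print(df["String"].tolist())
```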

Answer:

Assuming every string has the same fixed layout (two characters, then 'id', then the rest), use numpy.where and vectorized string concatenation:

import numpy as np
df['String'] = df['String'].str[:2] + np.where(df['Type'].eq(1), df['int_id'], df['ext_id']).astype(str) + df['String'].str[4:]
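Because the slices `[:2]` and `[4:]` silently assume 'id' sits at positions 2-3, a guard can catch rows that break that layout before any strings are rebuilt. This is a minimal sketch on the sample data; the explicit `ValueError` check is my own addition, not part of the answer above. Note the `.astype(str)`: `np.where` over two integer columns returns integers, which cannot be concatenated with strings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Type": [1, 1, 0, 0, 1],
    "String": ["UKidBC"] * 5,
    "ext_id": [2393, 4816, 4166, 2803, 1189],
    "int_id": [2820, 1068, 3625, 1006, 2697],
})

# The slicing approach hard-codes 'id' at positions 2-3, so verify that
# assumption instead of silently building wrong strings
if not df["String"].str[2:4].eq("id").all():
    raise ValueError("'id' is not at positions 2-3 in every String value")

# np.where on two int columns yields ints; cast to str before concatenating
ids = np.where(df["Type"].eq(1), df["int_id"], df["ext_id"]).astype(str)
df["String"] = df["String"].str[:2] + ids + df["String"].str[4:]
print(df["String"].tolist())
```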

Answer:

You can use .str.extract and np.where:

df['String'] = df['String'].str.extract(r'(?P<g0>.+)id(?P<g2>.+)').assign(g1=np.where(df['Type'] == 1, df['int_id'], df['ext_id']).astype(str)).sort_index(axis=1).agg(list, axis=1).str.join('')

Output:

>>> df
   Type    String  ext_id  int_id
0     1  UK2820BC    2393    2820
1     1  UK1068BC    4816    1068
2     0  UK4166BC    4166    3625
3     0  UK2803BC    2803    1006
4     1  UK2697BC    1189    2697
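The extracted groups can also be concatenated directly, which avoids the `sort_index`/`agg(list)`/`str.join` chain. A sketch under two assumptions of mine: the group names `before`/`after` are arbitrary, and the lazy `(.*?)` is used so the split happens at the first 'id' (the greedy `.+` in the answer above matches up to the last one). Rows without 'id' come back as NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Type": [1, 1, 0, 0, 1],
    "String": ["UKidBC"] * 5,
    "ext_id": [2393, 4816, 4166, 2803, 1189],
    "int_id": [2820, 1068, 3625, 1006, 2697],
})

# Lazy (.*?) stops at the FIRST 'id'; (.*) keeps everything after it.
# Named groups become the column names of the returned DataFrame.
parts = df["String"].str.extract(r"(?P<before>.*?)id(?P<after>.*)")
ids = np.where(df["Type"].eq(1), df["int_id"], df["ext_id"]).astype(str)

# Plain vectorised concatenation of prefix + chosen id + suffix
df["String"] = parts["before"] + ids + parts["after"]
print(df["String"].tolist())
```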

Answer:

Same idea as yours (apply() with replace()); the fix is in how replace() is called: pass the arguments positionally and convert the replacement id to a string.

new_df["String"] = new_df.apply(
    lambda row: row["String"].replace("id", str(row["int_id"])) if row["Type"] == 1 else row["String"].replace("id", str(row["ext_id"])),
    axis=1
)

Output:

   Type    String  ext_id  int_id
0     1  UK2820BC    2393    2820
1     1  UK1068BC    4816    1068
2     0  UK4166BC    4166    3625
3     0  UK2803BC    2803    1006
4     1  UK2697BC    1189    2697
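The same row-wise approach written with a named function can be easier to read and debug than a long lambda. In this sketch `fill_id` is a hypothetical helper name; the `try` block at the end shows why the `str()` conversion matters, since the built-in `str.replace` rejects an integer replacement with a `TypeError`.

```python
import pandas as pd

df = pd.DataFrame({
    "Type": [1, 1, 0, 0, 1],
    "String": ["UKidBC"] * 5,
    "ext_id": [2393, 4816, 4166, 2803, 1189],
    "int_id": [2820, 1068, 3625, 1006, 2697],
})

def fill_id(row):
    # Pick the id according to Type, then convert it: str.replace needs a str
    new_id = row["int_id"] if row["Type"] == 1 else row["ext_id"]
    return row["String"].replace("id", str(new_id))

# axis=1 passes one row (a Series) at a time to fill_id
df["String"] = df.apply(fill_id, axis=1)
print(df["String"].tolist())

# Passing the integer directly fails
try:
    "UKidBC".replace("id", 2820)
except TypeError as exc:
    print("TypeError:", exc)
```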

Answer:

This question honestly looks like one of those coding challenges you see.

Assuming that your dataframe variable is new_df:

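for idx, row in new_df.iterrows():
    new_id = row["int_id"] if row["Type"] == 1 else row["ext_id"]
    # str.replace returns a new string, so assign the result back
    new_df.at[idx, "String"] = row["String"].replace("id", str(new_id))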

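What you did wrong: as the error says, you passed keyword arguments (pat=, repl=) to Python's built-in str.replace, which does not accept them; pat and repl are parameter names of pandas' Series.str.replace. Pass the arguments positionally instead: the first is the substring to replace (a plain string, not a list like ['id']) and the second is its replacement, which must also be a string, so convert the integer id with str().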
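A short sketch contrasting the two methods that the question mixed up: the built-in `str.replace` on a single Python string, and pandas' `Series.str.replace`, which is where the `pat=`/`repl=` keywords come from. Note that the pandas method applies one replacement to every element, so on its own it cannot choose a different id per row.

```python
import pandas as pd

s = "UKidBC"

# Built-in str.replace takes the old and new substrings positionally
fixed = s.replace("id", "2820")
print(fixed)  # UK2820BC

# Keyword names such as pat= are rejected by the built-in method
try:
    s.replace(pat="id", repl="2820")
except TypeError as exc:
    print("TypeError:", exc)

# pat= and repl= belong to pandas' Series.str.replace, which applies the
# SAME replacement to every element of the Series
ser = pd.Series(["UKidBC", "UKidBC"])
vectorised = ser.str.replace(pat="id", repl="2820", regex=False)
print(vectorised.tolist())  # ['UK2820BC', 'UK2820BC']
```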

Answer:

A list comprehension combined with np.where should be reasonably fast:

import numpy as np
strings = np.where(df['Type'].eq(1), df['int_id'], df['ext_id']).astype(str)
df['String'] = [a.replace("id", b) for a, b in zip(df['String'], strings)]

print(df)

   Type    String  ext_id  int_id
0     1  UK2820BC    2393    2820
1     1  UK1068BC    4816    1068
2     0  UK4166BC    4166    3625
3     0  UK2803BC    2803    1006
4     1  UK2697BC    1189    2697
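If NumPy is not otherwise needed, the choice can be made inside the comprehension itself by zipping all four columns. This is a sketch of a pure-Python variant, not a benchmarked improvement; passing `1` as the third positional argument (`count`) is my own addition so that only the first 'id' in each value is replaced.

```python
import pandas as pd

df = pd.DataFrame({
    "Type": [1, 1, 0, 0, 1],
    "String": ["UKidBC"] * 5,
    "ext_id": [2393, 4816, 4166, 2803, 1189],
    "int_id": [2820, 1068, 3625, 1006, 2697],
})

# Pick the id per row with a conditional expression, convert it to str,
# and pass count=1 positionally so only the first 'id' is replaced
df["String"] = [
    s.replace("id", str(i if t == 1 else e), 1)
    for s, t, i, e in zip(df["String"], df["Type"], df["int_id"], df["ext_id"])
]
print(df["String"].tolist())
```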