Got this DataFrame:
Type | String | ext_id | int_id |
---|---|---|---|
1 | UKidBC | 2393 | 2820 |
1 | UKidBC | 4816 | 1068 |
0 | UKidBC | 4166 | 3625 |
0 | UKidBC | 2803 | 1006 |
1 | UKidBC | 1189 | 2697 |
For each value in the String column, I need to replace the substring 'id' (as in UKidBC) according to the following rule:
If df['Type'] == 1, replace the substring 'id' with the corresponding df['int_id'] value;
otherwise, replace it with the corresponding df['ext_id'] value.
I tried this line:
new_df.apply(lambda x: x['string'].replace(pat=['id'],
repl=x['int_id']) if x['Type'] == 1
else x['string'].replace(pat=['id'],repl=x['ext_id']),axis=1)
I keep getting this error:
str.replace() takes no keyword arguments
What am I doing wrong here?
CodePudding user response:
Instead of apply, we could use str.split and np.where to replace values according to the "Type" value:
tmp = df['String'].str.split('id', expand=True)
df['String'] = tmp[0] + np.where(df['Type'].astype(bool), df['int_id'].astype(str), df['ext_id'].astype(str)) + tmp[1]
Output:
Type String ext_id int_id
0 1 UK2820BC 2393 2820
1 1 UK1068BC 4816 1068
2 0 UK4166BC 4166 3625
3 0 UK2803BC 2803 1006
4 1 UK2697BC 1189 2697
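For anyone who wants to reproduce this, here is a self-contained sketch that rebuilds the sample frame from the question and applies the same split-and-concatenate idea. The n=1 argument and the intermediate name chosen are my additions (n=1 limits the split to the first 'id'); everything else follows the two lines above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Type": [1, 1, 0, 0, 1],
    "String": ["UKidBC"] * 5,
    "ext_id": [2393, 4816, 4166, 2803, 1189],
    "int_id": [2820, 1068, 3625, 1006, 2697],
})

# Split each string around the first 'id': column 0 is the prefix, column 1 the suffix
tmp = df["String"].str.split("id", n=1, expand=True)

# Pick int_id where Type is 1 and ext_id otherwise, already converted to strings
chosen = pd.Series(
    np.where(df["Type"].eq(1), df["int_id"].astype(str), df["ext_id"].astype(str)),
    index=df.index,
)

# Element-wise string concatenation: prefix + chosen id + suffix
df["String"] = tmp[0] + chosen + tmp[1]
print(df["String"].tolist())
```

Wrapping the np.where result in a Series with the frame's index is not strictly required, but it keeps the concatenation aligned on the index rather than on position.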
CodePudding user response:
Assuming 'id' always sits at the same position in the string (characters 2 and 3), use numpy.where and vectorized string concatenation:
df['String'] = df['String'].str[:2] + np.where(df['Type'].eq(1), df['int_id'], df['ext_id']).astype(str) + df['String'].str[4:]
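The slicing only works while 'id' occupies positions 2 and 3. As a hedged sketch of what one could do when the prefix length varies, the example below locates 'id' per row with Series.str.partition instead of fixed slices. The second row ("GBRidX") is made-up data for illustration, not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the prefix before 'id' has different lengths
df = pd.DataFrame({
    "Type": [1, 0],
    "String": ["UKidBC", "GBRidX"],
    "ext_id": [2393, 4166],
    "int_id": [2820, 3625],
})

# partition splits at the first 'id' into columns 0 (before), 1 ('id'), 2 (after)
parts = df["String"].str.partition("id")

# Choose the replacement id per row, then convert to strings for concatenation
ids = pd.Series(
    np.where(df["Type"].eq(1), df["int_id"], df["ext_id"]),
    index=df.index,
).astype(str)

df["String"] = parts[0] + ids + parts[2]
print(df["String"].tolist())  # ['UK2820BC', 'GBR4166X']
```

Note that if a string contains no 'id' at all, partition leaves the whole string in column 0, so the chosen id would simply be appended at the end.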
CodePudding user response:
You can use .str.extract and np.where:
df['String'] = df['String'].str.extract(r'(?P<g0>.+)id(?P<g2>.+)').assign(g1=np.where(df['Type'] == 1, df['int_id'], df['ext_id']).astype(str)).sort_index(axis=1).agg(list, axis=1).str.join('')
Output:
>>> df
Type String ext_id int_id
0 1 UK2820BC 2393 2820
1 1 UK1068BC 4816 1068
2 0 UK4166BC 4166 3625
3 0 UK2803BC 2803 1006
4 1 UK2697BC 1189 2697
CodePudding user response:
You can keep your own approach (apply() with replace()); just pass the arguments to replace() positionally and convert the id to a string first.
new_df["String"] = new_df.apply(
    lambda row: row["String"].replace("id", str(row["int_id"])) if row["Type"] == 1 else row["String"].replace("id", str(row["ext_id"])),
    axis=1
)
Output:
Type String ext_id int_id
0 1 UK2820BC 2393 2820
1 1 UK1068BC 4816 1068
2 0 UK4166BC 4166 3625
3 0 UK2803BC 2803 1006
4 1 UK2697BC 1189 2697
CodePudding user response:
This question honestly looks like one of those coding challenges you see.
Assuming that your dataframe variable is new_df:
for idx, row in new_df.iterrows():
    new_id = row["int_id"] if row["Type"] == 1 else row["ext_id"]
    new_df.loc[idx, "String"] = row["String"].replace("id", str(new_id))
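What you did wrong is (as the error says) that you gave keyword arguments to str.replace, which does not accept pat or repl as keywords. Inside a row-wise apply, x['String'] is a plain Python string, so you are calling the built-in str.replace, not pandas' Series.str.replace. Its first argument is the substring to replace and its second is the replacement, and both must be strings, so an integer id needs str() first.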
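To make the difference concrete, here is a small sketch contrasting the built-in method with the pandas one. The variable names are just for illustration; the exact wording of the exception messages can vary between Python versions, so the example only relies on the exception type.

```python
import pandas as pd

s = "UKidBC"

# Built-in str.replace rejects pat=/repl= keywords with a TypeError
try:
    s.replace(pat="id", repl="2820")
    keyword_call_failed = False
except TypeError:
    keyword_call_failed = True

# It also rejects a non-string replacement such as an int
try:
    s.replace("id", 2820)
    int_call_failed = False
except TypeError:
    int_call_failed = True

# Positional string arguments work
fixed = s.replace("id", str(2820))
print(fixed)  # UK2820BC

# pandas' Series.str.replace is a different method and does accept pat= and repl=
ser = pd.Series([s])
vectorized = ser.str.replace(pat="id", repl="2820", regex=False).tolist()
print(vectorized)  # ['UK2820BC']
```

The keyword form from the question would therefore be valid on a whole column (df['String'].str.replace(...)), but not on the single string that apply hands to the lambda for each row.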
CodePudding user response:
A list comprehension combined with np.where can be fast:
strings = np.where(df['Type'].eq(1),df['int_id'],df['ext_id']).astype(str)
df['String'] = [a.replace("id",b) for a,b in zip(df['String'],strings)]
print(df)
Type String ext_id int_id
0 1 UK2820BC 2393 2820
1 1 UK1068BC 4816 1068
2 0 UK4166BC 4166 3625
3 0 UK2803BC 2803 1006
4 1 UK2697BC 1189 2697