I have a data frame whose third column holds IDs, and I want to create a fourth column that looks almost the same, except that it has no double quotes and each ID in the list carries a 'user/' prefix. Sometimes the value is a single ID rather than a list of IDs (as shown in the example DF).
Original:
col1  col2  col3
01    01    "ID278, ID289"
02    02    "ID275"
Desired:
col1  col2  col3            col4
01    01    "ID278, ID289"  user/ID278, user/ID289
02    02    "ID275"         user/ID275
CodePudding user response:
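# strip the double quotes (use df['col4'] so a real column is created)
df['col4'] = df['col3'].str.strip('"')
# prefix the first ID, then every ID that follows a ', ' separator
df['col4'] = 'user/' + df['col4'].str.replace(', ', ', user/', regex=False)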
should do the trick.
In general, vectorized string manipulation is done through the pd.Series.str accessor. Most of its method names closely match either a Python string method or a function from the re module. Pandas usually supports the standard Python operators (+, -, *, etc.) on columns, including + for string concatenation, and broadcasts a scalar to the length of the column you are working with.
A slower option is always to use Series.apply(func), which iterates over the values in the series and passes each value to the function func.
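As a minimal sketch of the points above, the following compares the .str accessor, scalar broadcasting with +, and Series.apply on the two values from the question. The Series s and the variable names are illustrative, and it assumes the double quotes are literally part of the stored strings.

```python
import pandas as pd

# Illustrative data: the col3 values from the question, quotes included.
s = pd.Series(['"ID278, ID289"', '"ID275"'])

# .str.strip mirrors Python's str.strip and runs on every element.
stripped = s.str.strip('"')

# A scalar string is broadcast across the Series by the + operator.
# Note that this only prefixes the first ID in each row.
prefixed_once = "user/" + stripped

# .str.replace with regex=True behaves like re.sub; each run of
# characters that is neither a comma nor whitespace gets the prefix.
prefixed_all = stripped.str.replace(r"([^,\s]+)", r"user/\1", regex=True)

# The same result via Series.apply, which calls a Python function per value.
via_apply = stripped.apply(
    lambda v: ", ".join("user/" + part for part in v.split(", "))
)

print(prefixed_once.tolist())
print(prefixed_all.tolist())
```

The broadcast version shows why plain concatenation is not enough for rows that hold several IDs, while the regex replacement and the apply version both prefix every ID.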
CodePudding user response:
You can use the .apply() function:
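def function(x):
    # drop surrounding double quotes, if any, then split the ID list
    elements = x.strip('"').split(", ")
    # prefix each ID and join the pieces back into one string
    return ", ".join(f"user/{i}" for i in elements)

df["col4"] = df.col3.apply(function)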
That returns:
   col1  col2  col3          col4
      1     1  ID278, ID289  user/ID278, user/ID289
      2     2  ID275         user/ID275
CodePudding user response:
Here's a solution that takes both the double quotes and ID lists into account:
# remove the double quotes
df['col4'] = df['col3'].str.strip('"')
# split the string, add prefix user/, and then join
df['col4'] = df['col4'].apply(lambda x: ', '.join(f"user/{userId}" for userId in x.split(', ')))
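The split above assumes the separator is always exactly a comma followed by one space. If the spacing might vary, one possible sketch is a small helper that can be tested on plain strings before being passed to apply. The name add_user_prefix and its prefix parameter are hypothetical, not part of pandas.

```python
import re


def add_user_prefix(value: str, prefix: str = "user/") -> str:
    """Hypothetical helper: strip surrounding quotes and prefix each ID."""
    # Remove outer whitespace and the literal double quotes around the value.
    cleaned = value.strip().strip('"').strip()
    # Split on commas with any amount of surrounding whitespace,
    # skipping empty pieces such as those from an empty cell.
    ids = [part for part in re.split(r"\s*,\s*", cleaned) if part]
    return ", ".join(f"{prefix}{part}" for part in ids)


print(add_user_prefix('"ID278, ID289"'))
print(add_user_prefix('"ID275"'))

# With pandas, the helper would be applied per value:
# df["col4"] = df["col3"].apply(add_user_prefix)
```

Keeping the per-value logic in a named function makes it easy to check edge cases such as a missing space after the comma, at the cost of being slower than the .str methods on large columns.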