I have a df like:
df = pd.DataFrame({'Temp' : ['ko1234', 'ko1234|ko445|ko568', 'map123', 'ko895', 'map123|ko889|ko665', 'ko635|map789|map777', 'ko985']})
(out) >>>
ko1234
ko1234|ko445|ko568
map123
ko895
map123|ko889|ko665
ko635|map789|map777
ko985
I need two things:
- I want to keep only the words starting with
ko
but keep the remaining spaces, so:
ko1234
ko1234|ko445|ko568
ko895
ko889|ko665
ko635
ko985
- In another case he would like to do this:
- if there is only one word keep it
- if there are more words divided by a "|" keep only the second one, so:
ko1234
ko445
map123
ko895
ko889
map789
ko985
What is the best way to do this?
CodePudding user response:
Here is how to do it using .apply
(or .transform
- the result will be the same).
The functions are applied to each element of the Series lists
- which cointains a list of words (that were separated by "|" in the column Temp
):
lists = df['Temp'].str.split('|')
def starting_with_ko(lst):
ko = [word for word in lst if word.startswith('ko')]
return '|'.join(ko) if ko else ''
def choose_element(lst):
if len(lst) == 1:
return lst[0]
else:
return lst[1]
out1 = lists.apply(starting_with_ko)
out2 = lists.apply(choose_element)
Results:
>>> out1
0 ko1234
1 ko1234|ko445|ko568
2
3 ko895
4 ko889|ko665
5 ko635
6 ko985
dtype: object
>>> out2
0 ko1234
1 ko445
2 map123
3 ko895
4 ko889
5 map789
6 ko985
dtype: object
CodePudding user response:
We can do split
then explode
and remove the unwanted items with startswith
out = s.str.split('|').explode().str.strip()
out1 = out[out.str.startswith('ko')].groupby(level=0).agg('|'.join).reindex(s.index)
out2 = s.str.split('|').str[1].fillna(s)