Keep only some elements in a column of string-CodePudding

I have a df like:

df = pd.DataFrame({'Temp' : ['ko1234', 'ko1234|ko445|ko568', 'map123', 'ko895', 'map123|ko889|ko665', 'ko635|map789|map777', 'ko985']})
(out) >>>
ko1234
ko1234|ko445|ko568
map123
ko895
map123|ko889|ko665
ko635|map789|map777
ko985

I need two things:

I want to keep only the words starting with ko but keep the remaining spaces, so:

ko1234
ko1234|ko445|ko568
    
ko895
ko889|ko665
ko635
ko985

In another case he would like to do this:
- if there is only one word keep it
- if there are more words divided by a "|" keep only the second one, so:

ko1234
ko445
map123
ko895
ko889
map789
ko985

What is the best way to do this?

CodePudding user response：

Here is how to do it using .apply (or .transform - the result will be the same).

The functions are applied to each element of the Series lists - which cointains a list of words (that were separated by "|" in the column Temp):

lists = df['Temp'].str.split('|')

def starting_with_ko(lst):
    ko = [word for word in lst if word.startswith('ko')]
    return '|'.join(ko) if ko else ''

def choose_element(lst):
    if len(lst) == 1:
        return lst[0]
    else:
        return lst[1]

out1 = lists.apply(starting_with_ko)
out2 = lists.apply(choose_element)

Results:

>>> out1
0                ko1234
1    ko1234|ko445|ko568
2                      
3                 ko895
4           ko889|ko665
5                 ko635
6                 ko985
dtype: object

>>> out2
0    ko1234
1     ko445
2    map123
3     ko895
4     ko889
5    map789
6     ko985
dtype: object

CodePudding user response：

We can do split then explode and remove the unwanted items with startswith

out = s.str.split('|').explode().str.strip()
out1 = out[out.str.startswith('ko')].groupby(level=0).agg('|'.join).reindex(s.index)

out2 = s.str.split('|').str[1].fillna(s)