Replace a string if it starts with a certain character

I am trying to use Pandas map to assign values to keys, where the keys would be strings returned if an entry in the DataFrame starts with a certain character.

Using an example from the Pandas docs, here is the Series and my code:

import numpy as np
import pandas as pd

s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])

s.map({ lambda x: x if x.startswith('c') else None: 'kitten', 
        lambda x: x if x.startswith('d') else None: 'puppy',
        lambda x: x if x.startswith('r') else None: 'bunny',
    })

Expected result:

0    kitten
1     puppy
2       NaN
3     bunny
dtype: object

Currently, my code returns 4 NaN values. I am specifying startswith because I am not always able to know the last characters of the string in my DataFrame, but I know the first character/s. Any help would be appreciated.

CodePudding user response：

Instead of lambda functions, you can create a dictionary and map the first letter of each string, obtained by indexing with str[0]:

print (s.str[0].map({'c': 'kitten', 'd': 'puppy', 'r': 'bunny'}))
0    kitten
1     puppy
2       NaN
3     bunny
dtype: object
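
As a side note on why the original attempt returned only NaN: when map is given a dictionary, each value in the Series is looked up as an exact key, so keys that never literally appear in the data (lambda objects, or bare first letters) cannot match. The sketch below reuses the Series from the question; the variable names exact and by_first_letter are only illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])
mapping = {'c': 'kitten', 'd': 'puppy', 'r': 'bunny'}

# Dict keys are matched against each full value, so 'c' does not match 'cat'
# and every entry comes back as NaN.
exact = s.map(mapping)

# Reducing each value to its first character first lets the keys match;
# the missing value stays NaN.
by_first_letter = s.str[0].map(mapping)

print(exact)
print(by_first_letter)
```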

If the prefixes to test have different lengths, loop over the dictionary and use str.startswith:

d = {'ca': 'kitten', 'd': 'puppy', 'rab': 'bunny'}

for k, v in d.items():
    s.loc[s.str.startswith(k, na=False)] = v
print (s)
0    kitten
1     puppy
2       NaN
3     bunny
dtype: object
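
Note that the loop above overwrites s in place, so a later prefix is tested against values that may already have been replaced. A minimal sketch of a variant that leaves s untouched, assuming you want unmatched strings kept as they are; the names prefixes and result are illustrative and not part of the answer above:

```python
import numpy as np
import pandas as pd

s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])
prefixes = {'ca': 'kitten', 'd': 'puppy', 'rab': 'bunny'}

# Work on a copy so the original Series is left unchanged.
result = s.copy()
for prefix, value in prefixes.items():
    # Build the mask from the original values, not from the partly
    # replaced result; na=False treats missing values as non-matching.
    mask = s.str.startswith(prefix, na=False)
    result[mask] = value

print(result)
```

If two prefixes can match the same string, the one that comes later in the dictionary wins, so order the keys accordingly.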

CodePudding user response：

You can store in a dictionary the mapping of first letter -> value, then use only one lambda function to lookup the corresponding value:

import numpy as np
import pandas as pd

mapping = {'c': 'kitten', 'd': 'puppy', 'r': 'bunny'}

s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])

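# np.nan is truthy, so check the type explicitly before indexing;
# non-string entries such as NaN are passed through unchanged
s.map(lambda x: mapping.get(x[:1]) if isinstance(x, str) else x)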

CodePudding user response：

You can use the method replace and regex:

dct = {'^c.*': 'kitten', '^d.*': 'puppy', '^r.*': 'bunny'}
s.replace(dct, regex=True)

Output:

0    kitten
1     puppy
2       NaN
3     bunny
dtype: object
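
If the prefixes are kept as plain strings rather than hand-written patterns, the regex dictionary can be built from them. This is a sketch under the assumption that the prefixes should be matched literally, which is why re.escape is applied (it matters for prefixes containing characters such as '.' or '+'); the names prefixes, patterns and result are illustrative.

```python
import re

import numpy as np
import pandas as pd

s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])
prefixes = {'c': 'kitten', 'd': 'puppy', 'r': 'bunny'}

# '^' anchors the match at the start of the string and '.*' consumes the
# rest of it, so the whole value is replaced rather than just the prefix.
patterns = {'^' + re.escape(p) + '.*': v for p, v in prefixes.items()}

result = s.replace(patterns, regex=True)
print(result)
```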