I am trying to use Pandas map to assign values to keys, where each key would be the string returned when an entry in the DataFrame starts with a certain character.
Using an example from the Pandas docs, with the following Series and my code:
import numpy as np
import pandas as pd
s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])
s.map({ lambda x: x if x.startswith('c') else None: 'kitten',
lambda x: x if x.startswith('d') else None: 'puppy',
lambda x: x if x.startswith('r') else None: 'bunny',
})
Expected result:
0 kitten
1 puppy
2 NaN
3 bunny
dtype: object
Currently, my code returns 4 NaN values. I am using startswith because I do not always know the last characters of the strings in my DataFrame, but I do know the first character(s). Any help would be appreciated.
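The attempt above returns all NaN because, when Series.map receives a dict, it treats each element of the Series as a key to look up in that dict. The keys in the question are three lambda function objects, and a string such as 'cat' is never equal to a function object, so every lookup misses. A minimal illustration (the starts_with_c lambda is just a stand-in):

```python
import numpy as np
import pandas as pd

s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])

# A dict passed to map is a lookup table: each value of s is used as a key.
# 'cat' and 'dog' are found; NaN and 'rabbit' have no key, so they become NaN.
by_key = s.map({'cat': 'kitten', 'dog': 'puppy'})
print(by_key)

# A function object used as a key never equals any string, so nothing matches.
starts_with_c = lambda x: x.startswith('c')
by_func_key = s.map({starts_with_c: 'kitten'})
print(by_func_key.isna().all())
```

The functions are never called here; they only sit in the dict as opaque keys.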
Answer:
Instead of a lambda function, you can create a dictionary and map the first letter, obtained by indexing with str[0]:
print(s.str[0].map({'c': 'kitten', 'd': 'puppy', 'r': 'bunny'}))
0 kitten
1 puppy
2 NaN
3 bunny
dtype: object
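With this approach, a string whose first letter is not in the dictionary also comes out as NaN, so it looks the same as a missing value. A small sketch, assuming you want to tell the two cases apart; the extra 'horse' entry and the 'unknown' label are hypothetical additions for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series(['cat', 'dog', np.nan, 'rabbit', 'horse'])
mapping = {'c': 'kitten', 'd': 'puppy', 'r': 'bunny'}

# .str[0] gives the first character, or NaN for missing values
first = s.str[0]

# map returns NaN both for missing values and for letters not in the mapping
result = first.map(mapping)

# Label strings that exist but did not match; genuinely missing values stay NaN
result = result.mask(s.notna() & result.isna(), 'unknown')
print(result)
```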
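If the prefixes to test have different lengths, loop over them with str.startswith:
d = {'ca': 'kitten', 'd': 'puppy', 'rab': 'bunny'}
out = s.copy()  # match against the original values, not already-replaced ones
for k, v in d.items():
    # na=False makes missing values count as non-matches
    out.loc[s.str.startswith(k, na=False)] = v
print(out)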
0 kitten
1 puppy
2 NaN
3 bunny
dtype: object
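The same variable-length prefix matching can also be done without an explicit assignment loop by building one boolean mask per prefix and letting numpy.select pick a value. This is a swapped-in technique, sketched under the assumption that dict order sets priority: np.select uses the first condition that is True for each row. Rows matching no prefix get default=None, which pandas treats as missing.

```python
import numpy as np
import pandas as pd

s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])
d = {'ca': 'kitten', 'd': 'puppy', 'rab': 'bunny'}

# One boolean mask per prefix; na=False treats missing values as non-matches
conditions = [s.str.startswith(k, na=False).to_numpy(dtype=bool) for k in d]
choices = list(d.values())

# For each row, take the choice of the first True condition, else the default
result = pd.Series(np.select(conditions, choices, default=None), index=s.index)
print(result)
```

Unlike assigning into s, this leaves the original Series untouched.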
Answer:
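You can store the mapping from first letter to value in a dictionary, then use a single lambda function to look up the corresponding value:
import numpy as np
import pandas as pd
mapping = {'c': 'kitten', 'd': 'puppy', 'r': 'bunny'}
s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])
# NaN is a float and is truthy, so `if x` alone would still reach x[0] and raise
# TypeError; na_action='ignore' skips NaN, and `if x` guards against empty strings
print(s.map(lambda x: mapping.get(x[0]) if x else None, na_action='ignore'))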
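The single-function idea extends to prefixes of different lengths. A minimal sketch, assuming the longest matching prefix should win; lookup_prefix is a hypothetical helper name, not a pandas function:

```python
import numpy as np
import pandas as pd

s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])
prefixes = {'ca': 'kitten', 'd': 'puppy', 'rab': 'bunny'}

# Try longer prefixes first so a more specific key beats a shorter one
ordered = sorted(prefixes.items(), key=lambda kv: len(kv[0]), reverse=True)

def lookup_prefix(text):
    """Return the value for the longest matching prefix, else None."""
    return next((value for prefix, value in ordered if text.startswith(prefix)), None)

# na_action='ignore' leaves NaN as-is instead of passing it to the function
result = s.map(lookup_prefix, na_action='ignore')
print(result)
```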
Answer:
You can use the replace method with regular expressions:
dct = {'^c.*': 'kitten', '^d.*': 'puppy', '^r.*': 'bunny'}
s.replace(dct, regex=True)
Output:
0 kitten
1 puppy
2 NaN
3 bunny
dtype: object
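Because replace with regex=True substitutes only the matched text, each pattern is anchored with ^ and ends with .* so the whole string is replaced. If you already have a plain prefix-to-value dictionary, one way to build the patterns is sketched below; re.escape is an added safeguard, assuming some prefixes might contain regex metacharacters such as '.' or '+':

```python
import re

import numpy as np
import pandas as pd

s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])
prefixes = {'c': 'kitten', 'd': 'puppy', 'r': 'bunny'}

# ^ anchors at the start, .* consumes the rest, re.escape neutralizes metacharacters
patterns = {'^' + re.escape(prefix) + '.*': value for prefix, value in prefixes.items()}

result = s.replace(patterns, regex=True)
print(result)
```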