I feel like I have to apologize in advance for this one, but I've searched for answers and they seem to tell me what I'm doing is correct.
I'm trying to set a DataFrame column to True if another column has instances of a lowercase letter immediately followed by an uppercase letter.
What I tried was this:
cities['multiteam'] = cities['team'].apply(lambda x: pd.notna(re.search(r'[A][a]',x)))
That's setting all the results to False, so I figured maybe I was doing something wrong with my lambda function and I made the following to debug just the re.search() part:
cities['multiteam'] = pd.notna(re.search(r'[a][A]','OneTwo'))
That's also setting all the results to False. And there I'm stuck.
Answer:
The following code only looks for the letter 'A' followed by a lowercase 'a', because [A] and [a] are character classes that each contain a single letter:
cities['multiteam'] = cities['team'].apply(lambda t: pd.notna(re.search(r'[A][a]',t)))
To match any lowercase letter immediately followed by any uppercase letter, use ranges inside the character classes instead:
cities['multiteam'] = cities['team'].apply(lambda t: pd.notna(re.search(r'[a-z][A-Z]', t)))
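A quick way to see why the original pattern never matched is to try both versions on the string from the question using only the standard re module. This is a minimal sketch; the variable names are illustrative:

```python
import re

text = 'OneTwo'

# [a][A] is two one-character classes, so it only matches the literal text "aA"
literal = re.search(r'[a][A]', text)
print(literal)  # None, so pd.notna(...) on this result gives False

# [a-z][A-Z] uses ranges: any lowercase ASCII letter followed by any uppercase one
ranged = re.search(r'[a-z][A-Z]', text)
print(ranged.group())  # eT
```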
Answer:
You never need to apologise for asking questions. Using apply with a lambda is relatively slow; try str.contains instead, which accepts a regex pattern directly.
cities = cities.assign(multiteam=cities['team'].str.contains(r'[a-z][A-Z]'))
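Note that assign returns a new DataFrame rather than modifying cities in place, so assign the result back (cities = cities.assign(...)) if you want to keep the column. It is a common idiom for adding columns, particularly in method chains.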
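A self-contained sketch of the str.contains approach follows. The city and team values are made up for illustration (they are not the question's actual data), chosen so that two team strings contain a lowercase-to-uppercase boundary and one does not:

```python
import pandas as pd

# Hypothetical sample data; the real 'cities' frame comes from the question
cities = pd.DataFrame({
    'city': ['New York', 'Boston', 'Chicago'],
    'team': ['YankeesMets', 'Red Sox', 'CubsWhite Sox'],
})

# str.contains applies the regex to every value and returns a boolean Series;
# assign returns a new DataFrame, so the result is assigned back to cities
cities = cities.assign(multiteam=cities['team'].str.contains(r'[a-z][A-Z]'))
print(cities)
```

Each True corresponds to a team string containing a lowercase letter directly followed by an uppercase one, such as the 'sM' in 'YankeesMets'.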
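str.contains works with both regular expressions and fixed strings (pass regex=False for a plain substring match), and it is usually faster and more concise than apply with a lambda.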
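To illustrate the regex versus fixed-string behaviour, here is a small sketch on a made-up Series. A '.' is used because it means "any character" in a regex but is just a dot as a plain substring:

```python
import pandas as pd

# Hypothetical values chosen to show the difference
teams = pd.Series(['Red Sox', 'A.B', 'AxB'])

# Default (regex=True): '.' is a wildcard, so both 'A.B' and 'AxB' match
as_regex = teams.str.contains('A.B')

# regex=False: the pattern is a literal substring, so only 'A.B' matches
as_literal = teams.str.contains('A.B', regex=False)

print(as_regex.tolist())
print(as_literal.tolist())
```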
The regex pattern [a-z][A-Z] matches any lowercase letter in the range a-z immediately followed by any uppercase letter in the range A-Z.
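The pattern can also be checked directly with re.findall, which lists every place it matches. The strings below are hypothetical examples; the second one shows that the a-z and A-Z ranges cover only unaccented ASCII letters:

```python
import re

# Compile once; [a-z] and [A-Z] are ASCII ranges
pattern = re.compile(r'[a-z][A-Z]')

# findall returns every non-overlapping lowercase-to-uppercase boundary
boundaries = pattern.findall('KnicksNetsRangers')
print(boundaries)  # ['sN', 'sR']

# Accented letters fall outside the ASCII ranges, so this hypothetical name has no match
accented = pattern.search('MontréalÉquipe')
print(accented)  # None
```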