I'd like to set values on a slice of a DataFrame using .loc together with the pandas string method .str.extract(). However, it's not working due to indexing errors. The same code works perfectly if I swap extract with contains.
Here is a sample frame:
import pandas as pd

df = pd.DataFrame(
    {
        'name': [
            'JUNK-0003426', 'TEST-0003435', 'JUNK-0003432', 'TEST-0003433', 'TEST-0003436',
        ],
        'value': [
            'Junk', 'None', 'Junk', 'None', 'None',
        ],
    }
)
Here is my code:
df.loc[df["name"].str.startswith("TEST"), "value"] = df["name"].str.extract(r"TEST-\d{3}(\d+)")
How can I set the None values to the extracted regex string?
CodePudding user response:
Hmm, the problem seems to be that .str.extract returns a pd.DataFrame. You can .squeeze() it to turn it into a Series, and then it seems to work fine:
df.loc[df["name"].str.startswith("TEST"), "value"] = df["name"].str.extract(r"TEST-\d{3}(\d+)").squeeze()
Index alignment takes care of the rest: the Series is matched to the selected rows by index label, so only the TEST rows are assigned.
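To make the difference concrete, here is a minimal sketch using the sample frame from the question. It assumes the capture group is meant to be `(\d+)`. It also shows `expand=False`, which is an alternative to `.squeeze()` and not something the question used: with a single capture group it makes `.str.extract` return a Series directly.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["JUNK-0003426", "TEST-0003435", "JUNK-0003432", "TEST-0003433", "TEST-0003436"],
        "value": ["Junk", "None", "Junk", "None", "None"],
    }
)

# Assumed pattern: skip "TEST-" and three digits, capture the remaining digits.
pattern = r"TEST-\d{3}(\d+)"

# With the default expand=True, extract returns a one-column DataFrame.
as_frame = df["name"].str.extract(pattern)
print(type(as_frame).__name__)

# With expand=False and a single capture group, extract returns a Series.
as_series = df["name"].str.extract(pattern, expand=False)
print(type(as_series).__name__)

# Non-matching rows hold NaN in as_series, but the boolean mask
# restricts the assignment to the rows whose name starts with "TEST".
mask = df["name"].str.startswith("TEST")
df.loc[mask, "value"] = as_series
print(df)
```

Whether to use `.squeeze()` or `expand=False` is mostly a matter of taste; `expand=False` avoids building the intermediate DataFrame.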
CodePudding user response:
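Instead of trying to get the group, you can replace the rest with the empty string. Pass regex=True explicitly, because since pandas 2.0 .str.replace treats the pattern as a literal string by default:
df.loc[df['value']=='None', 'value'] = df.loc[df['value']=='None', 'name'].str.replace(r'TEST-\d{3}', '', regex=True)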
CodePudding user response:
Here is a way to do it:
df.loc[df["name"].str.startswith("TEST"), "value"] = df["name"].str.extract(r"TEST-\d{3}(\d+)").loc[:, 0]
Output:
           name value
0  JUNK-0003426  Junk
1  TEST-0003435  3435
2  JUNK-0003432  Junk
3  TEST-0003433  3433
4  TEST-0003436  3436