As part of a larger data clean up exercise I am using pandas.Series.str.extractall(pat)
on a data frame column. Calling extractall()
returns a multi-index due to multiple matches. I would like to flatten and merge the result from extractall()
into a data frame.
I've spent countless hours with various combinations of apply
, agg
, groupby
, join
and haven't made any progress. Any suggestions?
Simplified example
df = pd.DataFrame([["cat 123"], ["hat dog 36776"], ["dog 345"], ["fish 456890 hat"]], columns=['A'])
pat = r'(?:(?P<Mammal>(?:cat)|(?:dog))|(?P<Fish>(?:fish))|(?P<Hat>(?:hat)))'
df_ea = df['A'].str.extractall(pat)
Results in this table
Mammal | Fish | Hat | ||
---|---|---|---|---|
match | ||||
0 | 0 | cat | NaN | NaN |
1 | 0 | NaN | NaN | hat |
1 | dog | NaN | NaN | |
2 | 0 | dog | NaN | NaN |
3 | 0 | NaN | fish | NaN |
1 | NaN | NaN | hat |
Desired result
df_test = pd.DataFrame([["cat", "", ""],
["dog", "", "hat"],
["dog", "", ""],
["", "fish", "hat"]],
columns=['Mammal', 'Fish', 'Hat'])
Mammal | Fish | Hat | |
---|---|---|---|
0 | cat | ||
1 | dog | hat | |
2 | dog | ||
3 | fish | hat |
CodePudding user response:
You could use groupby
as follows:
df_ea = df['A'].str.extractall(pat).groupby(level=0).first()
This will be the result:
Mammal Fish Hat
0 cat None None
1 dog None hat
2 dog None None
3 None fish hat
CodePudding user response:
df_ea=df_ea.reset_index(drop=True).fillna('')
you've already done a great job you're only missing this last line