Home > database >  ValueError: Columns must be same length as key with multiple outputs
ValueError: Columns must be same length as key with multiple outputs

Time:12-22

I am extracting a substring from an Excel cell and the entire string says this:

The bolts are 5" long each and 3" apart

I want to extract the length of the bolt which is 5". And I use the following code to get that

df['Bolt_Length'] = df['Description'].str.extract(r'(\s[0-9]")',expand=False)

But if the string says the following:

The bolts are 10" long each and 3" apart

and I try to use to the following code:

df['Bolt_Length'] = df['Description'].str.extract(r'(\s(\d{1,2})")',expand=False)

I get the following error message:

ValueError: Columns must be same length as key

I think Python doesn't know which number to acquire. The 10" or 3"

How can I fix this? How do I tell Python to only go for the first "?

On another note what if I want to get both the bolt length and distance from another bolt? How do I extract the two at the same time?

CodePudding user response:

Your problem is that you have two capture groups in your second regular expression (\s(\d{1,2})"), not one. So basically, you're telling Python to get the number with the ", and the same number without the ":

>>> df['Description'].str.extract(r'(\s(\d{1,2})")', expand=False)
      0   1
0    5"   5
1   10"  10

You can add ?: right after the opening parenthesis of a group to make it so that it does not capture anything, though it still functions as a group. The following makes it so that the inner group, which excludes the ", does not capture:

#                                notice vv
>>> df['Description'].str.extract(r'(\s(?:\d{1,2})")', expand=False)
0      5"
1     10"
Name: Description, dtype: object

CodePudding user response:

The error occurs because your regex contains two capturing groups, that extract two column values, BUT you assign those to a single column, df['Bolt_Length'].

You need to use as many capturing groups in the regex pattern as there are columns you assign the values to:

df['Bolt_Length'] = df['Description'].str.extract(r'\s(\d{1,2})"',expand=False)

The \s(\d{1,2})" regex only contains one pair of unescaped parentheses that form a capturing group, so this works fine since this single value is assigned to a single Bolt_Length column.

  • Related