How to skip NaN values when splitting up a column

I am trying to split a column into two columns based on a delimiter. The column currently holds text separated by a '-'. Some of the values in the column are NaN, and when I run the code below I get the following error message: ValueError: Columns must be same length as key.

I don't want to delete the NaN values, but I am not sure how to skip them so that the split works.

The code I have right now is:

df[['A','B']] = df['A'].str.split('-',expand=True)

CodePudding user response：

Maybe filter them out with loc:

df.loc[df['A'].notna(), ['A','B']] = df.loc[df['A'].notna(), 'A'].str.split('-',expand=True)

CodePudding user response：

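Your code already works with NaN values. The error comes from rows that contain more than one '-', because the split then produces more than two columns. Pass n=1 to str.split so each value is split at most once: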

Suppose this dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['hello-world', np.nan, 'raise-an-exception']})
print(df)

# Output:
                    A
0         hello-world
1                 NaN
2  raise-an-exception

Reproducible error:

df[['A', 'B']] = df['A'].str.split('-', expand=True)
print(df)

# Output:
...
ValueError: Columns must be same length as key
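
To confirm that the extra delimiter, and not the NaN, is what triggers the error, you can count the delimiters per row. This is only a minimal sketch on the sample dataframe above; the names delimiter_counts and n_columns are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['hello-world', np.nan, 'raise-an-exception']})

# Number of '-' characters in each row; NaN rows stay NaN.
delimiter_counts = df['A'].str.count('-')
print(delimiter_counts)

# An unlimited split produces (largest count + 1) columns,
# which is more than the two target columns ['A', 'B'].
n_columns = df['A'].str.split('-', expand=True).shape[1]
print(n_columns)
```

Any row whose count is greater than 1 makes the unlimited split return more than two columns.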

Use n=1:

df[['A', 'B']] = df['A'].str.split('-', n=1, expand=True)
print(df)

# Output:
       A             B
0  hello         world
1    NaN           NaN
2  raise  an-exception

An alternative, starting again from the original dataframe, is to keep every piece and generate as many columns as needed:

df1 = df['A'].str.split('-', expand=True)
# Rename the integer column labels 0, 1, 2, ... to 'A', 'B', 'C', ...
df1.columns = df1.columns.map(lambda x: chr(x + 65))
print(df1)

# Output:
       A      B          C
0  hello  world       None
1    NaN    NaN        NaN
2  raise     an  exception
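
If you then want those pieces back in the original dataframe in place of the unsplit column, one possible sketch is below. It assumes 'A' is the only column you need to replace; parts and result are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['hello-world', np.nan, 'raise-an-exception']})

# Split on every '-' and keep all pieces.
parts = df['A'].str.split('-', expand=True)

# Rename the integer labels 0, 1, 2, ... to 'A', 'B', 'C', ...
parts.columns = [chr(i + 65) for i in parts.columns]

# Drop the unsplit column, then attach the pieces by index.
result = df.drop(columns='A').join(parts)
print(result)
```

The NaN row is kept and simply ends up with missing values in every new column.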