EDIT/UPDATE: I finally found a solution - the following worked:
df=df.replace(r'^\s*$', np.nan, regex=True)
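For reference, a minimal self-contained sketch of that fix (made-up data, assuming pandas and NumPy are installed) looks like this:
import numpy as np
import pandas as pd
# Made-up data: some cells hold nothing but one or two spaces.
df = pd.DataFrame({"column1": ["1", " ", "12"], "column2": ["ab", "  ", "cd"]})
# Cells that are empty or whitespace-only become NaN; the ^ and $ anchors
# make the pattern match the whole cell rather than substrings inside it.
df = df.replace(r'^\s*$', np.nan, regex=True)
# column1 is now ['1', NaN, '12'] and column2 is ['ab', NaN, 'cd'].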
I am trying to replace ' ' values with null values in Python. Essentially, I am parsing a fixed-width text file into a pandas DataFrame using substrings. In the file, every row has the same number of characters but there is only one column, and I need to split it into distinct columns, each holding part of each row's value - example below:
df['column1'] = df['data'].str[0:2]
df['column2'] = df['data'].str[5:14]
In cases where a column's row value should be null, it is instead a space or a series of spaces (' '). I have tried the following:
df=df.replace(' ', "Null")
df=df.replace(' ', None)
df=df.replace('', None)
df=df.replace(r'\s*', None, regex=True)
This has not worked for me (it does not even change cells whose entire value is ' '); the spaces remain, both in cells that mix spaces and digits (like ' 1' rather than '1') and in cells that should be empty. How can I solve this?
In the example data, where there appears to be a blank value, it is actually one or two spaces (depending on the width of the cell).
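For reference, a made-up stand-in with the same shape (hypothetical values, not my actual file) reproduces the behaviour:
import numpy as np
import pandas as pd
# Hypothetical stand-in for the fixed-width file (14 characters per row).
df = pd.DataFrame({"data": ["01   123456789", " 1   2        ", "              "]})
df['column1'] = df['data'].str[0:2]   # '01', ' 1', '  '
df['column2'] = df['data'].str[5:14]  # '123456789', '2        ', '         '
# Without regex=True, replace() only swaps cells whose entire value equals
# the given string, so '  ' (two spaces) and ' 1' are never touched:
df.replace(' ', 'Null')               # returns the frame unchanged here
# Anchoring a regex to the whole cell and using np.nan does convert the
# whitespace-only cells to real nulls:
df = df.replace(r'^\s*$', np.nan, regex=True)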
CodePudding user response:
You should maybe use an empty string, not a space:
df.replace('', None)
If you're unsure, you can go for:
df.replace(r'\s*', None, regex=True)
Or, to restrict to full matches:
df.replace(r'^\s*$', None, regex=True)
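One caveat worth checking against your pandas version: when the replacement value is None, some versions treat it as "no value supplied" and fall back to the pad/forward-fill method instead of inserting nulls, which can make the call look like it did nothing. Using np.nan (as in the edit at the top of the question) sidesteps that:
import numpy as np
# Explicit NaN value, so the result does not depend on how the installed
# pandas version interprets value=None.
df = df.replace(r'^\s*$', np.nan, regex=True)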
CodePudding user response:
Try using strip to remove both leading and trailing spaces.
df['column1'] = df['data'].str[0:2]
df['column1'] = df['column1'].str.strip()
df['column2'] = df['data'].str[5:14]
df['column2'] = df['column2'].str.strip()
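Note that strip() on its own leaves an empty string rather than a real null, so if NaN values are needed you can follow it with a replace (a minimal sketch, assuming NumPy is imported as np):
import numpy as np
df['column1'] = df['data'].str[0:2].str.strip()
df['column2'] = df['data'].str[5:14].str.strip()
# After stripping, blank cells are ''; turn them into proper nulls.
df = df.replace('', np.nan)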