How to remove rows that contain empty elements in python-CodePudding

I have a dataset that includes full x & y columns but some z columns don't have any data (shown below)

     x            y            z
 246.590724   -24.380943     --
 246.593273   -24.381374     --
246.7051392 -24.33428028     --
 246.823241   -24.087254 7.2435
245.5418243 -21.65491486 7.1933
  246.39548 -23.40651445 7.2318
 246.605133   -24.390006     --
  246.67683   -24.440527     --
247.3474986  -24.2324699     --
  246.80526   -24.692698     --
244.7469482 -23.66201019 7.2379

How do I go about removing the rows with an empty z element such that only full x,y,z data is left?

CodePudding user response：

You can try this.

import pandas as pd
df = pd.DataFrame([[246.590724  , -24.380943 , '--'],[500.590724  , -24.380943 ,    7.2318]],columns=['x','y','z'])
print(df[df["z"] != '--'])

Output: (Note: sample code has only 2 rows)

            x          y       z
1  500.590724 -24.380943  7.2318

CodePudding user response：

There are a few possibilities in your case. First of all, if the -- is indicating the None values and not literally --. In this case, you can simply use:

import pandas as pd
df = pd.read_csv("data.csv")
df[~df["z"].isna()]

The second possibility is that the -- values are literally --. So, in this case, you can simply use:

import pandas as pd
df = pd.read_csv("data.csv")
df[df["z"] != "--"]

In either case, the result would be the same:

	x	y	z
3	246.823	-24.0873	7.2435
4	245.542	-21.6549	7.1933
5	246.395	-23.4065	7.2318
10	244.747	-23.662	7.2379

CodePudding user response：

You can use dropna from Pandas:

df.dropna(subset=['z'], axis=0, inplace=True)

If the null/NaN values in your file are not really NaN as it seems the case, before using dropna you could load your data like this (assuming "--" corresponds to the pseudo NaN and that the separators are spaces):

df = pd.read_csv("data.csv", sep='\s ', na_values="--")