I have a dataset that includes full x & y columns but some z columns don't have any data (shown below)
x y z
246.590724 -24.380943 --
246.593273 -24.381374 --
246.7051392 -24.33428028 --
246.823241 -24.087254 7.2435
245.5418243 -21.65491486 7.1933
246.39548 -23.40651445 7.2318
246.605133 -24.390006 --
246.67683 -24.440527 --
247.3474986 -24.2324699 --
246.80526 -24.692698 --
244.7469482 -23.66201019 7.2379
How do I go about removing the rows with an empty z element such that only full x,y,z data is left?
CodePudding user response:
You can try this.
import pandas as pd
df = pd.DataFrame([[246.590724 , -24.380943 , '--'],[500.590724 , -24.380943 , 7.2318]],columns=['x','y','z'])
print(df[df["z"] != '--'])
Output: (Note: sample code has only 2 rows)
x y z
1 500.590724 -24.380943 7.2318
CodePudding user response:
There are a few possibilities in your case. First of all, if the --
is indicating the None values and not literally --
. In this case, you can simply use:
import pandas as pd
df = pd.read_csv("data.csv")
df[~df["z"].isna()]
The second possibility is that the --
values are literally --
. So, in this case, you can simply use:
import pandas as pd
df = pd.read_csv("data.csv")
df[df["z"] != "--"]
In either case, the result would be the same:
x | y | z | |
---|---|---|---|
3 | 246.823 | -24.0873 | 7.2435 |
4 | 245.542 | -21.6549 | 7.1933 |
5 | 246.395 | -23.4065 | 7.2318 |
10 | 244.747 | -23.662 | 7.2379 |
CodePudding user response:
You can use dropna from Pandas:
df.dropna(subset=['z'], axis=0, inplace=True)
If the null/NaN values in your file are not really NaN as it seems the case, before using dropna you could load your data like this (assuming "--" corresponds to the pseudo NaN and that the separators are spaces):
df = pd.read_csv("data.csv", sep='\s ', na_values="--")