I have a CSV file with 73 rows of data and 16 columns. I want to read it into a pandas DataFrame, but when I do
data_dataframe = pd.read_csv(csv_file, sep=',')
I get 3152 rows and 22 columns: the 73 rows and 16 columns of actual data, and the rest pure NaN values. How can I tell pandas to read only the valid rows and columns and avoid all these extra NaN ones?
CodePudding user response:
First, visualize where the missing values are in the whole DataFrame:
import seaborn as sns
import matplotlib.pyplot as plt
sns.heatmap(data_dataframe.isna())
plt.show()
Then, if you want to remove every row that contains a NaN, use
data_dataframe.dropna()
(note that dropna returns a new DataFrame; either assign the result back or pass inplace=True). If you want to remove a consecutive range of rows, use (axis=0 is the default, so there is no need to specify it)
data_dataframe.drop(index=data_dataframe.index[1:3], inplace=True)
which drops the rows at positions 1 and 2. And if you want to remove specific rows by label, use
data_dataframe.drop(index=[1, 3, 5], inplace=True)
Since in your case the extra rows and columns appear to be entirely NaN, see the sketch after these examples for dropping only the fully empty ones.
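Because the extra 3152-row, 22-column shape suggests the surplus rows and columns contain nothing but NaN (often caused by trailing commas or stray cells in the CSV), a minimal sketch of dropping only the fully empty rows and columns might look like this; the file path here is a placeholder for your csv_file:
import pandas as pd

data_dataframe = pd.read_csv('data.csv', sep=',')  # placeholder path

# drop columns that are entirely NaN, then rows that are entirely NaN
data_dataframe = data_dataframe.dropna(axis=1, how='all')
data_dataframe = data_dataframe.dropna(axis=0, how='all')

print(data_dataframe.shape)  # expected (73, 16) if only empty rows/columns were read in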
CodePudding user response:
There is a simple method for this: given a DataFrame df, use df.dropna(). Keep in mind that it returns a new DataFrame rather than modifying df in place, so assign the result back.
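A minimal usage sketch (the small DataFrame below is made up for illustration): by default dropna() removes any row containing at least one NaN, while how='all' removes only rows that are entirely NaN, which is closer to what the question needs.
import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, np.nan, np.nan], 'b': [2, 3, np.nan]})

cleaned_any = df.dropna()           # drops rows with any NaN  -> keeps only the first row
cleaned_all = df.dropna(how='all')  # drops only all-NaN rows  -> keeps the first two rows

print(cleaned_any.shape)  # (1, 2)
print(cleaned_all.shape)  # (2, 2)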