listing multiple converters during initial reading of file, possible header issue?

Time:11-29

I am reading in a CSV file to calculate some stats through Python.

I know that I can pass converters to read_csv at the start of the program to adjust for some of the potential data issues, but for some reason when I try to do that, the calculations come out with inflated results.

It's a 20-column CSV with over 1000 rows of data. The public-domain data file is Dataanime.csv.

[image: calculation results]

Obviously, this is no good either. How can I either set the header so that it bypasses the word 'Episodes' and the calculations run, or rewrite the following call to correct for this?

df = pd.read_csv(r'dataanime.csv', encoding='utf-8', header=None, skiprows=1, converters = {2 : lambda s: float(s.replace('Episodes','').join(s.replace('-','0')))})

User response:

It's easier to read CSV files by letting pandas figure out how to handle the headers. If you don't pass header or skiprows, pandas infers that the first line in the CSV is the header line and names your columns accordingly. To deal with the "-" values in the Episodes column, you can set na_values so that "-" in that column is read as NaN, and use dropna() to remove those entries when calculating statistics. (mean() and sum() also skip NaN by default, so the dropna() mainly makes the intent explicit.)

import pandas as pd

# treat "-" in the Episodes column as missing (NaN); the header row is inferred
df = pd.read_csv("dataanime.csv", encoding="utf-8", na_values={"Episodes": "-"})

# calculate stats on the Episodes column
episode_values = df["Episodes"].dropna()
mean1 = episode_values.mean()
sum1 = episode_values.sum()
...
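
As for why the converter in the question inflates the numbers: str.join uses the string it is called on as a separator between the characters of its argument, so "12".join("12") produces "1122" rather than "12". The sketch below reproduces that and shows one possible converter-based alternative. It assumes the column is named "Episodes", as in the answer above; SAMPLE_CSV, question_converter and parse_episodes are hypothetical names, and the three rows are made-up stand-ins, not values from dataanime.csv.

```python
import io

import pandas as pd

# Made-up sample standing in for dataanime.csv (assumption: not the real data).
SAMPLE_CSV = "Title,Type,Episodes\nA,TV,12\nB,Movie,-\nC,TV,24\n"


def question_converter(s):
    # The lambda from the question, unchanged: join() interleaves the
    # separator string between the characters of its argument.
    return float(s.replace('Episodes', '').join(s.replace('-', '0')))


def parse_episodes(s):
    # Hypothetical replacement: NaN for the "-" placeholder, float otherwise.
    s = s.strip()
    return float("nan") if s in ("", "-") else float(s)


print(question_converter("12"))  # "1" + "12" + "2" -> 1122.0

# Let pandas infer the header, then address the column by name.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), converters={"Episodes": parse_episodes})
print(df["Episodes"].mean(), df["Episodes"].sum())
```

With na_values the same cleanup happens without a custom function, so a converter like this is only worth it if the column needs more involved parsing.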