df.shape returns different column length for same dataframe in csv

Time:08-14

I am doing a proof of concept for reading CSV data with pandas and am a bit perplexed by the behavior below.

Here is my code snippet:

import pandas as pd
import random

def get_random_names():
    names = []
    letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    for _ in range(100):
        # Pick 5 random letters and concatenate them into one name
        str_list = random.choices(letters, k=5)
        x = ""
        for item in str_list:
            x += item
        names.append(x)
    return names

def get_random_floats():
    float_list = []
    for i in range(0,100):
        float_list.append(round(random.uniform(10.15, 41.36), 2))
    return float_list

data_dict = {'name' : get_random_names(), 'interest' : get_random_floats()}
df = pd.DataFrame(data= data_dict)
print(df.shape)
csv_file = r"""C:\Users\Ronnie\PycharmProjects\pythonProjectTest\pandas\CSV\random.csv"""
df.to_csv(csv_file)
df_csv = pd.read_csv(csv_file)
print(df_csv.shape)

When I print the shape of the data frame I created with pd.DataFrame, it is correct (2 columns). But when I write the same data frame to CSV with df.to_csv, read it back with pd.read_csv, and print the shape of the result, the column count becomes 3.

Can someone explain why this is the case?

My CSV snippet is below:

[Image: screenshot of the exported CSV, not reproduced]

CodePudding user response:

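A quick look at the exported CSV file should give you a hint.

By default, to_csv writes the index as well as the data. Because that index has no name, its header cell is empty, and read_csv reads it back as an ordinary column (named "Unnamed: 0"), which is the third column you see.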
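
To see the extra column without opening the file, here is a minimal sketch. It assumes a tiny two-row frame with made-up values and keeps the CSV in memory (calling to_csv without a path returns the text as a string) instead of using the path from the question:

```python
import io
import pandas as pd

# Small stand-in for the question's data (values are illustrative only)
df = pd.DataFrame({"name": ["ABCDE", "FGHIJ"], "interest": [10.15, 41.36]})

# With no path argument, to_csv returns the CSV text instead of writing a file
csv_text = df.to_csv()
print(csv_text.splitlines()[0])  # header row starts with an empty cell for the index

# Reading the text back turns the unnamed index into a regular column
df_csv = pd.read_csv(io.StringIO(csv_text))
print(list(df_csv.columns))
print(df.shape, df_csv.shape)
```

The header row begins with a comma because the index has no name, which is why read_csv labels that column "Unnamed: 0".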

Use:

df.to_csv(csv_file, index=False)

Alternatively, you can keep the index in the CSV and tell read_csv that the first column is the index:

df.to_csv(csv_file)
df_csv = pd.read_csv(csv_file, index_col=0)
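
Both options can be checked side by side. The following is a sketch under the same assumptions as before (a small made-up frame and an in-memory buffer rather than the file from the question):

```python
import io
import pandas as pd

# Small stand-in for the question's data (values are illustrative only)
df = pd.DataFrame({"name": ["ABCDE", "FGHIJ"], "interest": [10.15, 41.36]})

# Option 1: do not write the index at all
no_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

# Option 2: write the index, then tell read_csv that column 0 is the index
with_index = pd.read_csv(io.StringIO(df.to_csv()), index_col=0)

print(df.shape, no_index.shape, with_index.shape)
```

Either way the round-tripped frame has the same two columns as the original. Option 2 is the one to prefer when the index carries meaningful labels that should survive the round trip.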