I am doing a proof of concept for reading CSV data with pandas
and am a bit perplexed by the behavior below.
Here is my code snippet:
import pandas as pd
import random

def get_random_names():
    # Build 100 random names of five uppercase letters each.
    names = []
    letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    for _ in range(100):
        str_list = random.choices(letters, k=5)
        names.append("".join(str_list))
    return names

def get_random_floats():
    # Build 100 random floats in [10.15, 41.36], rounded to 2 decimals.
    float_list = []
    for _ in range(100):
        float_list.append(round(random.uniform(10.15, 41.36), 2))
    return float_list

data_dict = {'name': get_random_names(), 'interest': get_random_floats()}
df = pd.DataFrame(data=data_dict)
print(df.shape)

csv_file = r"""C:\Users\Ronnie\PycharmProjects\pythonProjectTest\pandas\CSV\random.csv"""
df.to_csv(csv_file)
df_csv = pd.read_csv(csv_file)
print(df_csv.shape)
When I print the shape of the original DataFrame (created with pd.DataFrame),
it is correct (2 columns). But when I write the same DataFrame to CSV (using df.to_csv),
read it back, and print the shape of the result, it has 3 columns.
Can someone explain why this is the case?
My CSV snippet is below:
CodePudding user response:
A quick look at the exported CSV file should give you a hint.
By default, to_csv writes the index as well, and read_csv then reads that index back as an ordinary column (named "Unnamed: 0").
To skip writing the index, use:
df.to_csv(csv_file, index=False)
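To see where the extra column comes from, it can help to look at the CSV text itself. The sketch below uses a small two-row frame as a stand-in for the 100-row one in the question (the values are made up for illustration) and relies on to_csv returning the CSV as a string when no path is given.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the question's random data.
df = pd.DataFrame({"name": ["ABCDE", "FGHIJ"], "interest": [10.15, 41.36]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

# The default header begins with an empty field: that is the unnamed index column.
print(with_index.splitlines()[0])     # ,name,interest
print(without_index.splitlines()[0])  # name,interest
```

The leading comma in the first header line is the index column that read_csv later picks up as a third column.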
Alternatively, you can keep the index in the CSV and tell read_csv that the first column is the index:
df.to_csv(csv_file)
df_csv = pd.read_csv(csv_file, index_col=0)
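As a rough check of this second approach, here is a minimal sketch that round-trips a small frame through an in-memory buffer rather than a file on disk. The two-row data and the use of io.StringIO are assumptions made for illustration; the read_csv calls mirror the ones above.

```python
import io

import pandas as pd

# Hypothetical two-row frame standing in for the question's random data.
df = pd.DataFrame({"name": ["ABCDE", "FGHIJ"], "interest": [10.15, 41.36]})

# Write the CSV (index included) to a string instead of a file.
csv_text = df.to_csv()

# Default read: the saved index comes back as a regular column.
as_column = pd.read_csv(io.StringIO(csv_text))

# index_col=0: the first CSV column is used as the index again.
as_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(as_column.columns.tolist())  # ['Unnamed: 0', 'name', 'interest']
print(as_column.shape)             # (2, 3)
print(as_index.shape)              # (2, 2)
```

Either fix gives back a two-column frame; which one fits depends on whether the index carries information worth keeping in the file.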