When I concatenate two DataFrames that both have 337 columns and then export the result to CSV, the file ends up with 338 columns: a new unnamed index column is added each time.
df1
Out[141]:
datecreated 1 2 3 4 5 ... 331 332 333 334 335 336
0 2022-11-14 4000 3900 3850 3810 3790 ... 5520 5300 5180 4990 4730 4520
1 2022-11-15 4000 3900 3850 3810 3790 ... 5520 5300 5180 4990 4730 4520
[2 rows x 337 columns]
df4
Out[142]:
datecreated 1 2 3 ... 333 334 335 336
0 2022-11-16 4080.0 3980.0 3940.0 ... 5510.0 5290.0 4960.0 4700.0
[1 rows x 337 columns]
Using this concatenation:
df5 = pd.concat([df1, df4], ignore_index=True)
and then exporting to CSV:
csv_buffer = StringIO()
df5.to_csv(csv_buffer)
file_name = 'outputs.csv'
s3_resource.Object(bucket_name, file_name).put(Body=csv_buffer.getvalue())
yields 338 columns, including an unnamed index column, after fetching the updated output file:
body = s3_client.get_object(Bucket=bucket_name, Key='outputs.csv')['Body']
contents = body.read().decode('utf-8')
df1 = pd.read_csv(StringIO(contents), parse_dates=['datecreated'])
df1
Out[134]:
Unnamed: 0 datecreated 1 2 ... 333 334 335 336
0 0 2022-11-14 4000.0 3900.0 ... 5180.0 4990.0 4730.0 4520.0
1 1 2022-11-15 4000.0 3900.0 ... 5180.0 4990.0 4730.0 4520.0
2 2 2022-11-16 4080.0 3980.0 ... 5510.0 5290.0 4960.0 4700.0
What is causing this?
CodePudding user response:
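The unnamed column is the DataFrame's row index. By default, to_csv writes the index as the first column with an empty header, and read_csv then labels that column "Unnamed: 0". Because you read the file back and write it out again, another index column is added on every round trip. If you do not want this, pass index=False to to_csv:
df5.to_csv(csv_buffer, index=False)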
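To see the effect without S3, here is a minimal sketch of the round trip in memory. The two small frames are hypothetical stand-ins for df1 and df4 (three columns instead of 337), and the variable names other than df5 are made up for illustration. It also shows index_col=0, which is one way to read a file that was already written with the index column.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-ins for df1 and df4 from the question.
df1 = pd.DataFrame(
    {"datecreated": ["2022-11-14", "2022-11-15"], "1": [4000, 4000], "2": [3900, 3900]}
)
df4 = pd.DataFrame({"datecreated": ["2022-11-16"], "1": [4080.0], "2": [3980.0]})

df5 = pd.concat([df1, df4], ignore_index=True)

# Default behaviour: the row index is written as an extra, unnamed first column.
buf_with_index = StringIO()
df5.to_csv(buf_with_index)
with_index = pd.read_csv(
    StringIO(buf_with_index.getvalue()), parse_dates=["datecreated"]
)

# index=False: only the data columns are written.
buf_without_index = StringIO()
df5.to_csv(buf_without_index, index=False)
without_index = pd.read_csv(
    StringIO(buf_without_index.getvalue()), parse_dates=["datecreated"]
)

# For a file that already contains the index column, index_col=0 tells
# read_csv to use that column as the index instead of as data.
repaired = pd.read_csv(
    StringIO(buf_with_index.getvalue()), index_col=0, parse_dates=["datecreated"]
)

print(list(with_index.columns))
print(list(without_index.columns))
print(list(repaired.columns))
```

In the original code, the same change applies to the df5.to_csv(csv_buffer) call before the S3 upload; the rest of the upload and download code can stay as it is.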