I am trying to fetch data from a database, put it in a DataFrame, and load it to AWS S3.
For fields that contain None values, the values get written to S3 as the literal string None. I would like None in VARCHAR fields to become NULL or blank, and None in INT fields to become 0 or blank.
```python
import gzip
from io import BytesIO, TextIOWrapper

import boto3
import pandas as pd
import pyodbc

connection = pyodbc.connect(conn)
sql = 'SELECT id, name from table_a'
df = pd.read_sql_query(sql, connection)
df = df.applymap(str)  # note: this turns None into the string 'None'
csv_buffer = BytesIO()
s3 = boto3.resource('s3')
with gzip.GzipFile(mode='w', fileobj=csv_buffer) as zipped_file:
    df.to_csv(TextIOWrapper(zipped_file, 'utf8'), index=False)
s3.Object(bucket[env], dest_filename).put(Body=csv_buffer.getvalue())
```
The output I am getting from print(df) is:
| ID | NAME |
|---|---|
| 1 | None |
| None | John |
I would want the DataFrame to be loaded to S3 as below:
| ID | NAME |
|---|---|
| 1 | |
| | John |
How can I handle this situation?
CodePudding user response:
To simply change None to an empty value in Python, you can try:

```python
for col in df.columns:
    df[col] = df[col].fillna(str())
```

`int()` can be substituted for `str()`, as can empty quotes: `''`. Make sure this runs before `df.applymap(str)`, since `str(None)` produces the string `'None'` and `fillna` will no longer see a missing value after that conversion.
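Since the question asks for different replacements per column type (0 for INT columns, blank for VARCHAR columns), a per-dtype fill is one way to sketch it. The frame below is a hypothetical stand-in for the query result, not the asker's actual data:

```python
import pandas as pd

# Hypothetical frame mirroring the question's id/name columns
df = pd.DataFrame({"id": [1, None], "name": [None, "John"]})

# Fill missing values by dtype: 0 for numeric columns, '' otherwise
for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        # cast back to int so 1 doesn't serialize as "1.0"
        df[col] = df[col].fillna(0).astype(int)
    else:
        df[col] = df[col].fillna("")

# Stringify only after filling, so None never becomes the string "None"
df = df.astype(str)
```

Writing this `df` with `to_csv` then produces `0` and empty cells instead of `None`.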