I am working on a Python script that reads data from a database and saves this data into a .csv file.
In order to save it correctly, I need to escape certain characters such as \r\n or \n.
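For instance, repr turns an embedded newline into a literal two-character escape, which is the behaviour I rely on:
s = 'first line\nsecond line'
print(repr(s))  # prints 'first line\nsecond line' on one line, with a literal backslash-n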
Here is how I am currently doing it:
First, I use the pandas read_sql function to read the data from the database.
import pandas as pd

df = pd.read_sql(
    sql='SELECT * FROM exampleTable',
    con=SQLAlchemyConnection  # an existing SQLAlchemy engine/connection
)
The table I get back contains values of different types.
Then, the script updates the dataframe, converting every string value to its raw-string representation. To achieve that, I use two nested for loops to operate on every single value.
def update_df(df):
    for rowIndex, row in df.iterrows():
        for colIndex, value in row.items():
            if isinstance(df.at[rowIndex, colIndex], str):
                df.at[rowIndex, colIndex] = repr(df.at[rowIndex, colIndex])
    return df
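After the update, the dataframe is written out with to_csv (the file name here is just a placeholder):
df = update_df(df)
df.to_csv('output.csv', index=False)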
However, the amount of data I need to process is large (more than 1 million rows with more than 100 columns) and it takes hours.
What I need is a faster way to create the CSV file.
Thank you in advance.
CodePudding user response:
It should be faster to use applymap if you really have mixed types:
df = df.applymap(lambda x: repr(x) if isinstance(x, str) else x)
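As a side note, on pandas 2.1+ applymap is deprecated in favour of DataFrame.map, which accepts the same callable:
df = df.map(lambda x: repr(x) if isinstance(x, str) else x)  # pandas >= 2.1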
However, if you can identify the string columns, then you can slice them (maybe in combination with re.escape?):
import re
str_cols = ['col1', 'col2']
df[str_cols] = df[str_cols].applymap(re.escape)
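If the string columns are not known in advance, here is a minimal sketch (assuming the string data lives in object-dtype columns, and keeping the repr escaping from the question) that detects them with select_dtypes and escapes each column with a vectorized map:
str_cols = df.select_dtypes(include='object').columns
for col in str_cols:
    # object columns can still hold non-string values, so mask before escaping
    mask = df[col].map(lambda x: isinstance(x, str))
    df.loc[mask, col] = df.loc[mask, col].map(repr)
Operating column by column keeps the work vectorized at the Series level, which is typically much faster than the cell-by-cell iterrows loop.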