I've got a big table (~500m rows) in MySQL on RDS and I need to export specific columns from it to CSV, to enable import into QuestDB.
Normally I'd use INTO OUTFILE,
but this isn't supported on RDS as there is no access to the file system.
I've tried using Workbench to do the export, but due to the size of the table I keep getting out-of-memory issues.
CodePudding user response:
Finally figured it out with help from this: Exporting a table from Amazon RDS into a CSV file
This solution works well as long as you have a sequential column of some kind, e.g. an auto incrementing integer PK or a date column. Make sure you have your date column indexed if you have a lot of data!
#!/bin/bash
# Maximum number of rows to export/total rows in table, set a bit higher if live data being written
MAX=500000000
# Size of each export batch
STEP=1000000
mkdir -p parts
for (( c=0; c<=$MAX; c=c+$STEP ))
do
mysql --port 3306 --protocol=TCP -h <rdshostname> -u <username> -p<password> --quick --database=<db> -e "select column1, column2, column3 from <table> order by <timestamp> ASC limit $STEP offset $c" | sed 's/\t/","/g;s/^/"/;s/$/"/;s/\n//g' > export$c.csv
# split down into chunks under QuestDB's 65k line limit
split -d -l 64999 --additional-suffix=.csv export$c.csv ./parts/export$c
done
# print out import statements to a file
for i in $(ls ./parts); do echo "COPY reading from '$i';" >> import.sql; done;
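If your sequential column is an auto-incrementing integer PK, stepping through id ranges instead of using LIMIT/OFFSET may be faster, since deep offsets force MySQL to scan and discard every skipped row on each batch. A minimal sketch of that variant, assuming the PK is an indexed integer column named id and <maxId> is the current max id (both placeholders, not taken from the scripts above):
#!/bin/bash
# Number of ids covered by each export batch
STEP=1000000
# Upper bound: SELECT MAX(id) FROM <table>, set a bit higher if live data is being written
MAX_ID=<maxId>
LAST_ID=0
mkdir -p parts
while [ $LAST_ID -lt $MAX_ID ]
do
let NEXT_ID=$LAST_ID+$STEP
# seek on the PK index instead of OFFSET; batches may hold fewer rows if the id sequence has gaps
mysql --port 3306 --protocol=TCP -h <rdshostname> -u <username> -p<password> --quick --database=<db> -e "select column1, column2, column3 from <table> where id > $LAST_ID and id <= $NEXT_ID order by id ASC" | sed 's/\t/","/g;s/^/"/;s/$/"/;s/\n//g' > export$LAST_ID.csv
# split down into chunks under QuestDB's 65k line limit
split -d -l 64999 --additional-suffix=.csv export$LAST_ID.csv ./parts/export$LAST_ID
LAST_ID=$NEXT_ID
done
Each batch then costs roughly the same no matter how deep into the table you are, because the WHERE clause seeks straight to the start of the range on the PK index.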
A slightly different approach, which may be faster depending on the indexing you have in place, is to step through the data month by month:
#!/bin/bash
START_YEAR=2020
END_YEAR=2022
mkdir -p parts
for (( YEAR=$START_YEAR; YEAR<=$END_YEAR; YEAR++ ))
do
for (( MONTH=1; MONTH<=12; MONTH++ ))
do
NEXT_MONTH=1
let NEXT_YEAR=$YEAR+1
if [ $MONTH -lt 12 ]
then
let NEXT_MONTH=$MONTH+1
NEXT_YEAR=$YEAR
fi
FILE_NAME="export-$YEAR-$MONTH-to-$NEXT_YEAR-$NEXT_MONTH"
mysql --port 3306 --protocol=TCP -h <rdshost> -u <username> -p<password> --quick --database=<database> -e "select <column1>, <column2>, round(UNIX_TIMESTAMP(<dateColumn>)) * 1000000 as date from <table> where <table>.<dateColumn> >= '$YEAR-$MONTH-01 00:00:00' and <table>.<dateColumn> < '$NEXT_YEAR-$NEXT_MONTH-01 00:00:00' order by <table>.<dateColumn> ASC" | sed 's/\t/","/g;s/^/"/;s/$/"/;s/\n//g' > $FILE_NAME.csv
# split down into chunks under QuestDB's 65k line limit
split -d -l 64999 --additional-suffix=.csv $FILE_NAME.csv ./parts/$FILE_NAME
done
done
# print out import statements to a file
for i in $(ls ./parts); do echo "COPY reading from '$i';" >> import.sql; done;
The above scripts will output an import.sql containing all the SQL statements you need to import your data. See: https://questdb.io/docs/guides/importing-data/
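If you'd rather not run the generated COPY statements, the CSV parts can also be uploaded through QuestDB's REST /imp endpoint. A rough sketch, assuming QuestDB is listening on its default HTTP port 9000 and the target table is called readings (host and table name are placeholders):
#!/bin/bash
# push every CSV part to QuestDB over HTTP
for i in ./parts/*.csv
do
curl -F "data=@$i" "http://<questdbhost>:9000/imp?name=readings"
done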
CodePudding user response:
Edit: this solution only works when exporting the whole table, not when exporting specific columns
You could try using mysqldump with extra params for CSV conversion. AWS documents how to use mysqldump with RDS, and this Stack Overflow question shows the extra params needed to convert the output into CSV.
I'm quoting the relevant part from that last link here (since there are a lot of answers and comments):
mysqldump <DBNAME> <TABLENAME> --fields-terminated-by ',' \
--fields-enclosed-by '"' --fields-escaped-by '\' \
--no-create-info --tab /var/lib/mysql-files/