Spark processing a MySQL table of 1,153,194 records: why the memory leak?


from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

sc = SparkContext("local", "Simple App")

sqlContext = SQLContext(sc)

url = \
    "jdbc:mysql://localhost:3306/stock_data?user=root&password=test"

df = sqlContext \
    .read \
    .format("jdbc") \
    .option("url", url) \
    .option("dbtable", "stock_detail_collect") \
    .load()

df.printSchema()

counts = df.groupBy("stock_id").count()
counts.show()

===========
The table only has 1,153,194 records; why does running the code above report a memory leak?
16/02/05 23:30:28 WARN TaskMemoryManager: 8.3 MB memory leak from org.apache.spark.unsafe.map.BytesToBytesMap@431395b1
16/02/05 23:30:28 ERROR Executor: Managed memory leak detected; size = 8650752 bytes, TID = 1
Environment: spark-1.6.0-bin-hadoop2.6
Ubuntu 14.04.3 LTS
jdk1.8.0_66
I don't know where the problem is or how to fix it. Thank you very much.

CodePudding user response:

counts = df.groupBy("stock_id").count()
counts.show()

To write the result to a file instead, register the DataFrame as a temp table and run the aggregation in SQL:

df.registerTempTable("people")

count = sqlContext.sql("select stock_id, count(*) as c from people group by stock_id order by stock_id")

# file_output must be opened first; the file name here is only an example
file_output = open("stock_counts.txt", "w")
for name in count.collect():
    file_output.write(str(name) + "\n")
file_output.flush()
file_output.close()
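
If the memory-leak warning persists, one possible workaround (not part of the original answer, just a sketch) is to partition the JDBC read so the groupBy is spread over several smaller tasks instead of a single large in-memory map. The partition column and bounds below are assumptions about the stock_detail_collect table and need to be adjusted to match the real schema:

# Sketch only: assumes the table has a numeric column "id" usable for range partitioning.
df = sqlContext.read \
    .format("jdbc") \
    .option("url", url) \
    .option("dbtable", "stock_detail_collect") \
    .option("partitionColumn", "id") \
    .option("lowerBound", "1") \
    .option("upperBound", "1153194") \
    .option("numPartitions", "8") \
    .load()

counts = df.groupBy("stock_id").count()
counts.show()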