from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext("local", "Simple App")
sqlContext = SQLContext(sc)
url = "jdbc:mysql://localhost:3306/stock_data?user=root&password=test"
df = sqlContext \
    .read \
    .format("jdbc") \
    .option("url", url) \
    .option("dbtable", "stock_detail_collect") \
    .load()
df.printSchema()
counts = df.groupBy("stock_id").count()
counts.show()
===========
The table has only 1,153,194 records, so why does running the above code report a memory leak?
16/02/05 23:30:28 WARN TaskMemoryManager: leak 8.3 MB memory from org.apache.spark.unsafe.map.BytesToBytesMap@431395b1
16/02/05 23:30:28 ERROR Executor: Managed memory leak detected; size = 8650752 bytes, TID = 1
Environment: spark-1.6.0-bin-hadoop2.6
Ubuntu 14.04.3 LTS
jdk1.8.0_66
I don't know where the problem is or how to fix it. Thank you very much.
CodePudding user response:
counts = df.groupBy("stock_id").count()
counts.show()
Write the result to a file instead:
df.registerTempTable("people")
count = sqlContext.sql(
    "select stock_id, count(*) as c from people group by stock_id order by stock_id")
# open an output file first (the path here is only illustrative)
file_output = open("stock_counts.txt", "w")
for name in count.collect():
    file_output.write(str(name) + "\n")
file_output.flush()
file_output.close()
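Incidentally, in Spark 1.6 this "managed memory leak" message is usually spurious rather than a real leak: show() only takes the first rows, so the aggregation's BytesToBytesMap is never fully consumed and its remaining pages get reported when the task ends. If the goal is simply to persist the counts, a minimal sketch using the DataFrame writer (the output directory below is hypothetical) keeps the rows on the executors instead of collecting them all to the driver:

# Sketch: persist the aggregated counts with the DataFrame writer
# rather than collecting every row to the driver.
counts = df.groupBy("stock_id").count()
counts.write.format("json").save("/tmp/stock_counts")  # hypothetical output directory

If the single-partition JDBC read itself is the source of memory pressure, the JDBC data source also accepts partitionColumn, lowerBound, upperBound, and numPartitions options to split the table scan across tasks.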