Problems using where, group by and having in Spark SQL

Time:09-28

In Spark 1.6, I have the following data:
df = spark.createDataFrame([(1, 2, 3, 4), (1, 3, 4, 5), (4, 5, 6, 7), (1, 3, 4, 5), (2, 3, 4, 5), (2, 2, 4, 5)], ["aa", "bb", "cc", "dd"])
df.registerTempTable("record")

   aa  bb  cc  dd
0   1   2   3   4
1   1   3   4   5
2   4   5   6   7
3   1   3   4   5
4   2   3   4   5
5   2   2   4   5

I want to return the records whose aa value occurs more than once. The result should be as follows:

   aa  bb  cc  dd
0   1   2   3   4
1   1   3   4   5
2   1   3   4   5
3   2   3   4   5
4   2   2   4   5

In MySQL I can run the following statement:
select * from record where aa in (select aa from record group by aa having count(*) > 1)
and it returns the result I need.

Running the same statement in Spark fails with an error:

Py4JJavaError: An error occurred while calling o30.sql.
: java.lang.RuntimeException: [1.42] failure: ``)'' expected but identifier aa found

select * from record where aa in (select aa from record group by aa having count(*) > 1)
                                         ^
	at scala.sys.package$.error(package.scala:27)
	at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:36)
	at org.apache.spark.sql.catalyst.DefaultParserDialect.parse(ParserDialect.scala:67)
	at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:211)
	at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:211)
	at org.apache.spark.sql.execution.SparkSQLParser$$anonfun$org$apache$spark$sql$execution$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:114)
	at org.apache.spark.sql.execution.SparkSQLParser$$anonfun$org$apache$spark$sql$execution$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:113)
	at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
	at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
	at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
	at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
	at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
	at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
	at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
	at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202)
	at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
	at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
	at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
	at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
	at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
	at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890)
	at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
	at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:34)
	at org.apache.spark.sql.SQLContext$$anonfun$1.apply(SQLContext.scala:208)
	at org.apache.spark.sql.SQLContext$$anonfun$1.apply(SQLContext.scala:208)
	at org.apache.spark.sql.execution.datasources.DDLParser.parse(DDLParser.scala:43)
	at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:231)
	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
	at sun.reflect.GeneratedMethodAccessor36.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
	at py4j.Gateway.invoke(Gateway.java:259)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:209)
	at java.lang.Thread.run(Thread.java:745)

How can I solve this?

CodePudding user response:

Spark SQL is not MySQL.

CodePudding user response:

Split the query into separate steps and combine the intermediate DataFrames; don't use a subquery.