I have a DataFrame with the following content:
scala> true_nomar.show(1)
+--------+--------------+--------------------+------+------+--------------------+
|category|topicUpPredict|               topic|ciTrue|upTrue|              normal|
+--------+--------------+--------------------+------+------+--------------------+
|the_thao|      the_thao|[the_thao, the_gioi]|  true|  true| Khi các mục sư m...|
+--------+--------------+--------------------+------+------+--------------------+
only showing top 1 row
But when I show the untruncated content, the text in the normal column is incomplete and the other columns show no content at all:
scala> true_nomar.show(1,false)
-------- -------------- -------------------- ------ ------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|category|topicUpPredict|topic |ciTrue|upTrue|normal |
-------- -------------- -------------------- ------ ------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Thích thú trước hai vị học trò đặc biệt này, ông Eriksson nói: "Bóng đá cần nhiều người như là hai vị mục sư Charles và Tim để tạo cho trẻ em thật nhiều cơ hội đến với bóng đá”. Thậm chí Geoff Hurst, cựu ngôi sa|ổi lại, hai mục sư Crosland và Smith cùng các con chiên sẽ cầu nguyện cho đội tuyển Anh trong VCK World Cup 2006 mà trước mắt là cầu nguyện cho chấn thương của tiền đạo Michael Owen sớm hồi phục.
-------- -------------- -------------------- ------ ------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
only showing top 1 row
CodePudding user response:
This is most likely due to one or more carriage return (CR) characters (\r in Scala string literals) embedded somewhere in the text. When a CR is encountered, the terminal moves the caret to the beginning of the line, which garbles the output:
scala> "123\r456"
4560: String = 123
Here, the output should be res0: String = 123..., but the caret position gets reset after 123, and 456 overwrites res. The same happens when a DataFrame is printed:
scala> Seq(("baz", "foofoofoo\rbarbar")).toDF("cat", "normal").show(false)
+---+----------------+
|cat|normal          |
+---+----------------+
barbar|ofoofoo
+---+----------------+
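The overwrite effect can be reproduced without a terminal. The following is a minimal sketch in plain Scala; renderWithCR is a hypothetical helper (not part of Spark or the Scala library) that only models the "CR moves the caret to column 0" behaviour described above and ignores every other control character:

```scala
// Hypothetical helper: mimics how a terminal renders one line that contains '\r'.
// Assumption: only CR is handled; other control characters are treated as ordinary text.
def renderWithCR(line: String): String = {
  val screen = new java.lang.StringBuilder
  var col = 0
  for (ch <- line) {
    if (ch == '\r') {
      col = 0                                   // caret jumps back to the start of the line
    } else {
      if (col < screen.length) screen.setCharAt(col, ch) // overwrite what is already there
      else screen.append(ch)                              // or extend the line
      col += 1
    }
  }
  screen.toString
}

println(renderWithCR("res0: String = 123\r456"))   // 4560: String = 123
println(renderWithCR("|baz|foofoofoo\rbarbar|"))   // barbar|ofoofoo
```

Both results match the garbled REPL output shown above: the text after the CR overwrites the start of the line, and the rest of the line stays visible.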
If you look closer at your output, you'll find the closing |, so it is the full text, just garbled:
--------------------
--------------------
cựu ngôi sa|ổi lại,
--------------------
           ^
           ^
           end of "normal" column
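To confirm the diagnosis before changing anything, one option is to count the rows whose text actually contains a CR. This is a sketch that assumes it runs in spark-shell (where the $ column syntax is available) against the true_nomar DataFrame and its normal column from the question:

```scala
// Assumption: run inside spark-shell, with true_nomar defined as in the question.
// Counts the rows whose "normal" column contains at least one carriage return.
val rowsWithCR = true_nomar.filter($"normal".contains("\r")).count()
println(s"rows containing CR: $rowsWithCR")
```

A non-zero count would support the explanation above; a zero count would suggest that some other control character is responsible.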
Use regexp_replace($"normal", "\r", "\\\\r") to replace all CRs with the escaped representation \r:
scala> val df = Seq(("baz", "foofoofoo\rbarbar")).toDF("cat", "normal")
df: org.apache.spark.sql.DataFrame = [cat: string, normal: string]
scala> df.show(false)
+---+----------------+
|cat|normal          |
+---+----------------+
barbar|ofoofoo
+---+----------------+
scala> df.withColumn("normal", regexp_replace($"normal", "\r", "\\\\r")).show(false)
+---+-----------------+
|cat|normal           |
+---+-----------------+
|baz|foofoofoo\rbarbar|
+---+-----------------+
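The four backslashes in the replacement string may look surprising. The Scala literal "\\\\r" is the three-character string \\r, and in a Java regex replacement a doubled backslash stands for one literal backslash, so the result is a backslash followed by r. The sketch below illustrates this with plain String.replaceAll, which follows the same java.util.regex replacement rules; it assumes, as the output above suggests, that Spark's regexp_replace treats the replacement string the same way:

```scala
val raw = "foofoofoo\rbarbar"               // 16 characters, one of them a real CR

// Pattern "\r" matches the CR character itself.
// Replacement "\\\\r" is \\r at runtime, which the regex engine emits as \r (backslash + 'r').
val escaped = raw.replaceAll("\r", "\\\\r")

println(escaped)          // foofoofoo\rbarbar
println(escaped.length)   // 17
```

The escaped string is one character longer than the original because the single CR is replaced by two visible characters, which is also why the last column in the table above is one dash wider.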