I'm reading a CSV file with PySpark like this:
df = spark.read.format('csv').options(header=True, encoding='windows-1251', delimiter=';').load('csv_file.csv')
In the result, the string values in the columns end with a single quote character ('), for example 12435'.
No line in the file ends with a quote, so I don't know where Spark finds it.
I need to remove this quote.
By the way, pandas reads the CSV without a quote at the end of every row, but I can't convert the pandas DataFrame to a Spark DataFrame: I get the error cannot merge type DoubleType and StringType.
The DataFrame has some empty columns.
I tried:
from pyspark.sql.functions import *
for i in df.columns:
    df.withColumn(i, expr("substring({name}, 1, length({name}) -1)".format(name=i)))
for i in df.columns:
    df.withColumn(i, col(i).substr(lit(0), length(col(i)) - 1))
Neither of these helped.
Thanks.
The df as read:
col1   | col2
12345' | abcde'
Expected output:
col1  | col2
12345 | abcde
CodePudding user response:
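Use a list comprehension with regexp_replace to strip the quote from every column:
from pyspark.sql import functions as F
df.select(*[F.regexp_replace(F.col(c), "'", '').alias(c) for c in df.columns]).show()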
+-----+-----+
| col1| col2|
+-----+-----+
|12345|abcde|
+-----+-----+