I have a dataset as df in which I have
country | indicator | date | year&week | value
as column names. I want to convert the data of only the country column to upper case using PySpark (only the data, not the heading). I tried
import pyspark.sql.functions as f
df.select("*", f.upper("country"))
display(df)
but it raises the error 'NoneType' object has no attribute 'select'
CodePudding user response:
I would not have used select here, because select does not modify the DataFrame in place; it returns a new DataFrame with your resulting column added, so you have to assign the result back to a variable. (The 'NoneType' error itself means your df is None at that point, most likely because it was assigned from a call that returns nothing, such as df.show() or display(df).)
I used withColumn instead, and it works just fine; please refer to the following code snippet:
import pyspark.sql.functions as f
import pandas as pd
# Sample Data
data = {
"country": ["United States", "Canada", "spain", "germany"],
"indicator": ["1", "2", "3", "4"],
"date": ["2022/01/01", "2021/01/01", "2020/01/01", "2019/01/01"],
"year&week": ["2022-52", "2021-34", "2020-32", "2019-45"],
"value": ["56", "28", "258", "425"]
}
df = pd.DataFrame.from_dict(data)
# Convert to spark dataframe
df = spark.createDataFrame(df)
# Apply your function to the column you choose
df = df.withColumn("country", f.upper(f.col("country")))
Now you can check with df.show() or display(df), and you'll get the following output:
df.show()
+-------------+---------+----------+---------+-----+
|      country|indicator|      date|year&week|value|
+-------------+---------+----------+---------+-----+
|UNITED STATES|        1|2022/01/01|  2022-52|   56|
|       CANADA|        2|2021/01/01|  2021-34|   28|
|        SPAIN|        3|2020/01/01|  2020-32|  258|
|      GERMANY|        4|2019/01/01|  2019-45|  425|
+-------------+---------+----------+---------+-----+
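Incidentally, since the snippet builds the sample data in pandas first, the same value-only uppercase transform can also be done on the pandas side before converting to Spark. A minimal sketch using the same sample countries (pandas only, no Spark session needed):

```python
import pandas as pd

# Same sample countries as above
pdf = pd.DataFrame({"country": ["United States", "Canada", "spain", "germany"]})

# Vectorized uppercase on the column values only; the header stays "country"
pdf["country"] = pdf["country"].str.upper()
```

As with withColumn, the result is assigned back so the change actually sticks.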
CodePudding user response:
import pyspark.sql.functions as f

simpleData = [["Canada", "Y"], ["Spain", "N"], ["Brazil", "Y"], ["Japan", "Y"], ["India", "N"]]
df = spark.createDataFrame(simpleData, ["country", "indicator"])

# input
display(df)

upperDf = df.withColumn("country", f.upper("country"))

# output
display(upperDf)