Adding 10 headers to a Pyspark Dataframe


I have a CSV file that has no headers and consists of 49 columns. I was given a separate CSV file containing each column's name and description. Instead of writing out StructField 49 times (e.g. StructField("srcip", StringType(), True)), is there another way to do it, like a function?

Thank you.

CodePudding user response:

Assuming you have a list of column names (e.g. read from the description CSV), you can loop over it and build the schema:

from pyspark.sql import types as T

cols = ['a', 'b', 'c']

schema = T.StructType([T.StructField(c, T.StringType()) for c in cols])
# StructType(List(StructField(a,StringType,true),StructField(b,StringType,true),StructField(c,StringType,true)))
