I have a data frame with columns as customer_id, postal_code, gender, age..I want to display the data only where postal_code is with alphabets. Please help me
CodePudding user response:
You can use rlike , if you are fimilar with SQL, its similar to LIKE clause
For you use case you can use Rlike - Alphabets Only
Data Preparation
input_str = """
Alicja London 44
Alicja Germany Berlin
Alicja India 91
Piotr Arizona USA
ABC India P4G32L
""".split()
input_values = list(map(lambda x: x.strip() if x.strip() != 'null' else None, input_str))
cols = list(map(lambda x: x.strip() if x.strip() != 'null' else None, "name,country,postal_code".split(',')))
n = len(input_values)
n_cols = 3
input_list = [tuple(input_values[i:i n_cols]) for i in range(0,n,n_cols)]
sparkDF = sql.createDataFrame(input_list, cols)
sparkDF.show()
------ ------- -----------
| name|country|postal_code|
------ ------- -----------
|Alicja| London| 44|
|Alicja|Germany| Berlin|
|Alicja| India| 91|
| Piotr|Arizona| Usa|
| ABC| India| P4G32L|
------ ------- -----------
RLike - AlphaNumeric (Alpha or Numeric)
sparkDF.filter(F.col("postal_code").rlike("^[a-zA-Z0-9]*$")).show()
------ ------- -----------
| name|country|postal_code|
------ ------- -----------
|Alicja| London| 44|
|Alicja|Germany| Berlin|
|Alicja| India| 91|
| Piotr|Arizona| Usa|
| ABC| India| P4G32L|
------ ------- -----------
RLike - Alphabets
sparkDF.filter(F.col("postal_code").rlike("^[A-Z][a-zA-Z]. ")).show()
------ ------- -----------
| name|country|postal_code|
------ ------- -----------
|Alicja|Germany| Berlin|
| Piotr|Arizona| Usa|
------ ------- -----------
RLike - Numeric
sparkDF.filter(F.col("postal_code").rlike("^[0-9]*$")).show()
------ ------- -----------
| name|country|postal_code|
------ ------- -----------
|Alicja| London| 44|
|Alicja| India| 91|
------ ------- -----------