I have:
Customerkeycode
B01:B14:110083
I want:
PlanningCustomerSuperGroupCode, DPGCode, APGCode
B01, B14, 110083
CodePudding user response:
import pandas as pd

# Sample data with the colon-separated key
df = pd.DataFrame(
    {
        "Customerkeycode": [
            "B01:B14:110083",
            "B02:B15:110084"
        ]
    }
)

# Take the 1st, 2nd and 3rd piece of each key as separate columns
df['PlanningCustomerSuperGroupCode'] = df['Customerkeycode'].apply(lambda x: x.split(":")[0])
df['DPGCode'] = df['Customerkeycode'].apply(lambda x: x.split(":")[1])
df['APGCode'] = df['Customerkeycode'].apply(lambda x: x.split(":")[2])

# Drop the original combined column
df_rep = df.drop(columns="Customerkeycode")
print(df_rep)
  PlanningCustomerSuperGroupCode DPGCode APGCode
0                            B01     B14  110083
1                            B02     B15  110084
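As a possible alternative in pandas, the three apply calls could be collapsed into a single vectorized split. This is only a sketch: it assumes every key has exactly three colon-separated parts, and the names cols and parts are just illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Customerkeycode": ["B01:B14:110083", "B02:B15:110084"]}
)

# Target column names, in the order the pieces appear in the key
cols = ["PlanningCustomerSuperGroupCode", "DPGCode", "APGCode"]

# expand=True returns a DataFrame with one column per piece
parts = df["Customerkeycode"].str.split(":", expand=True)

# Assumption: exactly three pieces per key, so three names fit
parts.columns = cols

df_rep = parts
print(df_rep)
```

Note that all three resulting columns hold strings, so APGCode would still need an explicit conversion if a numeric type is wanted.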
CodePudding user response:
In PySpark, first split the string into an array with F.split, then use the getItem method to pull each array element out into its own column.
import pyspark.sql.functions as F
...
cols = ['PlanningCustomerSuperGroupCode', 'DPGCode', 'APGCode']
arr_cols = [F.split('Customerkeycode', ':').getItem(i).alias(cols[i]) for i in range(3)]
df = df.select(*arr_cols)
df.show(truncate=False)
CodePudding user response:
Split the column on ':' into three columns named PlanningCustomerSuperGroupCode, DPGCode and APGCode, then drop the original column:
import pyspark.sql.functions as F
df.withColumn('PlanningCustomerSuperGroupCode', F.split(F.col('Customerkeycode'), ':')[0]) \
    .withColumn('DPGCode', F.split(F.col('Customerkeycode'), ':')[1]) \
    .withColumn('APGCode', F.split(F.col('Customerkeycode'), ':')[2]) \
    .drop('Customerkeycode') \
    .show()