In the code below I am trying to do full masking. Full masking works, but the code uses a static "maskvalue", and the mask needs to be dynamic, based on the length of the value in each cell of the chosen column.
import pandas as pd

def masking(filename, columnname, value):
    maskvalue = "XXXXXXXXX"
    column_dataset1 = pd.read_csv(filename)
    print(column_dataset1)
    if value == 0:
        # mask the entire value
        maskvalue = "XXXXXXXXX"
        column_dataset1[columnname] = column_dataset1[columnname].astype(str).str[:0] + maskvalue
        # print(column_dataset1)
    elif value == '':
        column_dataset1[columnname] = column_dataset1[columnname].astype(str).str[:0] + maskvalue
        print(column_dataset1)
    else:
        # keep everything except the last `value` characters, then append the mask
        column_dataset1[columnname] = column_dataset1[columnname].astype(str).str[:-value] + maskvalue
        print(column_dataset1)

masking("path/to/file", "phonenumber", 0)
For example, I am using the following data:
sno,Name,Type 1,Type 2,phonenumber
1,Bulbasaur,Grass,Poison,987654321256464684846646464611631646466464
2,Ivysaur,Grass,Poison,98765432121314564645663114646464666432016364
3,Venusaur,Grass,Poison,9876543212
3,VenusaurMega Venusaur,Grass,Poison,9876543212
4,Charmander,Fire,Flying,9876543212
and I am getting output like this:
   sno                   Name  Type 1  Type 2  phonenumber
0    1              Bulbasaur   Grass  Poison    XXXXXXXXX
1    2                Ivysaur   Grass  Poison    XXXXXXXXX
2    3               Venusaur   Grass  Poison    XXXXXXXXX
3    3  VenusaurMega Venusaur   Grass  Poison    XXXXXXXXX
4    4             Charmander    Fire  Flying    XXXXXXXXX
If I select the column "Type 1", the mask for the first row should be "XXXXX", since "Grass" has 5 characters; in general the mask should be as long as the value it replaces. Expected output:
sno,Name,Type 1,Type 2,phonenumber
1,Bulbasaur,XXXXX,Poison,987654321256464684846646464611631646466464
2,Ivysaur,XXXXX,Poison,98765432121314564645663114646464666432016364
3,Venusaur,XXXXX,Poison,9876543212
3,VenusaurMega Venusaur,XXXXX,Poison,9876543212
4,Charmander,XXXX,Flying,9876543212
Note: Masking should be done on both string and integer columns
CodePudding user response:
How about doing it this way?
column_dataset1[columnname] = ['X'*len(i) for i in column_dataset1[columnname].astype(str)]
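To also cover the partial-masking branch from the question, the same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: `mask_column` and its `maskchar` parameter are names invented here, `value` is taken to mean "number of trailing characters to hide" (with 0 or '' hiding the whole cell, as in the question), and the file is read with `dtype=str` so that long phone numbers are masked exactly as they appear in the CSV.

```python
import io

import pandas as pd


def mask_column(df, columnname, value=0, maskchar="X"):
    """Return a copy of df with one column masked, one mask character per hidden character.

    value is the number of trailing characters to hide; 0 or '' hides the whole cell.
    (mask_column and maskchar are hypothetical names made up for this sketch.)
    """
    def mask_text(text):
        # Hide everything when value is 0 or '', otherwise at most len(text) characters.
        hidden = min(value, len(text)) if value else len(text)
        return text[:len(text) - hidden] + maskchar * hidden

    out = df.copy()
    # astype(str) lets the same code handle integer and string columns.
    out[columnname] = out[columnname].astype(str).map(mask_text)
    return out


# Small in-memory sample; dtype=str keeps every cell exactly as written in the file.
csv_text = (
    "sno,Name,Type 1,phonenumber\n"
    "1,Bulbasaur,Grass,9876543212\n"
    "4,Charmander,Fire,9876543212\n"
)
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(mask_column(df, "Type 1")["Type 1"].tolist())               # ['XXXXX', 'XXXX']
print(mask_column(df, "phonenumber", 4)["phonenumber"].tolist())  # ['987654XXXX', '987654XXXX']
```

The helper returns a new DataFrame instead of printing, so the result can be written back out with `to_csv(index=False)` if the expected output should be a CSV again.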