I'm trying to inspect a CSV file and there are no findings being returned (I'm using the EMAIL_ADDRESS info type and the addresses I'm using are coming up with positive hits here: https://cloud.google.com/dlp/demo/#!/). I'm sending the CSV file into inspect_content
with a byte_item
as follows:
byte_item: {
type: :CSV,
data: File.open('/xxxxx/dlptest.csv', 'r').read
}
In looking at the supported file types, it looks like CSV/TSV files are inspected via Structured Parsing.
For CSV/TSV does that mean one can't just sent in the file, and needs to use the
table
attribute instead ofbyte_item
as per https://cloud.google.com/dlp/docs/inspecting-structured-text?What about for XSLX files for example? They're an unspecified file type so I tried with a configuration like so, but it still returned no findings:
byte_item: {
type: :BYTES_TYPE_UNSPECIFIED,
data: File.open('/xxxxx/dlptest.xlsx', 'rb').read
}
I'm able to do inspection and redaction with images and text fine, but having a bit of a problem with other file types. Any ideas/suggestions welcome! Thanks!
Edit: The contents of the CSV in question:
$ cat ~/Downloads/dlptest.csv
[email protected],anotehu,[email protected]
blah blah,anoteuh,
aonteuh,
$ file ~/Downloads/dlptest.csv
~/Downloads/dlptest.csv: ASCII text, with CRLF line terminators
The full request:
parent = "projects/xxxxxxxx/global"
inspect_config = {
info_types: [{name: "EMAIL_ADDRESS"}],
min_likelihood: :POSSIBLE,
limits: { max_findings_per_request: 0 },
include_quote: true
}
request = {
parent: parent,
inspect_config: inspect_config,
item: {
byte_item: {
type: :CSV,
data: File.open('/xxxxx/dlptest.csv', 'r').read
}
}
}
dlp = Google::Cloud::Dlp.dlp_service
response = dlp.inspect_content(request)
CodePudding user response:
xlsx is not yet supported. Coming soon. (Maybe that part of the question should be split out from the CSV debugging issue.)
CodePudding user response:
The CSV file I was testing with was something I created using Google Sheets and exported as a CSV, however, the file showed locally as a "text/plain; charset=us-ascii". I downloaded a CSV off the internet and it had a mime of "text/csv; charset=utf-8". This is the one that worked. So it looks like my issue was specifically due the file being an incorrect mime type.