Google Cloud DLP - CSV inspection

Time:03-29

I'm trying to inspect a CSV file, but no findings are returned. I'm using the EMAIL_ADDRESS info type, and the same addresses produce positive hits in the demo here: https://cloud.google.com/dlp/demo/#!/. I'm sending the CSV file to inspect_content with a byte_item as follows:

byte_item: {
  type: :CSV,
  data: File.open('/xxxxx/dlptest.csv', 'r').read
}

In looking at the supported file types, it looks like CSV/TSV files are inspected via Structured Parsing.

  • For CSV/TSV, does that mean one can't just send in the file, and instead needs to use the table attribute rather than byte_item, as per https://cloud.google.com/dlp/docs/inspecting-structured-text?

  • What about XLSX files, for example? They're an unspecified file type, so I tried a configuration like the one below, but it still returned no findings:

byte_item: {
  type: :BYTES_TYPE_UNSPECIFIED,
  data: File.open('/xxxxx/dlptest.xlsx', 'rb').read
}

I'm able to do inspection and redaction with images and text fine, but having a bit of a problem with other file types. Any ideas/suggestions welcome! Thanks!

Edit: The contents of the CSV in question:

$ cat ~/Downloads/dlptest.csv 
[email protected],anotehu,[email protected]
blah blah,anoteuh,
aonteuh,

$ file ~/Downloads/dlptest.csv 
~/Downloads/dlptest.csv: ASCII text, with CRLF line terminators

The full request:

parent = "projects/xxxxxxxx/global"
inspect_config = {
  info_types: [{name: "EMAIL_ADDRESS"}],
  min_likelihood: :POSSIBLE,
  limits: { max_findings_per_request: 0 },
  include_quote: true
}
request = {
  parent: parent,
  inspect_config: inspect_config,
  item: {
    byte_item: {
      type: :CSV,
      data: File.open('/xxxxx/dlptest.csv', 'r').read
    }
  }
}
dlp = Google::Cloud::Dlp.dlp_service
response = dlp.inspect_content(request)

CodePudding user response:

xlsx is not yet supported. Coming soon. (Maybe that part of the question should be split out from the CSV debugging issue.)

CodePudding user response:

The CSV file I was testing with was one I created in Google Sheets and exported as CSV; however, locally the file was reported as "text/plain; charset=us-ascii". I then downloaded a CSV from the internet that had a MIME type of "text/csv; charset=utf-8", and that one worked. So it looks like my issue was specifically due to the file having an incorrect MIME type.
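
For anyone who would rather not depend on how the service detects the file type, the table approach mentioned in the question may be worth trying. Below is a minimal sketch, not a confirmed fix: csv_to_dlp_table is a hypothetical helper (not part of the google-cloud-dlp gem), the column_N header names are made up because the test file has no header row, and the example.com addresses are placeholders. It only builds the hash for a table item (headers plus rows of string_value cells) and pads short rows, on the assumption that every row should carry one value per header.

```ruby
require "csv"

# Hypothetical helper: convert CSV text into a hash shaped like a DLP
# `table` content item (headers: [{name:}], rows: [{values: [{string_value:}]}]).
def csv_to_dlp_table(csv_text)
  rows = CSV.parse(csv_text) # row separator (LF or CRLF) is auto-detected
  width = rows.map(&:length).max || 0

  {
    # Assumption: the file has no header row, so column names are generated.
    headers: (1..width).map { |i| { name: "column_#{i}" } },
    rows: rows.map do |row|
      # Pad short rows so each row has one value per header.
      padded = row + [nil] * (width - row.length)
      { values: padded.map { |cell| { string_value: cell.to_s } } }
    end
  }
end

# Placeholder data mirroring the shape of the test file above.
sample = "a@example.com,anotehu,b@example.com\r\nblah blah,anoteuh,\r\naonteuh,\r\n"
table = csv_to_dlp_table(sample)

# The result would replace byte_item in the request from the question:
#   item: { table: table }
```

The rest of the request (parent, inspect_config) would stay as in the question; only the item changes from byte_item to table.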
