Ruby Regex on Active Directory String

Time:07-01

I have a string that represents multiple DNs for Active Directory but has been separated by commas instead of ;

The String:

CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal,
CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal,
CN=Operators,ou=App2,ou=groups,dc=pkldap,dc=internal

I am trying to write a regex that matches both ou=App1 entries but not the ou=App2 entry, and that also turns the , after dc=internal into a ;

Is this possible?

The result would be:

CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal;
CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal;

CodePudding user response:

Using #strip and #sub to Clean Up Your LDIF Data

Really, the "correct" answer would be to get valid LDIF in the first place, and then parse it as such with a gem like Net::LDAP. However, the changes you want to make to your existing data are fairly trivial. For example, we'll start by assigning the String data from your question to a variable named ldif using a here-document literal:

ldif = <<~'LDIF'
  CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal,
  CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal,
  CN=Operators,ou=App2,ou=groups,dc=pkldap,dc=internal
LDIF

You can now iterate over the lines of the String with String#each_line, keep only the lines you want by calling Enumerable#select with String#match? and a Regexp lookahead assertion, and then normalize each remaining line with String#strip and String#sub, storing the results in a matching_apps Array.

This all sounds much more complicated than it is. Consider the following method chain, which is really just a one-liner wrapped for readability:

matching_apps =
  # (?=,) requires a comma right after App1, so a name like ou=App10 won't match
  ldif.each_line.select { _1.match?(/ou=App1(?=,)/) }
    .map { _1.strip.sub(/[,;]$/, ";") }

#=> 
["CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal;",                          
 "CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal;"]  

The use of String#strip and String#sub helps ensure that all lines are normalized the way you want, including the trailing semicolons. However, semicolons stored in the Strings themselves are likely to cause problems in subsequent steps, so I'd probably recommend removing them as well.
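As a side note on the matching step, a lookahead only earns its keep if it actually requires something to follow the match. Here is a minimal sketch of the difference; the ou=App10 entry is hypothetical and was added only for illustration, since it is not part of the data in the question:

```ruby
# The "App10" entry is hypothetical, added only to show the difference.
lines = [
  "CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal,",
  "CN=Testers,ou=App10,ou=groups,dc=pkldap,dc=internal,"
]

loose  = lines.grep(/ou=App1/)       # no lookahead: also matches ou=App10
strict = lines.grep(/ou=App1(?=,)/)  # lookahead requires a comma after App1

puts "loose:  #{loose.size} match(es)"
puts "strict: #{strict.size} match(es)"
```

If your OU names can never share a prefix, the plain pattern is probably good enough; the lookahead is just cheap insurance.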


Note: You can stop reading here if you just want to solve your immediate question as originally posted. The rest of the answer covers additional considerations related to data normalization, and provides some examples on how and why you might want to strip the semicolons as well.


Why and How to Normalize without Semicolons

You can change the replacement argument passed to #sub to an empty String (i.e. "") to remove the trailing commas or semicolons (if present). Normalizing without the semicolons now may save you the trouble of having to clean up those lines again later when you iterate over the Array of results stored in matching_apps.

For example, if you need to rejoin lines with commas, interpolate the lines into other String objects in later steps, or do anything else where stored semicolons might come as a surprise, it's better to deal with them sooner rather than later. If you really need the trailing semicolons, it's very easy to add them back with String#concat or String interpolation, but stray characters in a String can be a source of bugs that are best avoided unless you're sure you'll always need that semicolon at the end.
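As a rough sketch of adding the semicolons back on demand, assuming matching_apps already holds the cleaned DNs (the Array literal below just stands in for that result):

```ruby
# Stand-in for the cleaned result of the method chain.
matching_apps = [
  "CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal",
  "CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal"
]

# Interpolation builds new Strings, so matching_apps itself stays clean.
with_semicolons = matching_apps.map { |dn| "#{dn};" }

puts with_semicolons
```

Interpolation is used here rather than String#concat because #concat modifies its receiver in place, which would quietly put the semicolons back into matching_apps.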

Example 1: Output Where Semicolons Might be Unexpected

For example, suppose you want to use the results to format output for a command-line client where a trailing semicolon wouldn't be expected. The following works nicely because the semicolons are already stripped:

matching_apps =
  # (?=,) requires a comma right after App1, so a name like ou=App10 won't match
  ldif.each_line.select { _1.match?(/ou=App1(?=,)/) }
    .map { _1.strip.sub(/[,;]$/, "") }

printf "Make the following calls:\n\n"
matching_apps.each_with_index do |dn, idx|
  puts %(#{idx.succ}. ldapsearch -D '#{dn}' [opts])
end

This would print out:

Make the following calls:

1. ldapsearch -D 'CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal' [opts]
2. ldapsearch -D 'CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal' [opts]

without having to first strip any trailing semicolons that might not work with the printed command, tool, or other output.

Examples of Rejoining with Commas and Semicolons

On the other hand, you can just as easily rejoin the Array elements with a comma or semicolon if you want. Consider the following two examples:

matching_apps.join ", "
#=> "CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal, CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal"  
p format("(%s)", matching_apps.join("; "))
#=> "(CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal; CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal)"  

Keep Flexibility in Mind

If the String objects in your Array still had the trailing semicolons, you'd have to do something about them. So, unless you already know what you plan to do with each String, and whether or not the semicolons will be needed, it's probably best to keep them out of matching_apps in the first place to optimize for flexibility. That's just an opinion, to be sure, but definitely one worth considering.
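One possible way to keep that flexibility is to wrap the chain in a small method that makes the terminator optional. This is only a sketch: dns_for_ou and its terminator: keyword are hypothetical names, not part of the approach above, and the case-insensitive match is an assumption about how your ou= attributes are written.

```ruby
# Hypothetical helper: returns the DNs that contain the given OU, with any
# trailing comma or semicolon removed and an optional terminator appended.
def dns_for_ou(text, ou, terminator: "")
  # Regexp.escape guards against OU names containing regex metacharacters;
  # the /i flag is an assumption that "OU=" and "ou=" should both match.
  pattern = /ou=#{Regexp.escape(ou)}(?=,)/i

  text.each_line
      .map(&:strip)
      .select { |line| line.match?(pattern) }
      .map { |line| line.sub(/[,;]\z/, "") + terminator }
end

ldif = <<~LDIF
  CN=Admins,ou=App1,ou=groups,dc=pkldap,dc=internal,
  CN=Auditors,ou=App1,ou=groups,dc=pkldap,dc=internal,
  CN=Operators,ou=App2,ou=groups,dc=pkldap,dc=internal
LDIF

bare       = dns_for_ou(ldif, "App1")
terminated = dns_for_ou(ldif, "App1", terminator: ";")

puts bare
puts terminated
```

With this shape the caller decides at the point of use whether a semicolon is wanted, and the stored Strings never carry one by accident.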
