File content of file test
:
costCenter: LL63238012
mail: [email protected]
LLpResponsible: cn=LLf58420,ou=Personal,ou=People,ou=LLDI,o=LLP
LLpHomeDirectory: nisMapName=auto.home,ou=RDC_AMEC_LL-CDC01,ou=Locations,ou=LLDI,o=LLP#0#Quality=economy,NisMap=llc1002:/proj/llc1002_ziz1/q,Quota=10621
LLpHomeDirectory: nisMapName=auto.home,ou=RDC_AMEC_LL-CDC01,ou=Locations,ou=LLDI,o=LLP#0#Quality=scratKG,NisMap=llc1002:/proj/llc1002_ziz1_scratKG/q,Quota=12000,Id=scratKG
fullName: Tulip project ziz1
costCenter: MX61FRK604
mail: [email protected]
LLpResponsible: cn=LLa11826,ou=Personal,ou=People,ou=LLDI,o=LLP
LLpHomeDirectory: nisMapName=auto.home,ou=SAT_AMEC_MX-GDL01,ou=Locations,ou=LLDI,o=LLP#0#Quality=reference,NisMap=llc0156:/proj/llc0156_zmx28home_3/q,Quota=100,Id=3
LLpHomeDirectory: nisMapName=auto.home,ou=SAT_AMEC_MX-GDL01,ou=Locations,ou=LLDI,o=LLP#0#Quality=reference,NisMap=llc0156:/proj/llc0156_zmx28home/q,Quota=300
LLpHomeDirectory: nisMapName=auto.home,ou=SAT_AMEC_MX-GDL01,ou=Locations,ou=LLDI,o=LLP#0#Quality=reference,NisMap=llc0156:/proj/llc0156_zmx28home_2/q,Quota=100,Id=2
fullName: xFSL to LLDI migration
costCenter: RU61FPD561
mail: [email protected]
LLpResponsible: cn=LLa09278,ou=Personal,ou=People,ou=LLDI,o=LLP
LLpHomeDirectory: nisMapName=auto.home,ou=RDC_AMEC_LL-CDC01,ou=Locations,ou=LLDI,o=LLP#0#Quota=1,Quality=EconomyHP,NisMap=llc2002:/proj/llc2002_zru12/q
LLpHomeDirectory: nisMapName=auto.home,ou=RDC_AMEC_LL-CDC01,ou=Locations,ou=LLDI,o=LLP#0#Quota=2800,Quality=EconomyHP,NisMap=llc1002:/proj/llc1002_zru12_analog/q,Id=analog
LLpHomeDirectory: nisMapName=auto.home,ou=RDC_AMEC_LL-CDC01,ou=Locations,ou=LLDI,o=LLP#0#Quota=1100,Quality=EconomyHP,NisMap=llc1002:/proj/llc1002_zru12_home/q,Id=home
LLpHomeDirectory: nisMapName=auto.home,ou=RDC_AMEC_LL-CDC01,ou=Locations,ou=LLDI,o=LLP#0#Quality=EconomyHP,NisMap=llc1002:/proj/llc1002_zru12_libddk/q,Quota=2162,Id=libddk
LLpHomeDirectory: nisMapName=auto.home,ou=RDC_AMEC_LL-CDC01,ou=Locations,ou=LLDI,o=LLP#0#Quality=EconomyHP,NisMap=llc1002:/proj/llc1002_zru12_proj/q,Quota=1102,Id=proj
fullName: zru12
costCenter: KG63010285
mail: [email protected]
LLpHomeDirectory: nisMapName=auto.home,ou=RDC_EMEA_NL-CDC01,ou=Locations,ou=LLDI,o=LLP#0#Quality=BLLinessCriticalHP,Quota=60,NisMap=llc4008:/proj/llc4008_zuriKG/q
fullName: Container to store ZuriKG vault
costCenter: KG63010285
mail: [email protected]
LLpHomeDirectory: nisMapName=auto.home,ou=RDC_EMEA_NL-CDC01,ou=Locations,ou=LLDI,o=LLP#0#Quality=EconomyHP,Quota=1,NisMap=llc3008:/proj/llc3008_zuriKG_rme/q
LLpHomeDirectory: nisMapName=auto.home,ou=SAT_EMEA_NL-RME01,ou=Locations,ou=LLDI,o=LLP#0#Quality=EconomyHP,Quota=30,NisMap=llc4014:/proj/llc4014_zuriKG_rme/q
LLpHomeDirectory: nisMapName=auto.home,ou=SAT_EMEA_NL-RME01,ou=Locations,ou=LLDI,o=LLP#0#Quality=ScratKGHP,Quota=400,NisMap=llc4014:/proj/llc4014_zuriKG_rme_scratKG/q,Id=scratKG
fullName: Project to restore project data on the RME work on HP-UX
Code which i am using:
def generate(data):
for record in data.split("\n\n"): # Split records based on two newlines (unix)
result = {}
for line in record.split("\n"): # Split properties based on single newlines (unix)
if line: # Skip empty lines happening for extra or trailing newlines
key, *value = line.split(": ") # Tolerant to lines with more than a single ´: ´ (*values)
value = ": ".join(value) # Recover original value if more than a single (`: `)
if key in result:
result[key] = ";" value
else:
result[key] = value
if result: # Don't yield empty results
yield result
frame = pd.DataFrame(generate("test"))
print(frame)
#frame["responsible"] = frame["LLpResponsible"].str.extract("cn=([\w]*)")
frame["location"] = frame["LLpHomeDirectory"].str.extract("ou=([\w_\-]*)")
frame["directory"] = frame["LLpHomeDirectory"].str.findall("NisMap=\w :([\w_\-/]*)")
df1 = frame[['costCenter', 'mail', 'responsible', 'location', 'directory']]
#df2 = df.explode("directory")[["costCenter", "responsible", "directory", "mail", "location"]]
print(df1)
Error:
When i am running this, i am getting below ....
$ python ldapDataParse1
test
0
Traceback (most recent call last):
File "/home/lib64/python3.6/site-packages/pandas/core/indexes/base.py", line 2898, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'LLpHomeDirectory'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "ldapDataParse1", line 30, in <module>
frame["location"] = frame["LLpHomeDirectory"].str.extract("ou=([\w_\-]*)")
File "/home/lib64/python3.6/site-packages/pandas/core/frame.py", line 2906, in __getitem__
indexer = self.columns.get_loc(key)
File "/home/lib64/python3.6/site-packages/pandas/core/indexes/base.py", line 2900, in get_loc
raise KeyError(key) from err
KeyError: 'LLpHomeDirectory'
CodePudding user response:
You just need to copy paste those two functions together, the original autor even gave you the comments what the code does.
You need the context manager (with open(...)...) to open the file and then after for record in handler.read().split("\n\n")
you just use the 2nd version of the answer to take care of extra newlines.
def generate(file):
with open(file) as handler:
for record in handler.read().split("\n\n"):
result = {}
for line in record.split("\n"):
if line:
key, *value = line.split(": ")
value = ": ".join(value)
if key in result:
result[key] = ";" value
else:
result[key] = value
if result:
yield result
frame = pd.DataFrame(generate("test.txt"))
frame["responsible"] = frame["LLpResponsible"].str.extract("cn=([\w]*)")
frame["location"] = frame["LLpHomeDirectory"].str.extract("ou=([\w_\-]*)")
frame["directory"] = frame["LLpHomeDirectory"].str.findall("NisMap=\w :([\w_\-/]*)")
print(frame[["costCenter", "responsible", "directory"]])
Output:
costCenter responsible directory
0 MX61FRK604 LLa11826 [/proj/llc0156_zmx28home_3/q, /proj/llc0156_zm...
1 RU61FPD561 LLa09278 [/proj/llc2002_zru12/q, /proj/llc1002_zru12_an...
2 KG63010285 NaN [/proj/llc4008_zuriKG/q]
3 KG63010285 NaN [/proj/llc3008_zuriKG_rme/q, /proj/llc4014_zur...