First, let me explain the thing I'm trying to do:
I have a dataframe with emails of users (among other values) and I can use those emails to get some information from them such as manager, department, description by consulting some other system (LDAP).
So far I can go through the rows and use the email to get each piece of data, but since each time a very slow query is performed, I'd like to be able to get the three values at once, then 'unpack' them and finally place them into the three new corresponding columns.
# This is the function that returns the three values from the query
def get_user_data(email):
LDAP = LDAPConnector()
LDAP.bind_ldap()
searchAttribute = ["manager","department","description"]
searchFilter = f"(&(objectclass=person)(mail={email}))"
ldap_result = LDAP.ldap_query(searchAttribute, searchFilter)
manager = f"{ldap_result['manager'][0].decode('UTF-8').split('=')[1].split(',')[0]}"
department = f"{ldap_result['department'][0].decode('UTF-8')}"
description = f"{ldap_result['description'][0].decode('UTF-8')}"
LDAP.unbind_ldap()
return manager, department, description
Then I call it like this:
computers = pn.read_csv('/path/to/my_file.csv')
# Using the slice [0:2] to test without going through the whole file
computers.loc[0:2, 'Manager'], computers.loc[0:2, 'Department'], computers.loc[0:2, 'Description'] = computers.loc[0:2, 'Email_Address'].map(lambda email: get_user_data(email))
# And then save it into another file just to check
computers.to_csv('/path/to/new_file.csv')
My problem with this is that instead of the expected result
Email 0, Manager 0, Department 0, Description 0
Email 1, Manager 1, Department 1, Description 1
Email 2, Manager 2, Department 2, Description 2
I am getting
Email 0, Manager 0, Manager 1, Manager 2
Email 1, Department 0, Department 1, Department 2
Email 2, Description 0, Description 1, Description 2
So I figure that I need some sort of transposition or a whole different way to unpack the values.
By the way, I have another approach in which I store the whole LDAP result in a column and then can go again through the rows parsing the values and storing them… But I have the feeling that it could be more elegant to get the values in place at once.
Thanks!
CodePudding user response:
computers['Manager'][0:2], computers['Department'][0:2], computers.['Description'][0:2] = zip(*computers.loc[0:2, 'Email_Address'].map(lambda email: get_user_data(email)))
This should work. Tested on this example:
s = pd.Series(['cat', 'dog'])
data = pd.DataFrame([['cat',0,0],['dog',0,0]], columns=['animal', 'n1', 'n2'])
#data['n1'], data['n2']=zip(*s.map({'cat': ['kitten', 'cat'], 'dog': ['puppy', 'dog']}))
data['n1'][0:2], data['n2'][0:2]=zip(*s.map({'cat': ['kitten', 'cat'], 'dog': ['puppy', 'dog']}))
print(data)
Output:
animal n1 n2
0 cat kitten cat
1 dog puppy dog
CodePudding user response:
I made it work by getting the result as a Series of tuples, then converting that into a DataFrame and finally concatenating with the original DataFrame.
def get_user_data_applied(computer):
# Process to query LDAP and get the values
return(manager, department, description)
# Get the data into a series from the function (just a slice 0:10 to test now)
user_data = computers.loc[0:10].apply(get_user_data, axis='columns')
# Convert the series into a DataFrame with list comprehension
user_data = pn.DataFrame([[a, b, c] for a, b, c in user_data.values], columns=['Manager', 'Department', 'Description'])
# Concatenate both things, the original list and the new one.
computers = pn.concat([computers, user_data], axis=1)