Adding multiple columns in Pandas with results of a function-CodePudding

First, let me explain the thing I'm trying to do:

I have a dataframe with emails of users (among other values) and I can use those emails to get some information from them such as manager, department, description by consulting some other system (LDAP).

So far I can go through the rows and use the email to get each piece of data, but since each time a very slow query is performed, I'd like to be able to get the three values at once, then 'unpack' them and finally place them into the three new corresponding columns.

# This is the function that returns the three values from the query
def get_user_data(email):

    LDAP = LDAPConnector()
    LDAP.bind_ldap()

    searchAttribute = ["manager","department","description"]
    searchFilter = f"(&(objectclass=person)(mail={email}))"

    ldap_result = LDAP.ldap_query(searchAttribute, searchFilter)
    
    manager = f"{ldap_result['manager'][0].decode('UTF-8').split('=')[1].split(',')[0]}"
    department = f"{ldap_result['department'][0].decode('UTF-8')}"
    description = f"{ldap_result['description'][0].decode('UTF-8')}"

    LDAP.unbind_ldap()
    
    return manager, department, description

Then I call it like this:

computers = pn.read_csv('/path/to/my_file.csv')

# Using the slice [0:2] to test without going through the whole file
computers.loc[0:2, 'Manager'], computers.loc[0:2, 'Department'], computers.loc[0:2, 'Description'] = computers.loc[0:2, 'Email_Address'].map(lambda email: get_user_data(email))

# And then save it into another file just to check
computers.to_csv('/path/to/new_file.csv')

My problem with this is that instead of the expected result

Email 0, Manager 0, Department 0, Description 0
Email 1, Manager 1, Department 1, Description 1
Email 2, Manager 2, Department 2, Description 2

I am getting

Email 0, Manager 0, Manager 1, Manager 2
Email 1, Department 0, Department 1, Department 2
Email 2, Description 0, Description 1, Description 2

So I figure that I need some sort of transposition or a whole different way to unpack the values.

By the way, I have another approach in which I store the whole LDAP result in a column and then can go again through the rows parsing the values and storing them… But I have the feeling that it could be more elegant to get the values in place at once.

Thanks!

CodePudding user response：

computers['Manager'][0:2], computers['Department'][0:2], computers.['Description'][0:2] = zip(*computers.loc[0:2, 'Email_Address'].map(lambda email: get_user_data(email)))

This should work. Tested on this example:

s = pd.Series(['cat', 'dog'])
data = pd.DataFrame([['cat',0,0],['dog',0,0]], columns=['animal', 'n1', 'n2'])
#data['n1'], data['n2']=zip(*s.map({'cat': ['kitten', 'cat'], 'dog': ['puppy', 'dog']}))
data['n1'][0:2], data['n2'][0:2]=zip(*s.map({'cat': ['kitten', 'cat'], 'dog': ['puppy', 'dog']}))
print(data)

Output:

  animal      n1   n2
0    cat  kitten  cat
1    dog   puppy  dog

CodePudding user response：

I made it work by getting the result as a Series of tuples, then converting that into a DataFrame and finally concatenating with the original DataFrame.

def get_user_data_applied(computer):
    
    # Process to query LDAP and get the values
    
    return(manager, department, description)

# Get the data into a series from the function (just a slice 0:10 to test now)
user_data = computers.loc[0:10].apply(get_user_data, axis='columns')
    
# Convert the series into a DataFrame with list comprehension
user_data = pn.DataFrame([[a, b, c] for a, b, c in user_data.values], columns=['Manager', 'Department', 'Description'])
    
# Concatenate both things, the original list and the new one.
computers = pn.concat([computers, user_data], axis=1)