Levenshtein edit-distance between rows and columns

Time:03-08

I have created the following empty DataFrame:

columns = ['Band','Tree','Foot']
rows = ['Hand', 'Foot', 'Shoulder']

df = pd.DataFrame(index=rows, columns=columns)

I want to calculate the edit distance between each column label and each row label, and I am currently using the following code:

import pandas as pd
import nltk

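def distance(x):
    i = x.index  # all row labels of the column passed in
    j = x.name   # the column label
    return nltk.edit_distance(i, j)

df = df.apply(distance)  # distance() is called once per column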

But this returns:

Band    4
Tree    4
Foot    4

Instead, I would like each cell to contain the distance between its row label and its column label:

          Band  Tree  Foot
Hand         1     4     4
Foot         4     4     0
Shoulder     7     7     7

What am I missing?

Answer:

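nltk.edit_distance expects two strings, but in your distance function x.index is the whole row index (all three labels), not a single label. The labels are then compared against the individual characters of the column name, nothing ever matches, and every column ends up with the same value, 4. You have to iterate over the index labels instead. One option is to apply a lambda that does that on df: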

df.apply(lambda col: [nltk.edit_distance(col.name, i) for i in col.index])
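To see where the 4s come from, here is a minimal plain-Python Levenshtein sketch with unit costs. It is an assumption standing in for nltk.edit_distance with its default arguments (substitution_cost=1, transpositions=False), not NLTK's own code, and the helper name levenshtein is made up for illustration. A plain list of the row labels stands in for df.index. Because the helper accepts any two sequences, it reproduces what happened in the question: three whole labels compared against four single characters.

```python
def levenshtein(a, b):
    """Unit-cost Levenshtein distance between two sequences.

    Hypothetical helper standing in for nltk.edit_distance with its
    default arguments (substitution_cost=1, transpositions=False).
    """
    # Before processing item i of a, prev[j] is the distance between a[:i-1] and b[:j]
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x -> y (free if equal)
            ))
        prev = curr
    return prev[-1]


rows = ['Hand', 'Foot', 'Shoulder']

# Roughly what distance() did: 3 row labels vs the 4 characters of 'Band'.
# No label equals a single character, so the result is max(3, 4).
print(levenshtein(rows, 'Band'))    # 4

# What was intended: one row label vs one column name.
print(levenshtein('Hand', 'Band'))  # 1
```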

Instead of filling in the empty DataFrame, though, I think it's simpler to build a nested dictionary of distances first and then create the DataFrame from it:

df = pd.DataFrame({j: {i: nltk.edit_distance(i,j) for i in rows} for j in columns})

Output:

          Band  Tree  Foot
Hand         1     4     4
Foot         4     4     0
Shoulder     7     7     7
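If you just want to check the expected grid without pandas or NLTK installed, the following dependency-free sketch builds the same {column: {row: distance}} dictionary. It swaps in a memoized recursive Levenshtein (again an illustrative assumption standing in for nltk.edit_distance with default unit costs; the name levenshtein is hypothetical) and prints the rows in the same layout.

```python
from functools import lru_cache


def levenshtein(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance via memoized recursion (illustrative)."""

    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        # Distance between the prefixes a[:i] and b[:j]
        if i == 0:
            return j  # insert the remaining j characters
        if j == 0:
            return i  # delete the remaining i characters
        return min(
            d(i - 1, j) + 1,                           # deletion
            d(i, j - 1) + 1,                           # insertion
            d(i - 1, j - 1) + (a[i - 1] != b[j - 1]),  # substitution or match
        )

    return d(len(a), len(b))


columns = ['Band', 'Tree', 'Foot']
rows = ['Hand', 'Foot', 'Shoulder']

# Same nested layout as in the answer: {column: {row: distance}}
table = {j: {i: levenshtein(i, j) for i in rows} for j in columns}

# Print a grid shaped like the DataFrame output
print(' ' * 10 + ''.join(f'{c:>6}' for c in columns))
for i in rows:
    print(f'{i:<10}' + ''.join(f'{table[j][i]:>6}' for j in columns))
```

As with the pandas version, the distances are computed into a mapping first and only then laid out as a table, so nothing is written into an empty frame cell by cell.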