I have created the following empty DataFrame:
columns = ['Band','Tree','Foot']
rows = ['Hand', 'Foot', 'Shoulder']
df = pd.DataFrame(index=rows, columns=columns)
I want to calculate the distance between the columns and the rows and am currently using the following code:
import pandas as pd
import nltk

def distance(x):
    i = x.index
    j = x.name
    return nltk.edit_distance(i, j)

df = df.apply(distance)
But this returns:
- | - |
---|---|
Band | 4 |
Tree | 4 |
Foot | 4 |
I would like it to return the distance between the corresponding column and row for each cell.
- | Band | Tree | Foot |
---|---|---|---|
Hand | 1 | 4 | 4 |
Foot | 4 | 4 | 0 |
Shoulder | 7 | 7 | 7 |
What am I missing?
CodePudding user response:
edit_distance expects 2 strings, so you have to iterate over the index labels. One option is to apply a lambda that does that on df:
df.apply(lambda col: [nltk.edit_distance(col.name, i) for i in col.index])
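To see why the original code returns 4 everywhere: df.apply(distance) calls the function once per column, so x.index is the whole row index (three labels), not a single label. Assuming nltk.edit_distance treats its arguments as generic sequences (which is consistent with the output in the question), it ends up comparing a 3-item sequence with a 4-character string. Below is a minimal sketch of that effect. The levenshtein helper is a hypothetical pure-Python stand-in for nltk.edit_distance, used only so the example runs without NLTK installed.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance between two sequences.

    Hypothetical stand-in for nltk.edit_distance (not part of NLTK).
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


rows = ['Hand', 'Foot', 'Shoulder']
columns = ['Band', 'Tree', 'Foot']

# What the original function effectively compares: the full list of row
# labels (3 items) against one column name (4 characters). No item ever
# equals a single character, so the distance is always 4.
whole_index = {col: levenshtein(rows, col) for col in columns}
print(whole_index)  # {'Band': 4, 'Tree': 4, 'Foot': 4}

# What the lambda compares: one row label against one column name.
per_label = {col: [levenshtein(col, row) for row in rows] for col in columns}
print(per_label)  # {'Band': [1, 4, 7], 'Tree': [4, 4, 7], 'Foot': [4, 0, 7]}
```

Each list in per_label is one column of the desired table, read top to bottom (Hand, Foot, Shoulder).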
But instead of filling in an existing DataFrame, I think it's simpler to first build a nested dictionary of the values (column name -> row label -> distance) and then construct the DataFrame from it:
df = pd.DataFrame({j: {i: nltk.edit_distance(i,j) for i in rows} for j in columns})
Output:
          Band  Tree  Foot
Hand         1     4     4
Foot         4     4     0
Shoulder     7     7     7
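If you would rather stay with apply, a slightly more defensive variant is to return a Series indexed by col.index instead of a plain list, so the row labels are stated explicitly rather than left to how apply wraps list results. The sketch below is only an illustration under a few assumptions: levenshtein is a hypothetical pure-Python stand-in for nltk.edit_distance (swap the real function in if NLTK is available), and the final comparison simply checks that both approaches produce the same values.

```python
import pandas as pd


def levenshtein(a, b):
    """Hypothetical stand-in for nltk.edit_distance (plain Levenshtein)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


rows = ['Hand', 'Foot', 'Shoulder']
columns = ['Band', 'Tree', 'Foot']

# Approach 1: apply over the columns of the empty frame, returning a Series
# that carries the row labels explicitly.
empty = pd.DataFrame(index=rows, columns=columns)
via_apply = empty.apply(
    lambda col: pd.Series(
        [levenshtein(label, col.name) for label in col.index],
        index=col.index,
    )
)

# Approach 2: nested dictionary, then construct the frame in one go.
via_dict = pd.DataFrame(
    {j: {i: levenshtein(i, j) for i in rows} for j in columns}
)

# Compare values in a fixed row/column order.
same = (via_apply.loc[rows, columns].to_numpy().tolist()
        == via_dict.loc[rows, columns].to_numpy().tolist())
print(via_apply)
print(same)
```

The dictionary version avoids creating the empty frame at all, which is why it tends to be the simpler choice when the row and column labels are already available as lists.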