Getting each row of a dataframe without column names

Time:11-17

I'm trying to add a column to a DataFrame that holds, for each row, a hash of that row's values.

I originally tried this:

df['hash'] = pd.Series((hash(tuple(row)) for _, row in df_to_hash.iterrows()))

However, when I ran this on two different DataFrames, I ran into a problem whenever the column names didn't match exactly.

For example:

DF1:

  Name Age 
0 Tom   12
1 Pat   15

DF2:

  FirstName Age 
0 Tom       12
1 Pat       15

When I hashed the DataFrames above, row 0 got a different hash in each one because the column names differ.

Is there a way I can hash the row values only, excluding the column names?

I also tried this with no success:

df['hash'] = df_to_hash.apply(lambda x: hash(tuple(x)), axis=1)

CodePudding user response:

What about using the underlying NumPy array?

pd.Series((hash(tuple(row)) for row in df_to_hash.to_numpy()))

Output:

0    2606281096150585092
1   -1842928179554038127
dtype: int64
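
As a quick check of this approach, here is a minimal sketch that rebuilds the two example frames from the question and compares their per-row hashes. The names df1, df2, and row_hashes are illustrative and not from the original post. One caveat: Python's built-in hash() is salted per process for strings, so these values are only comparable within a single interpreter run unless PYTHONHASHSEED is fixed.

```python
import pandas as pd

# Two frames with identical values but different column names
# (reconstructed from the example in the question).
df1 = pd.DataFrame({"Name": ["Tom", "Pat"], "Age": [12, 15]})
df2 = pd.DataFrame({"FirstName": ["Tom", "Pat"], "Age": [12, 15]})


def row_hashes(df):
    """Hash each row's values only; to_numpy() carries no column labels or index."""
    return pd.Series([hash(tuple(row)) for row in df.to_numpy()], index=df.index)


h1 = row_hashes(df1)
h2 = row_hashes(df2)

# Compare the two hash columns element by element.
print(h1.equals(h2))
```

Passing index=df.index keeps the hashes aligned with the original rows, which matters if you assign the result back as a column on a frame whose index is not the default 0..n-1.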

You can also use pandas.util.hash_pandas_object with index=False:

pd.util.hash_pandas_object(df_to_hash, index=False)

Output:

0    17445307237601047733
1    15658167368827391476
dtype: uint64
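
To illustrate this second option, here is a minimal sketch using the same two example frames (df1 and df2 are illustrative names, not from the original post). As far as I understand it, hash_pandas_object hashes the values of each column and combines them per row, so the column labels should not enter the result, and index=False leaves the index out as well. It also uses a fixed default hash key, so the values should be reproducible across runs, unlike the built-in hash().

```python
import pandas as pd

# Same values, different column names (from the question's example).
df1 = pd.DataFrame({"Name": ["Tom", "Pat"], "Age": [12, 15]})
df2 = pd.DataFrame({"FirstName": ["Tom", "Pat"], "Age": [12, 15]})

# One uint64 hash per row; index=False excludes the index from the hash.
hashes1 = pd.util.hash_pandas_object(df1, index=False)
hashes2 = pd.util.hash_pandas_object(df2, index=False)

# Check whether every row hash matches between the two frames.
print((hashes1 == hashes2).all())

# The result is a Series aligned on the frame's index,
# so it can be assigned directly as a new column.
df1["hash"] = hashes1
```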