I'm trying to create a new column in a dataset (a CSV file) that combines the contents of existing columns.
import numpy as np
import pandas as pd

df = pd.read_csv('books.csv', encoding='unicode_escape', error_bad_lines=False)

# List of columns to keep
columns = ['title', 'authors', 'publisher']

# Function to combine the columns/features
def combine_features(data):
    features = []
    for i in range(0, data.shape[0]):
        features.append(data['title'][i] + ' ' + data['authors'][i] + ' ' + data['publisher'][i])
        return features

# Column to store the combined features
df['combined_features'] = combine_features(df)

# Show data
df
I expected the new column to contain the title, author and publisher combined into one string. Instead I received the error "ValueError: Length of values (1) does not match length of index (11123)".
To fix this, I tried the command "df.reset_index(inplace=True,drop=True)", which was a suggested solution, but it did not work and I still receive the same error.
Below is the whole error message:
ValueError Traceback (most recent call last)
<ipython-input-24-40cc76d3cd85> in <module>
1 #Create a column to store the combined features
----> 2 df['combined_features'] =combine_features(df)
3 df
3 frames
/usr/local/lib/python3.8/dist-packages/pandas/core/frame.py in __setitem__(self, key, value)
3610 else:
3611 # set column
-> 3612 self._set_item(key, value)
3613
3614 def _setitem_slice(self, key: slice, value):
/usr/local/lib/python3.8/dist-packages/pandas/core/frame.py in _set_item(self, key, value)
3782 ensure homogeneity.
3783 """
-> 3784 value = self._sanitize_column(value)
3785
3786 if (
/usr/local/lib/python3.8/dist-packages/pandas/core/frame.py in _sanitize_column(self, value)
4507
4508 if is_list_like(value):
-> 4509 com.require_length_match(value, self.index)
4510 return sanitize_array(value, self.index, copy=True, allow_2d=True)
4511
/usr/local/lib/python3.8/dist-packages/pandas/core/common.py in require_length_match(data, index)
529 """
530 if len(data) != len(index):
--> 531 raise ValueError(
532 "Length of values "
533 f"({len(data)}) "
ValueError: Length of values (1) does not match length of index (11123)
CodePudding user response:
The return statement in the function should not be inside the for loop. Because it is, the function returns after the first iteration, so the list it returns has length 1 rather than 11123. Unindent the return statement by one level.
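As a minimal sketch of that fix, the function below has the return moved out of the loop. The small DataFrame is hypothetical sample data standing in for books.csv, and the vectorized line at the end is an alternative I am suggesting, not something from the question.

```python
import pandas as pd

# Hypothetical sample rows standing in for books.csv (assumption, not the real data)
df = pd.DataFrame({
    "title": ["Dune", "Emma"],
    "authors": ["Frank Herbert", "Jane Austen"],
    "publisher": ["Ace", "Penguin"],
})

def combine_features(data):
    features = []
    for i in range(data.shape[0]):
        features.append(
            data["title"].iloc[i] + " " + data["authors"].iloc[i] + " " + data["publisher"].iloc[i]
        )
    # The return is at function level, so it runs only after every row is processed
    return features

# The list now has one entry per row, so the assignment succeeds
df["combined_features"] = combine_features(df)

# Alternative without an explicit loop: pandas concatenates string columns element-wise
df["combined_vectorized"] = df["title"] + " " + df["authors"] + " " + df["publisher"]

print(df[["combined_features", "combined_vectorized"]])
```

Both columns should hold the same strings. The vectorized form avoids the loop entirely, which also removes any chance of misplacing the return.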
CodePudding user response:
Surprisingly, I am unable to reproduce the error; the program works as expected for me. Try printing the shape of df and inspecting the CSV file.
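A rough sketch of that kind of inspection is below. It assumes a tiny in-memory CSV (hypothetical, in place of books.csv) and checks the shape plus missing values in the three columns. The missing-value check is my own addition: a NaN in any of these columns would make the string concatenation in the loop fail for that row.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for books.csv; the last row has no publisher
csv_text = "title,authors,publisher\nDune,Frank Herbert,Ace\nEmma,Jane Austen,\n"
df = pd.read_csv(io.StringIO(csv_text))

columns = ["title", "authors", "publisher"]

# Number of rows and columns actually loaded
print(df.shape)

# Count of missing values per column of interest
missing = df[columns].isna().sum()
print(missing)

# One possible way to handle gaps: replace missing values with empty strings
cleaned = df[columns].fillna("")
print(cleaned)
```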