I want to use a dictionary to add a column to a pandas DataFrame. I call apply with a lambda that passes each row to a function, but I get 'ValueError: Columns must be same length as key'. Ideally the column would be new, but to simplify the example I already included the column to change in the DataFrame.
I don't see what I'm doing wrong.
import pandas as pd

court_dict = dict(zip(['INC:INC08 Pensions', 'TX:TX01 Federal Tax', 'HO:HO08 Rent'], [8, 8, 0]))
bank_info = {
    'Category': ['INC:INC08 Pensions', 'TX:TX01 Federal Tax', 'HO:HO08 Rent'],
    'Amount': [1250.23, 300.0, 1000],
    'Paragraph': ['', '', ''],
}
bank2 = pd.DataFrame(bank_info)

def get_column_names(row: pd.core.series.Series, position: int) -> str:
    category = row['Category']
    result = court_dict.get(category, 'd')
    print(category, result)
    return result

if __name__ == "__main__":
    bank2[['Paragraph']] = bank2.apply(lambda row: get_column_names(row, 0), axis=1)
    print(bank2)
Here's the output:
C:\Users\Steve\anaconda3\envs\AccountingPersonal\python.exe C:\Users\Steve\PycharmProjects\AccountingPersonal\src\get_simple.py
INC:INC08 Pensions 8
TX:TX01 Federal Tax 8
HO:HO08 Rent 0
Traceback (most recent call last):
  File "C:\Users\Steve\PycharmProjects\AccountingPersonal\src\get_simple.py", line 20, in <module>
    bank2[['Paragraph']] = bank2.apply(lambda row:get_column_names(row, 0), axis=1)
  File "C:\Users\Steve\anaconda3\envs\AccountingPersonal\lib\site-packages\pandas\core\frame.py", line 3643, in __setitem__
    self._setitem_array(key, value)
  File "C:\Users\Steve\anaconda3\envs\AccountingPersonal\lib\site-packages\pandas\core\frame.py", line 3702, in _setitem_array
    self._iset_not_inplace(key, value)
  File "C:\Users\Steve\anaconda3\envs\AccountingPersonal\lib\site-packages\pandas\core\frame.py", line 3721, in _iset_not_inplace
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
Process finished with exit code 1
CodePudding user response:
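Everything looks fine except for the brackets: use [] instead of [[]] on the left-hand side.
bank2['Paragraph'] = bank2.apply(lambda row:get_column_names(row, 0), axis=1)
[] selects a Series; [[]] selects a DataFrame. With a list key such as [['Paragraph']], pandas expects the assigned value to supply one column per listed name, and the 1-D Series returned by apply does not match that.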
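To see the difference between the two bracket forms, here is a minimal sketch on a small made-up frame (the name demo and its values are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical two-row frame, used only for illustration.
demo = pd.DataFrame({'Category': ['a', 'b'], 'Paragraph': ['', '']})

single = demo['Paragraph']     # single brackets: one column as a Series
double = demo[['Paragraph']]   # double brackets: a one-column DataFrame

print(type(single).__name__, single.shape)   # Series (2,)
print(type(double).__name__, double.shape)   # DataFrame (2, 1)
```

The Series is one-dimensional, while the double-bracket selection is two-dimensional even though it holds only one column.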
CodePudding user response:
With bank2[['Paragraph']], you're returning a DataFrame and not a Series. You need to use single square brackets [] instead.
def get_column_names(row: pd.core.series.Series, position: int) -> str:
    category = row['Category']
    result = court_dict.get(category, 'd')
    print(category, result)
    return result

if __name__ == "__main__":
    bank2['Paragraph'] = bank2.apply(lambda row: get_column_names(row, 0), axis=1)  # <- line updated
    print(bank2)
By the way, you can use pandas.Series.map, without apply and a custom function, to get your expected output/column.
if __name__ == "__main__":
    bank2['Paragraph'] = bank2['Category'].map(court_dict)
    print(bank2)
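One difference to be aware of: court_dict.get(category, 'd') falls back to 'd' for unknown categories, whereas map with a dict gives NaN for keys it cannot find. A minimal sketch of keeping the fallback, assuming a made-up category 'XX:XX00 Unknown' that is not in the original data:

```python
import pandas as pd

court_dict = {'INC:INC08 Pensions': 8, 'TX:TX01 Federal Tax': 8, 'HO:HO08 Rent': 0}

# 'XX:XX00 Unknown' is a hypothetical category, added only to show the fallback.
bank2 = pd.DataFrame({
    'Category': ['INC:INC08 Pensions', 'HO:HO08 Rent', 'XX:XX00 Unknown'],
})

# Plain dict mapping: categories missing from the dict become NaN.
plain = bank2['Category'].map(court_dict)

# Mapping through dict.get keeps the 'd' default used in the question.
bank2['Paragraph'] = bank2['Category'].map(lambda c: court_dict.get(c, 'd'))
print(bank2)
```

If every category is guaranteed to be in the dictionary, the plain map(court_dict) form is enough.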
CodePudding user response:
Use single brackets when you assign a single column. Double brackets select a DataFrame (one or more columns), so the value you assign must have a matching number of columns.
bank2['Paragraph'] = bank2.apply(lambda row:get_column_names(row, 0), axis=1)
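For completeness, the double-bracket form can also work if the right-hand side is itself a one-column DataFrame, for example via Series.to_frame. This is only a sketch on a cut-down copy of the question's data (the Amount column is omitted and the helper function is inlined); single brackets remain the simpler choice:

```python
import pandas as pd

court_dict = {'INC:INC08 Pensions': 8, 'TX:TX01 Federal Tax': 8, 'HO:HO08 Rent': 0}
bank2 = pd.DataFrame({
    'Category': ['INC:INC08 Pensions', 'TX:TX01 Federal Tax', 'HO:HO08 Rent'],
    'Paragraph': ['', '', ''],
})

# apply with axis=1 and a scalar-returning function yields a 1-D Series.
result = bank2.apply(lambda row: court_dict.get(row['Category'], 'd'), axis=1)

# A list key wants one value column per listed name,
# so hand it a one-column DataFrame instead of the Series.
bank2[['Paragraph']] = result.to_frame()
print(bank2)
```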