I'm following a simple DataFrame concatenation tutorial for Python 3. You can find the tutorial here: https://www.geeksforgeeks.org/dealing-with-rows-and-columns-in-pandas-dataframe/
import pandas as pd
# importing numpy as np
import numpy as np
# making data frame
df = pd.read_csv('nba.csv', index_col ='Name')
df.head(10)
new_row = pd.DataFrame({'Name':'Geeks', 'Team':'Boston', 'Number':3,
                        'Position':'PG', 'Age':33, 'Height':'6-2',
                        'Weight':189, 'College':'MIT', 'Salary':99999},
                       index =[0])
# simply concatenate both dataframes
df_new = pd.concat([new_row, df]).reset_index(drop = True)
df_new.head(5)
print(df_new)
When I print df_new, I get this output:
0 Geeks Boston 3.0 PG 33.0 6-2 189.0 MIT 99999.0
1 NaN Boston Celtics 0.0 PG 25.0 6-2 180.0 Texas 7730337.0
2 NaN Boston Celtics 99.0 SF 25.0 6-6 235.0 Marquette 6796117.0
3 NaN Boston Celtics 30.0 SG 27.0 6-5 205.0 Boston University NaN
4 NaN Boston Celtics 28.0 SG 22.0 6-5 185.0 Georgia State 1148640.0
.. ... ... ... ... ... ... ... ... ...
454 NaN Utah Jazz 8.0 PG 26.0 6-3 203.0 Butler 2433333.0
455 NaN Utah Jazz 25.0 PG 24.0 6-1 179.0 NaN 900000.0
456 NaN Utah Jazz 21.0 C 26.0 7-3 256.0 NaN 2900000.0
457 NaN Utah Jazz 24.0 C 26.0 7-0 231.0 Kansas 947276.0
458 NaN NaN NaN NaN NaN NaN NaN NaN NaN
[459 rows x 9 columns]
This is not the expected output. I re-downloaded the CSV in case it was corrupted, but that does not seem to be the problem. I am new to Python, so I'm trying to figure out two things: why DataFrame.head() does not reduce the printed DataFrame to 5 rows, and why the actual values in the first column are showing up as NaN.
If anyone has any ideas, let me know.
CodePudding user response:
This is an error in the tutorial. Read the CSV without index_col='Name', run the same code, and it will work.
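To see why that argument matters, here is a minimal sketch that reproduces the problem without needing nba.csv. The three-row CSV text and the player names in it are made up for illustration, and only three of the nine columns are included.

```python
import io
import pandas as pd

# Hypothetical stand-in for nba.csv (made-up rows, only three columns)
CSV_TEXT = """Name,Team,Number
Player A,Boston Celtics,0
Player B,Boston Celtics,99
Player C,Utah Jazz,8
"""

new_row = pd.DataFrame({'Name': 'Geeks', 'Team': 'Boston', 'Number': 3},
                       index=[0])

# With index_col='Name', the names become the index, not a column
df_indexed = pd.read_csv(io.StringIO(CSV_TEXT), index_col='Name')
has_name_column = 'Name' in df_indexed.columns

# concat aligns on column labels, so the original rows get NaN under 'Name';
# reset_index(drop=True) then discards the index that held the real names
broken = pd.concat([new_row, df_indexed]).reset_index(drop=True)
missing_names = int(broken['Name'].isna().sum())

print(has_name_column)
print(missing_names)
```

Only the new row has a value under Name here; every row that came from the CSV shows NaN in that column, which matches the output in the question.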
CodePudding user response:
Writing it as an answer too.
You set "Name" as the index of your original DataFrame. That made it disappear from the columns of df, so when you concatenate a row that has a Name column, that column gets filled with NaN for all the original rows. Just remove index_col ='Name' when reading your original df:
df = pd.read_csv('nba.csv')
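A short sketch of the whole thing with that change applied, again using a made-up in-memory CSV (hypothetical rows, only three columns) in place of nba.csv. It also touches the head() part of the question: head() returns a new DataFrame and does not shorten df_new itself, so print(df_new) still prints every row unless you print or keep what head() returns.

```python
import io
import pandas as pd

# Hypothetical stand-in for nba.csv (made-up rows, only three columns)
CSV_TEXT = """Name,Team,Number
Player A,Boston Celtics,0
Player B,Boston Celtics,99
Player C,Utah Jazz,8
"""

# No index_col, so 'Name' stays an ordinary column
df = pd.read_csv(io.StringIO(CSV_TEXT))

new_row = pd.DataFrame({'Name': 'Geeks', 'Team': 'Boston', 'Number': 3},
                       index=[0])

# Both frames now share the same columns, so nothing is filled with NaN
df_new = pd.concat([new_row, df]).reset_index(drop=True)

# head() leaves df_new untouched; keep its return value and print that
first_two = df_new.head(2)
print(first_two)
```

In a notebook, a bare df_new.head(5) only displays its result when it is the last expression in the cell, so in a script you need print(df_new.head(5)).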