Home > Software engineering >  Groupby year dropping some variables
Groupby year dropping some variables

Time:05-28

This is the original data and I need the mean of each year of all the variables.

Original data

But when I am using groupby('year') command, it is dropping all variables except 'lnmcap' and 'epu'.

Post Groupby output image

Why this is happening and what needs to be done?

CodePudding user response:

Probably the other columns have object or string type of the data, instead of integer, as a result of which only 'Inmcap' and 'epu' has got the average column.
Use ds.dtypes or simply ds.info() to check the data types of data in the columns
it comes out to be object/string type then use

ds=ds.drop('company',axis=1)
column_names=ds.columns
for i in column_names:
   ds[i]=ds[i].astype(str).astype(float)

This could work

CodePudding user response:

You will need to convert the numeric columns to float types. Use df.info() to check the various data types.

for col in ds.select_dtypes(['object']).columns:
    try:
        ds[col] = ds[col].astype('float')
    except:
        continue

After this, use df.info() to check again. Those columns with objects like '1.604809' will be converted to float 1.604809

Sometimes, the column may contain some "dirty" data that cannot be converted to float. In this case, you could use below code with errors='coerce' means non-numeric data becomes NaN

ds[col] = pd.to_numeric(ds[col], errors='coerce')
  • Related