Counting the number of integer values in a column depending on the value of another column in pandas-CodePudding

Hi i have a column which has values in both integer and text, I am trying to write a function in python where i can get the count of only integer values corresponding to another value in another column.

hear is the sample of the data,

constructorId   positionText
 1                 3
 1                 4
 1                 R
 4                 6
 4                 5
 4                 N
 4                 9

and i want the result to be like this

 constructorID    positionText_count
    1                   2
    4                   3

here is the code

  def not_finished(c):
   r = 0
   for c in hybrid_era_results['constructorId']:
       y = hybrid_era_results['positionText']
       if isinstance(y, int):
           r = r 1
   return r

this code does not throw an error but when i call the function it always returns the value 0. What am i doing wrong?

CodePudding user response：

As there are no clear details about the data, here a simple example to check the existance of intger values from col1 in col2, where both columns are in a str type.

import pandas as pd

# dimple example df
df = pd.DataFrame({"col1": ["value1", "4",  "5", "3"], "col2": ["3", "6", "value2", "4"]})

# get the index where we have integer in both columns
idx_col1 = df.col1.str.isdigit()
idx_col2 = df.col2.str.isdigit()

# retrieve integer values
df_col1_int = df.loc[idx_col1, "col1"]
df_col2_int = df.loc[idx_col2, "col2"]

# get only values from col1 tht exists in col2 (the same index is not required)    
idx_exists = df_col1_int.isin(df_col2_int)
df_exists = df_col1_int[idx_exists]

# get the number of integers in common
len(df_exists)

CodePudding user response：

IIUC, you can use to_numeric to filter the numeric values, then groupby.sum:

(pd.to_numeric(df['positionText'], errors='coerce')
   .notna()
   .groupby(df['constructorId'])
   .sum()
   .reset_index()
)

output:

   constructorId  positionText
0              1             2
1              4             3

To set up a custom column name, use .agg(positionText_count='sum') in place of .sum() (or .agg(**{'positionText_count': 'sum'}) if you want spaces of special characters):

   constructorId  positionText_count
0              1                   2
1              4                   3