How to count the number of columns in a row using Pandas Python-CodePudding

I am writing a program where I want to count the number of columns in each row as each file has a different number of columns. It means I want to check if any row is missing a cell, and if it does, then I want to highlight the cell number. I am using pandas for that to read the file. I have multiple gzip files which contain another CSV file. My code for reading the files:

#running this under loop
data = pd.read_csv(files,
    compression='gzip'
    on_bad_lines='warn'
    low_memory=False,
    sep=r'|',
    header=None,
    na_values=['NULL',' ','NaN'],
    keep_default_na = False
    )

I checked StackOverflow but there's no answer related to this situation. I would be really glad if someone can help me out here.

CodePudding user response：

Not sure if i'm interpreting this right but if you want to count the number of columns in each pandas dataframe within a loop, there are plenty of options.

1) data.shape[1]
2) len(data.columns)
3) len(list(data))

Here is a minimal reproducibility code. Replace "data = pd.DataFrame(dict)" with "data = pd.read_csv(...)"

# Import Required Libraries
import pandas as pd
import numpy as np

# Create dictionaries for the dataframe
dict1 = {'Name': ['Anne', 'Bob', 'Carl'], 
         'Age': [22, 20, 22], 
         'Marks': [90, 84, 82]}

dict2 = {'Name': ['Dan', 'Ely', 'Fan'], 
         'Age': [52, 30, 12], 
         'Marks': [40, 54, 42]}

for i in [dict1, dict2]:
    # Read data
    data = pd.DataFrame(dict1)

    # Get columns
    shape = data.shape # (3,3)
    col = shape[1] # 3
  
    # Printing Number of columns
    print(f'Number of columns for file <>: {col}')

"This works fine, but after trying your suggestion I am getting the total number of columns that we have in our data frame. I want to print the number of columns each row contains. For eg: S.no Name 1 Adam 2 George 3 NULL so, 1st row will print 2, the second will be 2, but the third will print one."
– Ramoxx

Below is the updated answer(s) for your specification

Get counts of non-nulls for each row

data.apply(lambda x: x.count(), axis=1)

data:

  A   B   C
0:  1   2   3
1:  2   nan nan
2:  nan nan nan

output:

0:  3
1:  1
2:  0

Add counts of non-nulls for each row into dataframe

data['count'] = data.apply(lambda x: x.count(), axis=1)

result:

    A   B   C   count
0:  1   1   3   3
1:  2   nan nan 1
2:  nan nan nan 0