I would like to rework my code to use a for loop to get row counts for specific columns in Python. There are 15 columns in total, and I am looking for row counts for 4 of them at this time.
This is the current code:
# get row count by affiliate, race, ethnicity and abortion type
print('Column_1:', batch_df.groupby('Column_1').size().sum())
print('Column_2:', batch_df.groupby('Column_2').size().sum())
print('Column_3:', batch_df.groupby('Column_3').size().sum())
print('Column_4:', batch_df.groupby('Column_4').size().sum())
The output (which is correct) is below:
Column_1: 468676
Column_2: 465755
Column_3: 468400
Column_4: 468676
Is there a way to rework this code so that it uses a for loop?
CodePudding user response:
This should work if you want to specify the columns by name:
for col in ['Column_1', 'Column_2', 'Column_3', 'Column_4']:
    print('{}:'.format(col), batch_df.groupby(col).size().sum())
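As a minimal sketch of why the counts in the question differ between columns: `groupby` drops rows whose key is NaN by default, so `groupby(col).size().sum()` should equal the number of non-missing values in that column, which `count()` also returns. The small frame below is invented for illustration and is only a stand-in for the asker's `batch_df`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for batch_df; the values are made up for illustration.
batch_df = pd.DataFrame({
    'Column_1': ['a', 'b', 'a', 'c'],
    'Column_2': ['x', np.nan, 'y', 'x'],
    'Column_3': ['p', 'q', np.nan, np.nan],
    'Column_4': ['m', 'm', 'n', 'n'],
})

cols = ['Column_1', 'Column_2', 'Column_3', 'Column_4']

# Loop version from the answer: groupby excludes NaN keys by default.
loop_counts = {col: int(batch_df.groupby(col).size().sum()) for col in cols}

# Vectorized alternative: count() gives non-null values per column.
direct_counts = batch_df[cols].count()

for col in cols:
    print('{}: {}'.format(col, loop_counts[col]))
```

If the per-column non-null count is all that is needed, `batch_df[cols].count()` avoids the loop entirely.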
CodePudding user response:
There is no need to write out all the column names: df.columns returns them, so you can loop through them:
for c in df.columns:
    print(c)
For example, with this DataFrame:
import pandas as pd

df = pd.DataFrame({
    'Column_1': ['1', '2', '3', '4'],
    'Column_2': ['11', '12', '13', '14'],
    'Column_3': ['101', '102', '103', '104']})
the loop will print:
Column_1
Column_2
Column_3
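To connect this back to the question, here is a rough sketch that combines the `df.columns` loop with the row count from the original code. It reuses the example DataFrame above; the `counts` dictionary is just an illustrative name. Note that this loops over every column, so with 15 columns and only 4 of interest, a list of names may still be the better fit.

```python
import pandas as pd

# Same example DataFrame as above.
df = pd.DataFrame({
    'Column_1': ['1', '2', '3', '4'],
    'Column_2': ['11', '12', '13', '14'],
    'Column_3': ['101', '102', '103', '104']})

counts = {}  # illustrative name: column -> row count
for c in df.columns:
    # Same expression as in the question, applied to each column in turn.
    counts[c] = int(df.groupby(c).size().sum())
    print('{}: {}'.format(c, counts[c]))
```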