How to obtain the first 4 rows for every 20 rows from a CSV file

I've read the CSV file using pandas and have managed to print the 1st, 2nd, 3rd and 4th row of every 20 rows using .iloc.

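import pandas as pd

Prem_results = pd.read_csv("../data sets analysis/prem/result.csv")
Prem_results.iloc[:320:20, :]   # 1st row of every 20
Prem_results.iloc[1:320:20, :]  # 2nd row of every 20
Prem_results.iloc[2:320:20, :]  # 3rd row of every 20
Prem_results.iloc[3:320:20, :]  # 4th row of every 20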

Is there a way to use iloc to print the first 4 rows of every 20 lines together, rather than separately as I do now? Apologies if this is worded badly; I'm fairly new to both Python and pandas.

CodePudding user response：

Using groupby.head:

import numpy as np
Prem_results.groupby(np.arange(len(Prem_results)) // 20).head(4)
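
To see why this works, here is a minimal sketch on a toy DataFrame. The helper name first_n_of_every_block and the demo frame are made up for illustration; only the groupby(...).head(...) call comes from the answer above.

```python
import numpy as np
import pandas as pd

def first_n_of_every_block(df, n=4, block=20):
    """Return the first `n` rows of each consecutive `block`-row group."""
    # Integer-divide each row position by the block size: rows 0-19 get
    # label 0, rows 20-39 get label 1, and so on.
    block_ids = np.arange(len(df)) // block
    # head(n) on the grouped frame keeps the first n rows of each group
    # and returns them in their original order.
    return df.groupby(block_ids).head(n)

# Hypothetical stand-in for the CSV data: 100 rows, so 5 blocks of 20.
demo = pd.DataFrame({"col1": np.arange(100)})
picked = first_n_of_every_block(demo)
print(picked.index.tolist())
```

Because the group labels are built from row positions rather than index labels, this should behave the same whatever index the DataFrame has.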

CodePudding user response：

You can concat slices together like this:

pd.concat([df[i::20] for i in range(4)]).sort_index()

MCVE:

import numpy as np
import pandas as pd
df = pd.DataFrame({'col1':np.arange(1000)})
pd.concat([df[i::20] for i in range(4)]).sort_index().head(20)

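Output: rows 0-3, 20-23, 40-43, 60-63 and 80-83.

How it works: df[i::20] starts at row i and takes every 20th row. Doing this for i = 0, 1, 2 and 3 and concatenating the results gives the first 4 rows of every block of 20; sort_index() then puts the rows back in their original order.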
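
Since the question asked specifically about iloc, here is a sketch of a positional alternative. This is a different technique from the concat approach above, not part of that answer: it builds the row positions with a modulo test and passes them to iloc in one call. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": np.arange(1000)})

# Approach from the answer above: four strided slices, concatenated and re-sorted.
via_concat = pd.concat([df[i::20] for i in range(4)]).sort_index()

# Alternative sketch: keep positions whose offset within a block of 20 is below 4.
positions = np.flatnonzero(np.arange(len(df)) % 20 < 4)
via_iloc = df.iloc[positions]

# Both selections should contain the same rows in the same order.
print(via_concat.equals(via_iloc))
```

One difference worth noting: sort_index() restores the original order only when the index is already sorted, as with the default RangeIndex, while the iloc version depends only on row positions.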

CodePudding user response：

You can also do this while reading the csv itself.

import pandas as pd

# file_name is the path to the CSV file; keep the first 4 rows of each 20-row chunk
pieces = [chunk.head(4) for chunk in pd.read_csv(file_name, chunksize=20)]
df = pd.concat(pieces)
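
To try the chunked approach without a file on disk, the following sketch feeds read_csv an in-memory CSV through io.StringIO. The CSV content and the column name col1 are invented for the demo; with a real file you would pass its path instead.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk: one header line plus 100 data rows.
csv_text = "col1\n" + "\n".join(str(i) for i in range(100))

# With chunksize set, read_csv yields DataFrames of up to 20 rows each.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=20)

# Keep the first 4 rows of every chunk, then concatenate once at the end.
pieces = [chunk.head(4) for chunk in reader]
df = pd.concat(pieces)
print(df["col1"].tolist())
```

Only one 20-row chunk is parsed at a time, which can help when the full file is too large to load comfortably.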

More resources: the official pandas documentation covers the chunksize parameter of read_csv in more detail.
