I'm trying to split my starting dataframe into 3 new dataframes, each a consecutive slice of the original rows. I know I'm close, but I can't tell where I'm going wrong (code below). Currently my first and last dataframes look good, but the middle one is wrong: it extends to the very end of the original. I'd appreciate any guidance, and I'd also like to know if there is a better overall method.
import math
import pandas as pd
data = {'Name':['Bob','Kyle','Kevin','John','Nolan','Gary','Dylan','Brandon'],
'attr':['abc123','1230','(ab)','asd','kgi','eiru','cmvn','mlok']}
df = pd.DataFrame(data)
one_third_count = math.ceil(len(df) / 3)
print(one_third_count)
print(one_third_count*2)
df_1 = df.iloc[:one_third_count]
display(df_1.info())
df_2 = df.iloc[one_third_count:, :one_third_count * 2]
display(df_2.info())
df_3 = df.iloc[one_third_count*2:]
display(df_3.info())
CodePudding user response:
Your problem is this line:
df_2 = df.iloc[one_third_count:, :one_third_count * 2]
You want:
df_2 = df.iloc[one_third_count : one_third_count * 2]
The ":" notation says "Give me a slice from the left of the ':' up to (but not including) the right of the ':'." If you leave out the left or right hand side, it is taken to mean "very beginning" or "very end" respectively.
Originally, with the comma, you were asking iloc for two dimensions. The original says "Give me from one_third_count to the end" on the first dimension (rows) and then "from the start to one_third_count * 2" on the second dimension (columns). Because that's more columns than you have, the column slice is simply clipped and returns every column. That's why the original df_2 ran from row one_third_count to the end.
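To see the difference concretely, here is a small sketch using the data from the question. The names n, with_comma and row_slice are only illustrative and do not appear in the original code.

```python
import math
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bob", "Kyle", "Kevin", "John", "Nolan", "Gary", "Dylan", "Brandon"],
    "attr": ["abc123", "1230", "(ab)", "asd", "kgi", "eiru", "cmvn", "mlok"],
})
n = math.ceil(len(df) / 3)  # 3 for the 8 rows above

# With the comma: rows n..end, columns 0..n*2 (clipped to the 2 existing columns)
with_comma = df.iloc[n:, :n * 2]

# Without the comma: rows n..n*2 only, all columns
row_slice = df.iloc[n:n * 2]

print(with_comma.shape)  # (5, 2)
print(row_slice.shape)   # (3, 2)
```

The first slice keeps five rows because nothing bounds it on the right in the row dimension; the second keeps exactly the middle three.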
CodePudding user response:
Here is an approach using numpy.array_split and globals to create the chunks dynamically.
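import numpy as np
# array_split takes the number of chunks, not the rows per chunk
# (passing one_third_count only works here because ceil(8 / 3) == 3)
dfs = np.array_split(df, 3)
for idx, ch in enumerate(dfs, start=1):
    globals()[f"df_{idx}"] = ch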
# Output :
print(df_2)
    Name  attr
3   John   asd
4  Nolan   kgi
5   Gary  eiru
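If you would rather not write into globals(), one possible alternative is to collect the chunks in a list and unpack them. This is only a sketch: split_rows is a hypothetical helper name, not part of pandas or numpy, and it assumes you want ceil(len / n) rows per chunk as in the question (so the last chunk may be shorter, or even empty for very small frames).

```python
import math
import pandas as pd

def split_rows(frame, n_chunks):
    """Hypothetical helper: split frame into n_chunks consecutive row blocks."""
    size = math.ceil(len(frame) / n_chunks)  # rows per chunk, rounded up
    # iloc slices past the end are clipped, so the last chunk is just shorter
    return [frame.iloc[i * size:(i + 1) * size] for i in range(n_chunks)]

df = pd.DataFrame({
    "Name": ["Bob", "Kyle", "Kevin", "John", "Nolan", "Gary", "Dylan", "Brandon"],
    "attr": ["abc123", "1230", "(ab)", "asd", "kgi", "eiru", "cmvn", "mlok"],
})

df_1, df_2, df_3 = split_rows(df, 3)
print([len(part) for part in (df_1, df_2, df_3)])  # [3, 3, 2]
```

Keeping the pieces in a list also makes it easy to change the number of chunks later without creating new variable names.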