I have a dataframe like this:
df = pd.DataFrame({"a":["x1", 12, 14, "x2", 32, 9]})
df
Out[10]:
a
0 x1
1 12
2 14
3 x2
4 32
5 9
I would like to split it in multiple dataframes (in this case, two) if row begins with "x". And then this row should be the column name. Maybe splitting these dataframes and put inside a dictionary?
The output should be like this:
x1
Out[12]:
x1
0 12
1 14
x2
Out[13]:
x2
0 32
1 9
Anyone could help me?
CodePudding user response:
You can try cumsum
on str.startswith
then groupby
on that:
for k, d in df.groupby(df['a'].str.startswith('x').fillna(0).cumsum()):
# manipulate data to get desired output
sub_df = pd.DataFrame(d.iloc[1:].to_numpy(), columns=d.iloc[0].to_numpy())
# do something with it
print(sub_df)
print('-'*10)
Output:
x1
0 12
1 14
----------
x2
0 32
1 9
----------
CodePudding user response:
Something like this should work:
import pandas as pd
df = pd.DataFrame({"a":["x1", 12, 14, "x2", 32, 9]})
## Get the row index of value starting with x
ixs = []
for j in df.index:
if isinstance(df.loc[j,'a'],str):
if df.loc[j,'a'].startswith('x'):
ixs.append(j)
dicto = {}
for i,val in enumerate(ixs):
start_ix = ixs[i]
if i == len(ixs) - 1:
end_ix = df.index[-1]
else:
end_ix = ixs[i 1] - 1
new_df = df.loc[start_ix:end_ix,'a'].reset_index(drop=True)
new_df.columns = new_df.iloc[0]
new_df.drop(new_df.index[0],inplace=True)
dicto[i] = new_df
CodePudding user response:
A groupby
is like a dictionary, so we can explicitly make it one:
dfs = {f'x{k}':d for k, d in df.groupby(df['a'].str.startswith('x').fillna(False).cumsum())}
for k in dfs:
dfs[k].columns = dfs[k].iloc[0].values # Make x row the header.
dfs[k] = dfs[k].iloc[1:] # drop x row.
print(dfs[k], '\n')
Output:
x1
1 12
2 14
x2
4 32
5 9