Home > database >  How to split dataframe by specific string in rows
How to split dataframe by specific string in rows

Time:09-14

I have a dataframe like this:

df = pd.DataFrame({"a":["x1", 12, 14, "x2", 32, 9]})

df
Out[10]: 
    a
0  x1
1  12
2  14
3  x2
4  32
5   9

I would like to split it in multiple dataframes (in this case, two) if row begins with "x". And then this row should be the column name. Maybe splitting these dataframes and put inside a dictionary?

The output should be like this:

x1
Out[12]: 
   x1
0  12
1  14

x2
Out[13]: 
   x2
0  32
1   9

Anyone could help me?

CodePudding user response:

You can try cumsum on str.startswith then groupby on that:

for k, d in df.groupby(df['a'].str.startswith('x').fillna(0).cumsum()):
    # manipulate data to get desired output
    sub_df = pd.DataFrame(d.iloc[1:].to_numpy(), columns=d.iloc[0].to_numpy()) 

    # do something with it
    print(sub_df)
    print('-'*10)

Output:

   x1
0  12
1  14
----------
   x2
0  32
1   9
----------

CodePudding user response:

Something like this should work:

import pandas as pd
df = pd.DataFrame({"a":["x1", 12, 14, "x2", 32, 9]})
## Get the row index of value starting with x
ixs = []
for j in df.index:
    if isinstance(df.loc[j,'a'],str):
        if df.loc[j,'a'].startswith('x'):
            ixs.append(j)
dicto = {}
for i,val in enumerate(ixs):
    start_ix = ixs[i]
    if i == len(ixs) - 1:
        end_ix = df.index[-1]
    else:
        end_ix = ixs[i 1] - 1
    new_df = df.loc[start_ix:end_ix,'a'].reset_index(drop=True)
    new_df.columns = new_df.iloc[0]
    new_df.drop(new_df.index[0],inplace=True)
    dicto[i] = new_df

CodePudding user response:

A groupby is like a dictionary, so we can explicitly make it one:

dfs = {f'x{k}':d for k, d in df.groupby(df['a'].str.startswith('x').fillna(False).cumsum())}
for k in dfs:
    dfs[k].columns = dfs[k].iloc[0].values # Make x row the header.
    dfs[k] = dfs[k].iloc[1:] # drop x row.
    print(dfs[k], '\n')

Output:

   x1
1  12
2  14

   x2
4  32
5   9

  • Related