create a pandas dataframe by subsetting every

Time:09-27

I have a data frame containing names and fathers' names, as shown in the example below:

    name 1
    name 2
    name 3
    father's name 1
    father's name 2
    father's name 3
    name 4
    name 5
    name 6
    father's name 4
    father's name 5
    father's name 6

Now I want to create a data frame from this so that it appears as shown below:

    name 1 father's name 1
    name 2 father's name 2
    name 3 father's name 3
    name 4 father's name 4
    name 5 father's name 5
    name 6 father's name 6

I want to do this in pandas. I tried the approach from the linked question "Pandas every nth row", but it did not help much.

CodePudding user response:

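You can build a boolean mask from the index of the column: integer-dividing the index by 3 gives a block number, and even-numbered blocks hold the names while odd-numbered blocks hold the fathers' names. Then filter the column with the mask and its negation, and concatenate the two halves: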

mask = df['col'].index.values // 3 % 2 == 0

out = df.loc[mask, ['col']].reset_index(drop=True) + ' ' + df.loc[~mask, ['col']].reset_index(drop=True)
print(out)

                      col
0  name 1 father's name 1
1  name 2 father's name 2
2  name 3 father's name 3
3  name 4 father's name 4
4  name 5 father's name 5
5  name 6 father's name 6
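
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same mask idea. It assumes the data always alternates in blocks of 3 (the `block` variable) and that the column is called `col` as above; the output column names `name` and `father_name` are made up for illustration.

```python
import pandas as pd

# Sample data mirroring the question: 3 names, 3 fathers' names, repeated.
values = []
for start in (1, 4):
    values += [f"name {i}" for i in range(start, start + 3)]
    values += [f"father's name {i}" for i in range(start, start + 3)]
df = pd.DataFrame({"col": values})

block = 3  # assumed block size: 3 names followed by 3 fathers' names

# Position // block gives a block number; even blocks are names,
# odd blocks are fathers' names.
mask = (df.index.values // block) % 2 == 0

names = df.loc[mask, "col"].reset_index(drop=True)
fathers = df.loc[~mask, "col"].reset_index(drop=True)

# Variant 1: a single combined string column, as in the answer above.
combined = (names + " " + fathers).to_frame("col")

# Variant 2: two separate columns (hypothetical column names).
paired = pd.DataFrame({"name": names, "father_name": fathers})
print(paired)
```

The `reset_index(drop=True)` calls matter: without them the two halves keep their original row labels and would not line up when added or combined.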

CodePudding user response:

Try grouping the rows on the trailing number:

df
    col1
0   name 1
1   name 2
2   name 3
3   father's name 1
4   father's name 2
5   father's name 3
6   name 4
7   name 5
8   name 6
9   father's name 4
10  father's name 5
11  father's name 6

df['col2'] = df['col1'].str.split(r'\s').str[-1]
df1 = df.groupby('col2')['col1'].apply(lambda x: ' '.join(x)).reset_index()

df1
    col2    col1
0   1   name 1 father's name 1
1   2   name 2 father's name 2
2   3   name 3 father's name 3
3   4   name 4 father's name 4
4   5   name 5 father's name 5
5   6   name 6 father's name 6

df1 = df1.drop(columns='col2')

df1

    col1
0   name 1 father's name 1
1   name 2 father's name 2
2   name 3 father's name 3
3   name 4 father's name 4
4   name 5 father's name 5
5   name 6 father's name 6
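
A runnable sketch of this grouping approach, with one change that is my own assumption rather than part of the answer above: the key is converted to an integer, because grouping on the string key sorts lexicographically and would place "10" before "2" once there are ten or more names. It also assumes every row ends in a number.

```python
import pandas as pd

df = pd.DataFrame({"col1": [
    "name 1", "name 2", "name 3",
    "father's name 1", "father's name 2", "father's name 3",
    "name 4", "name 5", "name 6",
    "father's name 4", "father's name 5", "father's name 6",
]})

# Use the last whitespace-separated token as the pairing key, converted
# to int so that the groups sort numerically rather than as strings.
key = df["col1"].str.split().str[-1].astype(int)

# groupby keeps the original row order inside each group, so the name
# comes before the father's name when the strings are joined.
df1 = df["col1"].groupby(key).agg(" ".join).reset_index(drop=True).to_frame()
print(df1)
```

Unlike the index-based mask, this does not depend on the block size, only on each name and its father's name sharing the same trailing number.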
