Concatenate values of rows for which index is in a nested list

I have two lists of indexes, continuous = [[2, 3, 4], [10, 11]] (the inner lists can be of length 3, 4, 5 or more) and non_continuous = [[7], [56]] (the inner lists have length 1), and a data frame as follows:

INDEX        PARTICULARS
2             COMPENSATION CHARGE USE OF
3                   WAREHOUSING PREMISES
4                            FOR APR 22.
7                        RENT FOR APR 22
10                      BEING PAYMENT OF
11                   RENT OF NEW BRANCH.
56                    TOWARDS LANDSCAPE.

I need to concatenate the rows of the data frame as follows:

PARTICULARS

COMPENSATION CHARGE USE OF WAREHOUSING PREMISES FOR APR 22   (being [2, 3, 4])
RENT FOR APR 22                                              (being [7])
BEING PAYMENT OF RENT OF NEW BRANCH.                         (being [10, 11])
TOWARDS LANDSCAPE.                                           (being [56])

CodePudding user response:

Are you looking for something like this?

import pandas as pd
df = pd.DataFrame({"INDEX":[2,3,4,7,10,11,56],
                   "PARTICULARS":["COMPENSATION CHARGE USE OF","WAREHOUSING PREMISES","FOR APR 22.","RENT FOR APR 22","BEING PAYMENT OF","RENT OF NEW BRANCH.","TOWARDS LANDSCAPE."]})

continuous = [[2, 3, 4], [10, 11]]
non_continuous = [[7], [56]]
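lst = continuous + non_continuous

# Build two parallel lists: every index value and the number of the group it belongs to
INDEX, group = [], []
for i, item in enumerate(lst):
    INDEX = INDEX + item
    group = group + [i] * len(item)
df1 = pd.DataFrame({"INDEX": INDEX, "Group": group})

# Attach the group number to each row, then join the text within each group
df = df.merge(df1, on="INDEX", how="left")
df2 = pd.DataFrame(df.groupby(["Group"])["PARTICULARS"].apply(lambda x: ', '.join(x)))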

# Output of df2

                                             PARTICULARS
Group                                                   
0      COMPENSATION CHARGE USE OF, WAREHOUSING PREMISES, FOR APR 22.
1                              BEING PAYMENT OF, RENT OF NEW BRANCH.
2                                                    RENT FOR APR 22
3                                                  TOWARDS LANDSCAPE.
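
The same grouping can also be done without the intermediate data frame and the merge. The following is only a minimal sketch under the same sample data: the name group_of is hypothetical, and it joins with a single space (as in the question) instead of ", ".

```python
import pandas as pd

df = pd.DataFrame({
    "INDEX": [2, 3, 4, 7, 10, 11, 56],
    "PARTICULARS": ["COMPENSATION CHARGE USE OF", "WAREHOUSING PREMISES",
                    "FOR APR 22.", "RENT FOR APR 22", "BEING PAYMENT OF",
                    "RENT OF NEW BRANCH.", "TOWARDS LANDSCAPE."],
})

continuous = [[2, 3, 4], [10, 11]]
non_continuous = [[7], [56]]

# group_of (hypothetical name) maps each index value to the position
# of the inner list that contains it.
group_of = {idx: g
            for g, item in enumerate(continuous + non_continuous)
            for idx in item}

# Translate INDEX into a group number and use it directly as the grouping key.
groups = df["INDEX"].map(group_of).rename("Group")
df2 = df.groupby(groups)["PARTICULARS"].agg(" ".join)
print(df2)
```

Rows whose INDEX is missing from both lists would get NaN as their group and be dropped by groupby, so this assumes every row is covered by one of the lists.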

CodePudding user response:

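Let's say your existing data frame is df. Create a second data frame, df_lists, from your two lists continuous_lists and non_continuous_lists, with one row per inner list and the row number stored as grp. Explode it so that each index value gets its own row, merge it with df on INDEX, then group by grp and join the PARTICULARS of each group. See the code below.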

continuous_lists = [[2, 3, 4], [10, 11]]
non_continuous_lists = [[7], [56]]
df_lists = pd.DataFrame({'INDEX': 
                         [*continuous_lists, *non_continuous_lists]}).rename_axis('grp')
df_lists = df_lists.explode('INDEX').reset_index()

out = df_lists.merge(df).groupby('grp')['PARTICULARS'].agg(' '.join).reset_index(drop=True)

print(out)

# Output

0    COMPENSATION CHARGE USE OF WAREHOUSING PREMISE...
1                 BEING PAYMENT OF RENT OF NEW BRANCH.
2                                      RENT FOR APR 22
3                                   TOWARDS LANDSCAPE.
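
Note that the result above lists both continuous groups first, whereas the question asks for the rows in their original order ([2, 3, 4], [7], [10, 11], [56]). One possible way to get that order, sketched here under the assumption that sorting the inner lists by their first element is acceptable, is to sort before numbering the groups. The name all_lists and the cast to int64 are additions for illustration, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "INDEX": [2, 3, 4, 7, 10, 11, 56],
    "PARTICULARS": ["COMPENSATION CHARGE USE OF", "WAREHOUSING PREMISES",
                    "FOR APR 22.", "RENT FOR APR 22", "BEING PAYMENT OF",
                    "RENT OF NEW BRANCH.", "TOWARDS LANDSCAPE."],
})

continuous_lists = [[2, 3, 4], [10, 11]]
non_continuous_lists = [[7], [56]]

# Assumption: ordering the inner lists by their first element reproduces
# the row order of df, so group numbers follow that order.
all_lists = sorted(continuous_lists + non_continuous_lists,
                   key=lambda item: item[0])

df_lists = (
    pd.DataFrame({"INDEX": all_lists})
    .rename_axis("grp")
    .explode("INDEX")              # one row per index value
    .reset_index()                 # turn grp into a regular column
    .astype({"INDEX": "int64"})    # explode leaves an object column
)

out = (
    df_lists.merge(df, on="INDEX")
    .groupby("grp")["PARTICULARS"]
    .agg(" ".join)
    .reset_index(drop=True)
)
print(out.tolist())
```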