Home > Mobile >  Appending a large amount of columns into 1 column
Appending a large amount of columns into 1 column

Time:10-11

I have a dataset of ~700 columns. I'd like to join all columns into a single column.

Input:

id | A   | B  | C   | D  | E  | F  | ... | Z
0  | yes | no | yes | no | yes| no | ... | no
1  | no  | no | yes | no | no | no | ... | no
2  | yes | yes| yes | yes| yes| no | ... | no

Output:

id | A   | B  | C   | D  | E  | F  | ... | Z  | joined_column
0  | yes | no | yes | no | yes| no | ... | no | yes no yes no yes no ... no
1  | no  | no | yes | no | no | no | ... | no | no no yes no no no ... no
2  | yes | yes| yes | yes| yes| no | ... | no | yes yes yes yes yes no ... no

I've used the following in the past. However, I'm looking for a way to scale this when you're working with a large amount of columns.

def join(df):
list = []
    for i in range(0, df.shape[0]):
        list.append( str(df['A'][i])   ' '   str(df['B'][i])   ' '   str(df['C'][i]))
    return list

CodePudding user response:

Given df:

     A    B    C    D    E   F
0  yes   no  yes   no  yes  no
1   no   no  yes   no   no  no
2  yes  yes  yes  yes  yes  no

Doing:

cols = df.columns

# As a string:
df['joined_column_str'] = df[cols].agg(' '.join, axis=1)

# As a list:
df['joined_column_list'] = df[cols].agg(list, axis=1)

Output:

     A    B    C    D    E   F       joined_column_str             joined_column_list
0  yes   no  yes   no  yes  no    yes no yes no yes no    [yes, no, yes, no, yes, no]
1   no   no  yes   no   no  no      no no yes no no no      [no, no, yes, no, no, no]
2  yes  yes  yes  yes  yes  no  yes yes yes yes yes no  [yes, yes, yes, yes, yes, no]

CodePudding user response:

Perhaps you can use a list comprehension:

df['joined_column'] = [' '.join(item) for item in df.iloc[:,1:].values]

   id    A    B    C    D    E   F   Z              joined_column
0   0  yes   no  yes   no  yes  no  no    yes no yes no yes no no
1   1   no   no  yes   no   no  no  no      no no yes no no no no
2   2  yes  yes  yes  yes  yes  no  no  yes yes yes yes yes no no

CodePudding user response:

You can use agg to aggregate data over a specific axis using a specific operation:

df['joined_column'] = df.agg(' '.join, axis=1)

Potentially use

df['joined'] = df.iloc[:,1:].agg(' '.join, axis=1)

if you don't want to join the first (or any other column).

I used ' '.join to join the values with a space as separator.

axis is set to 1 because you want to join columns and not rows.

CodePudding user response:

Considering the dataframe df

   id    A    B  ...    Y    Z                                      joined_column
0   0  yes   no  ...  yes   no  yes no yes no yes no yes no yes no yes no yes ...
1   1   no   no  ...   no   no  no no yes no no no no no no no no no no no no ...
2   2  yes  yes  ...  yes  yes  yes yes yes yes yes no yes yes yes yes yes yes...

One can use a custom lambda function inside pandas.DataFrame.apply as follows

df['joined_column'] = df.apply(lambda x: ' '.join(x.astype(str)), axis=1)

[Out]:

   id    A    B  ...    Y    Z                                      joined_column
0   0  yes   no  ...  yes   no  0 yes no yes no yes no yes no yes no yes no ye...
1   1   no   no  ...   no   no  1 no no yes no no no no no no no no no no no n...
2   2  yes  yes  ...  yes  yes  2 yes yes yes yes yes no yes yes yes yes yes y...

However, as one doesn't want the id to appear in the joined_column, one can add x.drop('id') as

df['joined_column'] = df.apply(lambda x: ' '.join(x.drop('id').astype(str)), axis=1)

[Out]:

   id    A    B  ...    Y    Z                                      joined_column
0   0  yes   no  ...  yes   no  yes no yes no yes no yes no yes no yes no yes ...
1   1   no   no  ...   no   no  no no yes no no no no no no no no no no no no ...
2   2  yes  yes  ...  yes  yes  yes yes yes yes yes no yes yes yes yes yes yes...

Alternatively, for this last case, one can also leverage Numpy, more specifically numpy.array as follows

df['joined_column'] = np.array([' '.join(x.astype(str)) for x in df.drop('id', axis=1).values])

[Out]:

   id    A    B  ...    Y    Z                                      joined_column
0   0  yes   no  ...  yes   no  yes no yes no yes no yes no yes no yes no yes ...
1   1   no   no  ...   no   no  no no yes no no no no no no no no no no no no ...
2   2  yes  yes  ...  yes  yes  yes yes yes yes yes no yes yes yes yes yes yes...
  • Related