I have a dataframe in Python below:
import pandas as pd
df = pd.DataFrame({
'CRDACCT_DLQ_CYC_1_MNTH_AGO' : [3, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
'CRDACCT_DLQ_CYC_2_MNTH_AGO': [4, 3, 3, 3, 3, 3, 2, 0, 5, 4, 3, 2, 0, 2, 2, 2, 2, 2, 2, 0, 2, 2, 0, 2],
'CRDACCT_DLQ_CYC_3_MNTH_AGO': [8, 7, 6, 5, 4, 3, 2, 'F', 'F', 0, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'F', 'C', 'C', 'F', 'F'],
'CRDACCT_DLQ_CYC_4_MNTH_AGO' : [0, 2, 'F', 'F', 'C', 'C', 'C', 'C', 0, 2, 0, 2, 0, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'F', 'C', 'F'],
'CRDACCT_DLQ_CYC_5_MNTH_AGO' : [2, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
'CRDACCT_DLQ_CYC_6_MNTH_AGO' : [2, 2, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 0, 2, 0, 2, 0],
'CRDACCT_DLQ_CYC_7_MNTH_AGO' : [3, 3, 2, 'C', 'C', 'C', 'F', 0, 6, 5, 4, 3, 2, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
'CRDACCT_DLQ_CYC_8_MNTH_AGO' : [5, 4, 4, 3, 3, 2, 3, 2, 2, 2, 1, 2, 0, 2, 'C', 'C', 0, 2, 2, 2, 'C', 'C', 0, 'Z'],
'CRDACCT_DLQ_CYC_9_MNTH_AGO' : [2, 2, 'C', 0, 2, 0, 2, 'C', 'C', 'C', 'C', 'C', 0, 3, 2, 'C', 'F', 'C', 'F', 'F', 'F', 'F', 'F', 'F'],
'CRDACCT_DLQ_CYC_10_MNTH_AGO' : [5, 4, 3, 2, 3, 2, 0, 2, 0, 2, 'C', 'C', 'F', 2, 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'C'],
'CRDACCT_DLQ_CYC_11_MNTH_AGO' : [4, 3, 2, 'F', 2, 0, 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z'],
'CRDACCT_DLQ_CYC_12_MNTH_AGO' : ['F', 8, 7, 6, 5, 4, 3, 2, 'C', 'C', 'C', 0, 2, 'C', 'C', 0, 2, 0, 3, 2, 'C', 'C', 'F', 2]
})
df.head()
I want to create a new column that consists merging values of CRDACCT_DLQ_CYC_1_MNTH_AGO, CRDACCT_DLQ_CYC_2_MNTH_AGO, ....., CRDACCT_DLQ_CYC_12_MNTH_AGO. Let's say that new column named as HISTORY_DLQ.
If I print that new column, the expected result looks like this:
print(df['HISTORY_DLQ'])
#Output consists 24 rows of merging values of each column CRDACCT_DLQ_CYC_1_MNTH_AGO,..., CRDACCT_DLQ_CYC_12_MNTH_AGO.
[34802235254F,237222342438, C36FC224C327,...,C2FFC0CZFCZ2]
CodePudding user response:
Convert your columns to string then join each row of columns:
df['HISTORY_DLQ'] = df.astype(str).apply(''.join, axis=1)
print(df['HISTORY_DLQ'])
# Output:
0 34802235254F
1 237222342438
2 C36FC224C327
3 C35FCCC302F6
4 C34CCCC32325
5 C33CCCC20204
6 C22CCCF320Z3
7 C0FCCC02C2Z2
8 C5F0CC62C0ZC
9 C402CC52C2ZC
10 C3C0CC41CCZC
11 C2C2CC32CCZ0
12 C0C0CC200FZ2
13 C2C2CC2232ZC
14 C2CCCCCC2FZC
15 C2CCCCCCCFZ0
16 C2CCCCC0FFZ2
17 C2CCCCC2CFZ0
18 C2CCCCC2FFZ3
19 C0FCC0C2FFZ2
20 C2CCC2CCFFZC
21 C2CFC0CCFFZC
22 C0FCC2C0FFZF
23 C2FFC0CZFCZ2
dtype: object
CodePudding user response:
You can convert the columns to strings, then concat the strings on each row across columns using .sum()
, as follows:
df['HISTORY_DLQ'] = df.astype(str).sum(axis=1)
.sum()
on strings is like what you do to 'abc' 'def'
and will concat the strings to become 'abcdef'
. When using it on axis=1
, it works on each row across the columns. Thus, achieving the result we want.
If your dataframe contains other columns that you don't want to merge their values, you can filter only the related columns by .filter()
before we apply the logics above:
df['HISTORY_DLQ'] = df.filter(regex=r'CRDACCT_DLQ_CYC_\d _MNTH_AGO').astype(str).sum(axis=1)
Here, we filter with regex r'CRDACCT_DLQ_CYC_\d _MNTH_AGO'
for column names that starts with CRDACCT_DLQ_CYC_
followed by one or more digits \d
, then followed by _MNTH_AGO
for the 12 columns.
Result:
print(df['HISTORY_DLQ'])
0 34802235254F
1 237222342438
2 C36FC224C327
3 C35FCCC302F6
4 C34CCCC32325
5 C33CCCC20204
6 C22CCCF320Z3
7 C0FCCC02C2Z2
8 C5F0CC62C0ZC
9 C402CC52C2ZC
10 C3C0CC41CCZC
11 C2C2CC32CCZ0
12 C0C0CC200FZ2
13 C2C2CC2232ZC
14 C2CCCCCC2FZC
15 C2CCCCCCCFZ0
16 C2CCCCC0FFZ2
17 C2CCCCC2CFZ0
18 C2CCCCC2FFZ3
19 C0FCC0C2FFZ2
20 C2CCC2CCFFZC
21 C2CFC0CCFFZC
22 C0FCC2C0FFZF
23 C2FFC0CZFCZ2
Name: HISTORY_DLQ, dtype: object