How can I use \n in a pandas DataFrame without splitting the text into separate rows?

I'm trying to use \n to add a line break inside a cell of a pandas DataFrame.

Here is the sample data to test:

import numpy as np
import pandas as pd

df = pd.DataFrame({'KEY': [4507,211,5294,2233,2260],'NAME':['kim young','laa eudong','kill gil','lee suk','No hee'],'FIND_DATE':[20130518,20140626,20140215,20141121,20140910],'EVENT_DTL':['A','B','C','D','E']})

df.loc[:3,'EVENT_DTL'] = np.nan

The dataframe looks like this:

input:

df.loc[df.EVENT_DTL.isna(),['KEY','NAME','FIND_DATE','EVENT_DTL']]

output:

         KEY    NAME    FIND_DATE   EVENT_DTL
2143    4507    kim young   20130518    NaN
2386    211     Laa euong   20140626    NaN
2522    5294    Kim gil     20140215    NaN
3287    2233    Lee suk     20141121    NaN
3330    2260    No hee      20140910    NaN
... ... ... ... ...
62870   51632   Her yun     20170213    NaN
103829  38076   Lee jae     20150518    NaN
104560  9818    Yun young   20130812    NaN
104816  53838   Kang gae    20140818    NaN
104817  53840   Bae ssun    20141202    NaN
107 rows × 4 columns

So I tried this code to fill in the NaN values in the EVENT_DTL column:

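# Let's test
# Template gloss: '변사자 정보' = deceased person info, '년' = year, '월' = month,
# '발견장소' = place of discovery, '수사기록 상 주소' = address in the investigation record,
# '주민등록상 주소' = address on the resident registration
idx = df.EVENT_DTL.isna()
df.loc[idx,'EVENT_DTL'] = ('1. 변사자 정보 : ' + df.loc[idx,'NAME'] + df.loc[idx,'FIND_DATE'].astype(str).str[:4] + '년' + df.loc[idx,'FIND_DATE'].astype(str).str[4:6] + '월' + ' ' + '\n3. 발견장소 : \n1) 수사기록 상 주소 \n주민등록상 주소 : ').str.split('\n')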

df = df.explode('EVENT_DTL')

When I run df.loc[[2143,2386],['KEY','EVENT_DTL']] to check the result, it looks like the code created additional rows:

        KEY     EVENT_DTL
2143    4507    1. 변사자 정보 : kim young2013년05월
2143    4507    3. 발견장소 :
2143    4507    1) 수사기록 상 주소
2143    4507    주민등록상 주소 :
2386    211     1. 변사자 정보 : Laa euong2014년06월
2386    211     3. 발견장소 :
2386    211     1) 수사기록 상 주소
2386    211     주민등록상 주소 :
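
For reference, the row duplication can be reproduced on a much smaller frame. The sketch below uses a hypothetical two-row frame with placeholder English text (only the index labels 2143 and 2386 are borrowed from the output above). It suggests that the extra rows come from the split followed by explode, not from the \n characters themselves.

```python
import pandas as pd

# Hypothetical two-row frame; the index labels mimic 2143 and 2386 from the question.
demo = pd.DataFrame(
    {"KEY": [4507, 211], "EVENT_DTL": ["line 1\nline 2\nline 3", "only line"]},
    index=[2143, 2386],
)

# str.split turns each string into a list of its lines ...
demo["EVENT_DTL"] = demo["EVENT_DTL"].str.split("\n")

# ... and explode emits one row per list element, repeating the index label
# and every other column for each element.
exploded = demo.explode("EVENT_DTL")

print(exploded.index.tolist())   # [2143, 2143, 2143, 2386]
print(exploded["KEY"].tolist())  # [4507, 4507, 4507, 211]
```

Dropping the .str.split('\n') and the explode call would keep one row per KEY, with the line breaks stored inside the string.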

Here is the desired output:

            KEY     EVENT_DTL
    2143    4507    1. 변사자 정보 : kim young2013년05월
                    3. 발견장소 :
                    1) 수사기록 상 주소
                    주민등록상 주소 :
    2386    211     1. 변사자 정보 : Laa euong2014년06월
                    3. 발견장소 :
                    1) 수사기록 상 주소
                    주민등록상 주소 :

CodePudding user response：

Some kind of groupby and agg might produce the desired output. (I don't have the real data, so I couldn't try different combinations.)

df.groupby('KEY').agg(lambda x: list(set(x))).reset_index()

result:

    KEY  ...                                          EVENT_DTL
0   211  ...  [1) 수사기록 상 주소 , 1. 변사자 정보 : laa eudong2014년06월...
1  2233  ...  [1. 변사자 정보 : lee suk2014년11월 , 1) 수사기록 상 주소 , ...
2  2260  ...                                                [E]
3  4507  ...  [1) 수사기록 상 주소 , 1. 변사자 정보 : kim young2013년05월 ...
4  5294  ...  [1) 수사기록 상 주소 , 1. 변사자 정보 : kill gil2014년02월 ,...

Edit:

df2 = df.groupby(['KEY','NAME'], as_index=False)['EVENT_DTL'].sum()

result:

    KEY        NAME                                          EVENT_DTL
0   211  laa eudong  1. 변사자 정보 : laa eudong2014년06월 3. 발견장소 : 1) 수사...
1  2233     lee suk  1. 변사자 정보 : lee suk2014년11월 3. 발견장소 : 1) 수사기록 ...
2  2260      No hee                                                  E
3  4507   kim young  1. 변사자 정보 : kim young2013년05월 3. 발견장소 : 1) 수사기...
4  5294    kill gil  1. 변사자 정보 : kill gil2014년02월 3. 발견장소 : 1) 수사기록...
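
A possible variant of the same idea, if the line breaks should survive the regrouping: join the pieces with '\n' instead of summing them. This is only a sketch on a hypothetical, already exploded frame with placeholder text; the column names follow the question.

```python
import pandas as pd

# Hypothetical exploded frame: one row per line of text, as in the question.
exploded = pd.DataFrame(
    {
        "KEY": [4507, 4507, 4507, 211, 211],
        "NAME": ["kim young"] * 3 + ["laa eudong"] * 2,
        "EVENT_DTL": ["first", "second", "third", "alpha", "beta"],
    }
)

# sort=False keeps the groups in order of first appearance; "\n".join keeps
# the lines of each group in their original row order.
rejoined = (
    exploded.groupby(["KEY", "NAME"], sort=False)["EVENT_DTL"]
    .agg("\n".join)
    .reset_index()
)

print(rejoined["EVENT_DTL"].tolist())  # ['first\nsecond\nthird', 'alpha\nbeta']
```

Unlike list(set(x)), the join does not reorder or deduplicate the lines.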

CodePudding user response：

Have you tried the fillna method?

Something like:

df.EVENT_DTL = df.EVENT_DTL.fillna('1. 변사자 정보 : ' + df.NAME + df.FIND_DATE.astype(str).str[:4] + '년' + df.FIND_DATE.astype(str).str[4:6] + '월' + ' ' + '\n3. 발견장소 : \n1) 수사기록 상 주소 \n주민등록상 주소 : ')

Your df will now be:

    KEY        NAME  FIND_DATE                                          EVENT_DTL
0  4507   kim young   20130518  1. 변사자 정보 : kim young2013년05월 \n3. 발견장소 : \n1)...
1   211  laa eudong   20140626  1. 변사자 정보 : laa eudong2014년06월 \n3. 발견장소 : \n1...
2  5294    kill gil   20140215  1. 변사자 정보 : kill gil2014년02월 \n3. 발견장소 : \n1) ...
3  2233     lee suk   20141121  1. 변사자 정보 : lee suk2014년11월 \n3. 발견장소 : \n1) 수...
4  2260      No hee   20140910                                                  E

You can now print a result like this:

>>> print(df.EVENT_DTL[0])
1. 변사자 정보 : kim young2013년05월 
3. 발견장소 : 
1) 수사기록 상 주소 
주민등록상 주소 :
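
To check that fillna only touches the missing cells and that the line breaks stay inside a single cell, a minimal sketch along these lines may help. The two-row frame and the English labels in the template are placeholders for illustration, not the original text.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame: one missing EVENT_DTL, one already filled.
df = pd.DataFrame(
    {
        "NAME": ["kim young", "No hee"],
        "FIND_DATE": [20130518, 20140910],
        "EVENT_DTL": [np.nan, "E"],
    }
)

date = df["FIND_DATE"].astype(str)
# Placeholder English template; the labels are illustrative only.
template = "Name: " + df["NAME"] + "\nYear: " + date.str[:4] + "\nMonth: " + date.str[4:6]

# fillna aligns the Series on the index and only replaces the NaN cells.
df["EVENT_DTL"] = df["EVENT_DTL"].fillna(template)

print(df.loc[0, "EVENT_DTL"].splitlines())  # ['Name: kim young', 'Year: 2013', 'Month: 05']
print(df.loc[1, "EVENT_DTL"])               # E
```

The frame still has one row per record; the \n only becomes a visible line break when the string is printed or rendered.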

NB: It seems you would want to display the printed version of the string directly within your DataFrame; you could look at this answer to do so.