How to concatenate columns in pandas having NAN?

Time:11-09

I want to concatenate three columns in pandas by taking the first 3 letters from Column1, the first 4 letters from Column2, and all 4 digits from Column3, and store the result in Column4. Whenever there is a NaN in any of columns 1, 2, or 3, I want the output to be NaN or blank, as shown below. If a value has fewer letters than that, it should still be concatenated, but if there is even one NaN, it should not.

Column1 Column2 Column3 Column4
Abcdef  cdefg   1996    ABC|CDEF|1996
ab      wer     1998    AB|WER|1998
ab      NaN     1998    NaN/Blank
NaN     NaN     2000    NaN/Blank
abcd    weet    NaN     NaN/Blank
qwwer   ram2e   12345   QWW|RAM2|1234

I tried the following code, but it's very basic and does not give the desired output. Please suggest a better alternative. Thanks.

df1['column4'] = df1['column1'].str[:3] + df1['column2'].str[:4] + df1['column3'].map(str)

CodePudding user response:

You can join the values after converting the numeric column to strings, replacing 'nan' with None and removing the trailing .0:

df1['column4'] = (df1['Column1'].str[:3].str.upper() + '|' +
                  df1['Column2'].str[:4].str.upper() + '|' +
                  df1['Column3'].astype(str).replace({r'^nan$': None, r'\.0$': ''}, regex=True))
print (df1)
  Column1 Column2  Column3        Column4         column4
0  Abcdef   cdefg   1996.0  ABC|CDEF|1996   ABC|CDEF|1996
1      ab     wer   1998.0    AB|WER|1998     AB|WER|1998
2      ab     NaN   1998.0      NaN/Blank             NaN
3     NaN     NaN   2000.0      NaN/Blank             NaN
4    abcd    weet      NaN      NaN/Blank             NaN
5   qwwer   ram2e  12345.0  QWW|RAM2|1234  QWW|RAM2|12345
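
Note that row 5 in the output above ends in 12345, while the question asks for 1234. A minimal sketch of one way to also cut Column3 to four characters, assuming its non-missing values are whole numbers stored as floats (the helper name col3 is only for illustration):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Column1': ['Abcdef', 'ab', 'ab', np.nan, 'abcd', 'qwwer'],
    'Column2': ['cdefg', 'wer', np.nan, np.nan, 'weet', 'ram2e'],
    'Column3': [1996, 1998, 1998, 2000, np.nan, 12345],
})

# Turn each number into text without the trailing ".0" and keep the first
# 4 characters; na_action='ignore' leaves NaN untouched so it still propagates.
col3 = df1['Column3'].map(lambda v: str(int(v))[:4], na_action='ignore')

df1['Column4'] = (df1['Column1'].str[:3].str.upper() + '|' +
                  df1['Column2'].str[:4].str.upper() + '|' +
                  col3)
print(df1)
```

Because the missing values are never turned into the text 'nan', no regex cleanup is needed afterwards.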

CodePudding user response:

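Just concatenating like this should work, provided Column3 is stored as strings (the .str accessor is not available on a numeric column). A NaN in any of the columns should cause Column4 to be NaN too.

df['Column4'] = df['Column1'].str[:3].str.upper() + '|' + df['Column2'].str[:4].str.upper() + '|' + df['Column3'].str[:4].str.upper()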
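
To illustrate the NaN behaviour, here is a small sketch with made-up two-row data in which Column3 already holds strings (an assumption, since the question shows numbers):

```python
import numpy as np
import pandas as pd

# Hypothetical sample; Column3 is text here, so .str works on it
df = pd.DataFrame({
    'Column1': ['Abcdef', np.nan],
    'Column2': ['cdefg', 'wer'],
    'Column3': ['1996', '1998'],
})

df['Column4'] = (df['Column1'].str[:3].str.upper() + '|' +
                 df['Column2'].str[:4].str.upper() + '|' +
                 df['Column3'].str[:4].str.upper())

# The second row has NaN in Column1, so its Column4 is missing as well
print(df)
```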

CodePudding user response:

In my opinion you are on the right track. I don't understand what the problem with your output is, but it could be because column3 is stored as a float. See this code:

import numpy as np
import pandas as pd

data1 = {'column1': ['Abcdef', 'ab', 'ab', np.nan, 'abdc', 'qwwer'], 'column2': ['cdefg', 'wer', np.nan, np.nan, 'weet' , 'ram2e'], 'column3': [1996, 1998, 1998, 2000, np.nan, 12345]}
df1 = pd.DataFrame(data1)
# column3 is float because of the NaN; convert to text and strip the ".0"
df1['column3'] = df1['column3'].astype(str).replace(r'\.0', '', regex=True)
df1['column4'] = df1['column1'].str[:3] + df1['column2'].str[:4] + df1['column3']
print(df1)

output:

  column1 column2 column3       column4
0  Abcdef   cdefg    1996   Abccdef1996
1      ab     wer    1998     abwer1998
2      ab     NaN    1998           NaN
3     NaN     NaN    2000           NaN
4    abdc    weet     nan    abdweetnan
5   qwwer   ram2e   12345  qwwram212345
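
Row 4 above comes out as abdweetnan because the missing value in column3 has become the text 'nan'. One possible way around that, sketched here with the same data, is to record which rows are complete before converting and then mask the result with Series.where (the names complete, joined and column4_blank are only for illustration):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'column1': ['Abcdef', 'ab', 'ab', np.nan, 'abdc', 'qwwer'],
    'column2': ['cdefg', 'wer', np.nan, np.nan, 'weet', 'ram2e'],
    'column3': [1996, 1998, 1998, 2000, np.nan, 12345],
})

# True only for rows where none of the three source columns is missing
complete = df1[['column1', 'column2', 'column3']].notna().all(axis=1)

col3 = df1['column3'].astype(str).replace(r'\.0$', '', regex=True)
joined = df1['column1'].str[:3] + df1['column2'].str[:4] + col3

df1['column4'] = joined.where(complete)            # NaN for incomplete rows
df1['column4_blank'] = joined.where(complete, '')  # empty string instead
print(df1)
```

Whether NaN or an empty string is the better placeholder depends on what is done with column4 afterwards; the question accepts either.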