I want to concatenate three columns in pandas by taking the first 3 letters from Column1, the first 4 letters from Column2 and the first 4 digits from Column3, and store the result in Column4. Whenever there is a NaN in even one of columns 1, 2 or 3, I want the output to be NaN or blank, as shown below. If a value has fewer letters than that, concatenate what is there, but if there is even one NaN, don't.
Column1  Column2  Column3  Column4
Abcdef   cdefg    1996     ABC|CDEF|1996
ab       wer      1998     AB|WER|1998
ab       NaN      1998     NaN/Blank
NaN      NaN      2000     NaN/Blank
abcd     weet     NaN      NaN/Blank
qwwer    ram2e    12345    QWW|RAM2|1234
I tried the following code, but it is very basic and does not give the desired output. Please suggest a better alternative. Thanks.
df1['column4'] = df1['column1'].str[:3] + df1['column2'].str[:4] + df1['column3'].map(str)
CodePudding user response:
You can join the values with +. For the column converted from numeric values, replace the string nan with None so that the result stays missing, and remove the trailing .0:
df1['column4'] = (df1['Column1'].str[:3].str.upper() + '|' +
                  df1['Column2'].str[:4].str.upper() + '|' +
                  df1['Column3'].astype(str).replace({'^nan$': None, r'\.0$': ''}, regex=True))
print(df1)
  Column1 Column2  Column3        Column4         column4
0  Abcdef   cdefg   1996.0  ABC|CDEF|1996   ABC|CDEF|1996
1      ab     wer   1998.0    AB|WER|1998     AB|WER|1998
2      ab     NaN   1998.0      NaN/Blank             NaN
3     NaN     NaN   2000.0      NaN/Blank             NaN
4    abcd    weet      NaN      NaN/Blank             NaN
5   qwwer   ram2e  12345.0  QWW|RAM2|1234  QWW|RAM2|12345
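Note that row 5 in the output above still ends in 12345, while the question asks for 1234. One possible way to also cut Column3 to 4 digits is to build the string only from rows where all three columns are present. This is a minimal sketch, not necessarily the best approach: the sample frame is rebuilt from the question's table, and it assumes Column3 holds whole numbers stored as floats. The names complete, rows and digits are only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: Column3 is float because of the NaN)
df1 = pd.DataFrame({
    'Column1': ['Abcdef', 'ab', 'ab', np.nan, 'abcd', 'qwwer'],
    'Column2': ['cdefg', 'wer', np.nan, np.nan, 'weet', 'ram2e'],
    'Column3': [1996, 1998, 1998, 2000, np.nan, 12345],
})

# Rows where all three source columns are present
complete = df1[['Column1', 'Column2', 'Column3']].notna().all(axis=1)
rows = df1[complete]

# No NaN is left in these rows, so the float column can go through int
# (assumes whole numbers), then be cut to its first 4 characters
digits = rows['Column3'].astype(int).astype(str).str[:4]

# Assignment aligns on the index; rows that were filtered out become NaN
df1['Column4'] = (rows['Column1'].str[:3].str.upper() + '|' +
                  rows['Column2'].str[:4].str.upper() + '|' + digits)
print(df1)
```

With this, row 5 should come out as QWW|RAM2|1234 and rows 2, 3 and 4 stay NaN.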
CodePudding user response:
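Just concatenating like this should work. A NaN in any of the columns causes Column4 to be NaN too. Note that this assumes Column3 is stored as strings; the .str accessor raises an AttributeError on a numeric column.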
df['Column4'] = df['Column1'].str[:3].str.upper() + '|' + df['Column2'].str[:4].str.upper() + '|' + df['Column3'].str[:4].str.upper()
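To see the NaN behaviour on its own, here is a small sketch with two made-up Series of strings (left and right are hypothetical names, not from the question). It only shows that + gives NaN wherever either side is missing.

```python
import numpy as np
import pandas as pd

# Two made-up string columns, each with one missing value
left = pd.Series(['abc', np.nan, 'xyz'])
right = pd.Series(['1996', '1998', np.nan])

# Element-wise concatenation; a missing value on either side gives a missing result
joined = left + '|' + right
print(joined.isna().tolist())  # [False, True, True]
```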
CodePudding user response:
In my opinion you are on the right track. I don't understand what the problem with your output is, but it could be because column3 is stored as a float. See this code:
import numpy as np
import pandas as pd
data1 = {'column1': ['Abcdef', 'ab', 'ab', np.nan, 'abdc', 'qwwer'],
         'column2': ['cdefg', 'wer', np.nan, np.nan, 'weet', 'ram2e'],
         'column3': [1996, 1998, 1998, 2000, np.nan, 12345]}
df1 = pd.DataFrame(data1)
# Convert to string and drop the trailing ".0" left over from the float dtype
df1['column3'] = df1['column3'].astype(str).replace(r'\.0$', '', regex=True)
df1['column4'] = df1['column1'].str[:3] + df1['column2'].str[:4] + df1['column3']
print(df1)
Output:
  column1 column2 column3       column4
0  Abcdef   cdefg    1996   Abccdef1996
1      ab     wer    1998     abwer1998
2      ab     NaN    1998           NaN
3     NaN     NaN    2000           NaN
4    abdc    weet     nan    abdweetnan
5   qwwer   ram2e   12345  qwwram212345
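Row 4 of the output above reads abdweetnan because the missing value in column3 ends up as the literal text nan after the string conversion, so it no longer propagates as NaN. One possible fix is to put the missing values back with where() after converting. This is only a sketch on a small made-up Series (col3 and as_text are illustrative names), assuming the column holds whole numbers stored as floats.

```python
import numpy as np
import pandas as pd

# Made-up numeric column with one missing value
col3 = pd.Series([1996, np.nan, 12345])

# Convert to text and drop the trailing ".0" left over from the float dtype
as_text = col3.astype(str).str.replace(r'\.0$', '', regex=True)

# Keep the text only where the original value was present; elsewhere restore NaN
as_text = as_text.where(col3.notna())
print(as_text.isna().tolist())  # [False, True, False]
```

Concatenating with as_text instead of the plain string column should then leave rows with a missing column3 as NaN.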