Home > database >  Pandas Add column and fill it with complex Concatenate
Pandas Add column and fill it with complex Concatenate

Time:06-28

I have this Excel formula in A2 : =IF(B1=1;CONCAT(D1:Z1);"Null")

All cells are string or integer but some are empty. I filled the empty one with "null"

I've tried to translate it with pandas, and so far I wrote this :

'''

import pandas as pd
df = pd.read_table('C:/*path*/001.txt', sep=';', header=0, dtype=str)

rowcount  = 0
for row in df: 
  rowcount = 1    
n = rowcount
m = len(df)

df['A']=""

for i in range(1,n):
     if df[i-1]["B"]==1:
         for k in range(2,m):
             if df[i][k]!="Null"
                 df[i]['A'] =df[i][k]

''' I can't find something close enough to my problem in questions, anyone can help?

CodePudding user response:

I not sure you really expecting for this. If you need to fill empty cell with 'null' string in dataframe. You can use this

df.fillna('null', inplace=True)

If you provide the expected output with your input file. May helpful for the contributors.

CodePudding user response:

Test dataframe:

df = pd.DataFrame({
    "b":[1,0,1],
    "c":["dummy", "dummy", "dummy"],
    "d":["red", "green", "blue"],
    "e":["-a", "-b", "-c"]
})

First step: add a new column and fill with NULL.

df["concatenated"] = "NULL"

Second step: filter by where column B is 1, and then set the value of the new column to the concatenation of columns D to Z.

df["concatenated"][df["b"]==1] = df[sub_columns].sum(axis=1)
df

Output:

enter image description here

EDIT: I notice there is an offset in your excel formula. Not sure if this is deliberate, but experiment with df.shift(-1) if so.

CodePudding user response:

There's a lot to unpack here.

Firstly, len(df) gives us the row count. In your code, n and m will be one and the same.

Secondly, please never do chain indexing in pandas unless you absolutely have to. There's a number of reasons not to, one of them being that it's easy to make a mistake; also, assignment can fail. Here, we have a default range index, so we can use df.loc[i-1, 'B'] in place of df[i - 1]["B"].

Thirdly, the dtype is str, so please use =='1' rather than ==1.

If I understand your problem correctly, the following code should help you:

In [1]: import pandas as pd                                                                                                                                                                   

In [2]: df = pd.DataFrame({
    'B': ['1','2','0','1'],
    'C': ['first', 'second', 'third', 'fourth'],
    'D': ['-1', '-2', '-3', '-4']
})                                                                

In [3]: RELEVANT_COLUMNS = ['C', 'D'] # You can also extract them in any automatic way you please

In [4]: df["A"] = ""                                                                                                                                                                                                                                                                                                                           

In [5]: df.loc[df['B'] == '1', 'A'] = df.loc[df['B'] == '1', RELEVANT_COLUMNS].sum(axis=1)                                                                                                             

In [6]: df                                                                                                                                                                                     
Out[6]: 
   B       C   D         A
0  1   first  -1   first-1
1  2  second  -2          
2  0   third  -3          
3  1  fourth  -4  fourth-4

We note which columns to concat (In [3]) (we do not want to make the mistake of adding a column later on and using it. Here if we add 'A' it doesn't hurt, because it's full of empty strings. But it's more manageable to save the columns we concat.

We then add the column with empty strings (In [4]) (if we skip this step, we'll get NaNs instead of empty strings for the records where B does not equal 1).

The In [5] uses pandas' boolean indexing (through the Series to scalar equality operator) to limit our scope to where column B equals 1, and then we pull up the columns to concat and do just that, using the an axis-reducing sum operation.

  • Related