Pandas - Merge each row values into a string ignoring the empty values

Time:10-07

I have a csv that looks like this:

header1 header2 header3 header4 header5 header6
row1    row1    row1    row1    row1    row1
row2    row2    row2    row2    row2    row2
row3    row3    row3    row3    row3    row3

What I want to achieve is to merge the values of each row into a single string separated by #. For example, the output would be:

row1#row1#row1#row1#row1#row1
row2#row2#row2#row2#row2#row2
row3#row3#row3#row3#row3#row3

I have already done this using this code:

import pandas as pd

# na_filter=False keeps empty fields as '' instead of converting them to NaN
df = pd.read_csv("test.csv",
                 na_filter=False)
test = df.stack().groupby(level=0).apply('#'.join)
print(test.to_dict())

The only issue with the code above is that an empty value in a row still contributes a "#" to the output. Assuming header5 of row 1 is empty, the result looks like this:

row1#row1#row1#row1##row1

Whereas, with header5 of row 1 empty, it should look like this:

row1#row1#row1#row1#row1
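
The behaviour described above can be reproduced without the original file. This is only a sketch: the comma-separated text below is a hypothetical stand-in for test.csv, and it assumes the empty field is read as '' because of na_filter=False.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for test.csv; header5 of the first row is empty.
csv_text = (
    "header1,header2,header3,header4,header5,header6\n"
    "row1,row1,row1,row1,,row1\n"
    "row2,row2,row2,row2,row2,row2\n"
)

# na_filter=False keeps the empty field as '' instead of NaN.
df = pd.read_csv(io.StringIO(csv_text), na_filter=False)

# Joining every cell, empty or not, leaves two adjacent separators
# where the empty cell was.
joined = df.apply('#'.join, axis=1)
print(joined.to_dict())
```

The first row comes out with "##" in the middle, which is the problem the question is about.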

Does anyone know how I can fix this?

CodePudding user response:

You need to replace the empty strings with NaN, so that DataFrame.stack, which drops missing values by default, removes them:

print (df)
  header1 header2 header3 header4 header5 header6
0    row1    row1    row1    row1            row1
1    row2                            row2    row2
2    row3    row3    row3    row3    row3    row3

import numpy as np

test = df.replace('', np.nan).stack().groupby(level=0).apply('#'.join)
print(test.to_dict())
{0: 'row1#row1#row1#row1#row1',
 1: 'row2#row2#row2', 
 2: 'row3#row3#row3#row3#row3#row3'}
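
To see the intermediate step, here is a small sketch on a hypothetical three-column frame (not the asker's data). It calls dropna() explicitly after stack(), on the assumption that some pandas versions may keep NaN when stacking; where stack() already drops them, the extra call changes nothing.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; empty cells are '' as they would be with na_filter=False.
df = pd.DataFrame({
    'header1': ['row1', 'row2'],
    'header2': ['row1', ''],
    'header3': ['row1', 'row2'],
})

# Turn '' into NaN, reshape to one value per (row, column) pair,
# then drop the missing entries explicitly.
stacked = df.replace('', np.nan).stack().dropna()
print(stacked)

# Group by the original row label (index level 0) and join what is left.
result = stacked.groupby(level=0).apply('#'.join)
print(result.to_dict())
```

Note that a row whose cells are all empty would vanish from the result entirely, because no entries remain for it after the drop.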

Or use:

test = df.replace('', np.nan).apply(lambda x: '#'.join(x.dropna()), axis=1)
print(test.to_dict())
{0: 'row1#row1#row1#row1#row1', 
 1: 'row2#row2#row2', 
 2: 'row3#row3#row3#row3#row3#row3'}

Or:

# collapse runs of '#' into one, then trim any '#' left at the start or end
test = df.apply('#'.join, axis=1).str.replace('[#]+', '#', regex=True).str.strip('#')
print(test.to_dict())
{0: 'row1#row1#row1#row1#row1', 
 1: 'row2#row2#row2', 
 2: 'row3#row3#row3#row3#row3#row3'}

Thanks @Corralien for another solution:

df.apply(lambda x: '#'.join(i for i in x if i != ''), axis=1).to_dict()
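
If the frame might contain NaN as well as empty strings (for instance when the file is read without na_filter=False), the same idea can be wrapped in a small helper. This is only a sketch: join_non_empty is a hypothetical name, not a pandas function, and the frame below is made up for illustration.

```python
import pandas as pd


def join_non_empty(row, sep='#'):
    """Join the cells of one row, skipping '' and missing values (hypothetical helper)."""
    return sep.join(str(v) for v in row if pd.notna(v) and v != '')


# Hypothetical frame with one empty string and one missing value.
df = pd.DataFrame({
    'header1': ['row1', 'row2'],
    'header2': ['', 'row2'],
    'header3': ['row1', None],
})

result = df.apply(join_non_empty, axis=1)
print(result.to_dict())
```

Unlike the stack-based version, this keeps a row that is completely empty and returns an empty string for it.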