I have a CSV file that looks like this:
header1 header2 header3 header4 header5 header6
row1 row1 row1 row1 row1 row1
row2 row2 row2 row2 row2 row2
row3 row3 row3 row3 row3 row3
What I want to achieve is to merge each row's values into a single string separated by #. For example, the output would be:
row1#row1#row1#row1#row1#row1
row2#row2#row2#row2#row2#row2
row3#row3#row3#row3#row3#row3
I have already done this using this code:
import pandas as pd

df = pd.read_csv("test.csv", na_filter=False)
test = df.stack().groupby(level=0).apply('#'.join)
print(test.to_dict())
The only issue with the code above is that if a row has an empty value, it still adds a "#" for that value. Assuming row 1, header 5 is empty, the output looks like this:
row1#row1#row1#row1##row1
Whereas it should look like this if row 1, header 5 is empty:
row1#row1#row1#row1#row1
Does anyone know how I can fix this?
CodePudding user response:
You need to replace the empty strings with NaN, so that DataFrame.stack, which drops missing values by default, removes them:
print(df)
header1 header2 header3 header4 header5 header6
0 row1 row1 row1 row1 row1
1 row2 row2 row2
2 row3 row3 row3 row3 row3 row3
import numpy as np

test = df.replace('', np.nan).stack().groupby(level=0).apply('#'.join)
print(test.to_dict())
{0: 'row1#row1#row1#row1#row1',
1: 'row2#row2#row2',
2: 'row3#row3#row3#row3#row3#row3'}
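For anyone who wants to try this without a file on disk, here is a minimal self-contained sketch of the same replace-then-stack idea. The in-memory sample data and the comma delimiter are assumptions made for illustration, not the asker's real test.csv. The explicit dropna() is an added safeguard: classic stack() already drops NaN, but newer pandas versions may keep missing values when stacking.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical in-memory stand-in for test.csv (comma-separated is an assumption).
csv_text = (
    "header1,header2,header3,header4,header5,header6\n"
    "row1,row1,row1,row1,,row1\n"
    "row2,,row2,,row2,\n"
    "row3,row3,row3,row3,row3,row3\n"
)

# na_filter=False keeps empty cells as '' instead of NaN, as in the question.
df = pd.read_csv(io.StringIO(csv_text), na_filter=False)

# Turn '' into NaN, then reshape to one value per (row, column) pair.
stacked = df.replace('', np.nan).stack()

# dropna() is a no-op where stack() already removed NaN, and removes the
# missing values on pandas versions where stack() keeps them.
test = stacked.dropna().groupby(level=0).apply('#'.join)

print(test.to_dict())
```

With this sample data, the printed dictionary should match the one shown above.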
Or use:
test = df.replace('', np.nan).apply(lambda x: '#'.join(x.dropna()), axis=1)
print(test.to_dict())
{0: 'row1#row1#row1#row1#row1',
1: 'row2#row2#row2',
2: 'row3#row3#row3#row3#row3#row3'}
Or:
test = df.apply('#'.join, axis=1).str.replace('[#]+', '#', regex=True).str.strip('#')
print(test.to_dict())
{0: 'row1#row1#row1#row1#row1',
1: 'row2#row2#row2',
2: 'row3#row3#row3#row3#row3#row3'}
Thanks to @Corralien for another solution:
df.apply(lambda x: '#'.join(i for i in x if i != ''), axis=1).to_dict()
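If the joined strings are all you need, the same "skip empty cells" filter can also be applied with only the standard library. This is a rough sketch rather than a drop-in replacement: the in-memory sample and the comma delimiter are assumptions, and the variable names (csv_text, joined) are made up for the example.

```python
import csv
import io

# Hypothetical in-memory stand-in for test.csv (comma-separated is an assumption).
csv_text = (
    "header1,header2,header3,header4,header5,header6\n"
    "row1,row1,row1,row1,,row1\n"
    "row2,,row2,,row2,\n"
    "row3,row3,row3,row3,row3,row3\n"
)

reader = csv.reader(io.StringIO(csv_text))
next(reader)  # skip the header row

# Keep only the non-empty cells of each row, then join them with '#'.
joined = {
    i: '#'.join(cell for cell in row if cell != '')
    for i, row in enumerate(reader)
}

print(joined)
```

For a real file, io.StringIO(csv_text) would be replaced by open("test.csv", newline="").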