Concatenate strings in multiple csv files into one datafram along x and y axis (in pandas)-CodePudding

I have a folder with many csv files. They all look similar, they all have the same names for columns and rows. They all have strings as values in their cells. I want to concatenate them along columns AND rows so that all the strings are concatenated into their respective cells.

Example:

file1.csv

0	1	2	3	4
b1	peter	house	ash	plane
b2	carl	horse	paul	knife
b3	mary	apple	linda	carrot
b4	hank	car	herb	beer

file2.csv

0	1	2	3	4
b1	mark	green	hello	band
b2	no	phone	spoon	goodbye
b3	red	cherry	charly	hammer
b4	good	yes	ok	simon

What I want is this result with no delimiter between the string values:

concatenated.csv

0	1	2	3	4
b1	peter mark	house green	ash hello	plane band
b2	carl no	horse phone	paul spoon	knife goodbye
b3	mary red	apple cherry	linda charly	carrot hammer
b4	hank good	car yes	herb ok	beer simon

And I don't know how to do this in pandas in a jupyter notebook.

I have tried a couple of things but all of them either kept a seperate set of rows or of columns.

CodePudding user response：

If this is your first dataframe:

df1_data = {
    1 : ['peter', 'carl', 'mary', 'hank'],
    2 : ['house', 'horse','apple', 'car']
}
df1 = pd.DataFrame(df1_data)
print(df1)

Output:

       1      2
0  peter  house
1   carl  horse
2   mary  apple
3   hank    car

And this is your second dataframe:

df2_data = {
    1 : ['mark', 'no',   'red',   'good'],
    2 : ['green','phone','cherry','yes' ]
}
df2 = pd.DataFrame(df2_data)
print(df2)

Output:

      1       2
0  mark   green
1    no   phone
2   red  cherry
3  good     yes

You can reach your requested dataframe like this:

df = pd.DataFrame()
df[1] = df1[1]   ' '  df2[1]
df[2] = df1[2]   ' '  df2[2]
print(df)

Output:

            1             2
0  peter mark   house green
1     carl no   horse phone
2    mary red  apple cherry
3   hank good       car yes

Loop for csv files:

Now, if you have a lot of csv files with names like file1.csv and file2.csv and so on, you can save them all in d like this:

d = {}
for i in range(1,#N): 
  d[i] = pd.read_csv('.../file' str(i) '.csv')
#N is the number of csv files. (because I started from 1, you have to add 1 to N)

And build the dataframe you want like this:

concatenated_df = pd.DataFrame()

for i in range(1,#N):
  concatenated_df[i] = d[1].iloc[:,i]   ' '   d[2].iloc[:,i]   ...
#N is the number of columns here

CodePudding user response：

If performance is not an issue, you can use pandas.DataFrame.applymap with pandas.Series.add :

out = df1[[0]].join(df1.iloc[:, 1:].applymap(lambda v: f"{v} ").add(df2.iloc[:, 1:]))

Or, for a large dataset, you can use pandas.concat with a listcomp :

out = (
        df1[[0]]
            .join(pd.concat([df1.merge(df2, on=0)
                                 .filter(regex=f"{p}_\w").agg(" ".join, axis=1)
                                 .rename(idx) for idx, p in enumerate(range(1, len(df1.columns)), start=1)],
                            axis=1))
     )

Output :

print(out)

    0           1             2             3              4
0  b1  peter mark   house green     ash hello     plane band
1  b2     carl no   horse phone    paul spoon  knife goodbye
2  b3    mary red  apple cherry  linda charly  carrot hammer
3  b4   hank good       car yes       herb ok     beer simon

CodePudding user response：

Reading many csv files into a single DF is a pretty common answer, and is the first part of your question. You can find a suitable answer here.

After that, in an effort to allow you to perform this on all of the files at the same time, you can melt and pivot with a custom agg function like so:

import glob import pandas as pd

# See the linked answer if you need help finding csv files in a different directory
all_files = glob.glob('*.csv'))
df = pd.concat((pd.read_csv(f) for f in all_files))


output = df.melt(id_vars='0')
           .pivot_table(index='0', 
                        columns='variable',
                        values='value',
                        aggfunc=lambda x: ' '.join(x))