Home > Blockchain >  combine rows of single column over multiple .csv files in pandas
combine rows of single column over multiple .csv files in pandas

Time:03-07

I have a bunch of .csv files with the same column headers and data types in the columns.

c1   c2   c3 
1    5   words
2    6   words
3    7   words
4    8   words

is there a way to combine all the text in c3 in each .csv file then combine them into one csv?

I combined them this way

path = r'C:\\Users\\...\**\*.csv'

all_rec = iglob(path, recursive=True)     
dataframes = (pd.read_csv(f) for f in all_rec)
big_dataframe = pd.concat(dataframes, ignore_index=True)

i'm not sure how to combine the text rows first then bring them together.

CodePudding user response:

There are many way to do it. One way:

path = r'C:\\Users\\...\**\*.csv'

all_rec = iglob(path, recursive=True)

# Extract only c3 column from files
dataframes = {f: pd.read_csv(f, usecols=['c3']) for f in all_rec}

# Group all dataframes then combine text rows of each dataframe
big_dataframe = pd.concat(dataframes).groupby(level=0)['c3'] \
                  .apply(lambda x: ' '.join(x.tolist())).reset_index(drop=True)

Output:

>>> big_dataframe
0        words words words words
1    words2 words2 words2 words2
2    words3 words3 words3 words3
Name: c3, dtype: object
  • Related