2 years ago

#57478


rfengineer

How to read large CSVs in multiple zip files very quickly using pandas?

I'm using the code below to read multiple CSVs from multiple zip files (one CSV per zip file). The data is huge (each CSV is 1.5 GB, and I have more than 30 zipped CSVs), which is why I prefer not to unzip them. However, it gets very slow when I concatenate many of them. Is there more efficient code to speed this up?

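     import glob
     import os
     import pandas as pd
     os.chdir(r"path")
     extension = 'zip'
     # glob.glob already returns a list of matching filenames
     all_filenames = glob.glob('*.{}'.format(extension))
     # each zip holds a single CSV, so read_csv infers the compression from the .zip suffix
     combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])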
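One direction I am considering, as a rough sketch only and not a measured fix: read just the columns I need and with narrower dtypes, so each 1.5 GB file produces a smaller DataFrame before the single concat. The helper name `read_zipped_csvs` and the column names in the usage note are hypothetical; `usecols`, `dtype` and `compression` are existing `pd.read_csv` parameters.

```python
import glob

import pandas as pd


def read_zipped_csvs(pattern, usecols=None, dtype=None):
    """Read every single-CSV zip archive matching `pattern` into one DataFrame.

    Hypothetical helper: `usecols` and `dtype` are passed straight to
    pd.read_csv to cut parsing work and memory per file.
    """
    frames = []
    for path in sorted(glob.glob(pattern)):
        # compression="zip" reads the archive in place; the archive must
        # contain exactly one file for pandas to accept it.
        frames.append(
            pd.read_csv(path, compression="zip", usecols=usecols, dtype=dtype)
        )
    # Concatenate once at the end; ignore_index avoids duplicated row labels.
    return pd.concat(frames, ignore_index=True)
```

For example, `read_zipped_csvs("*.zip", usecols=["a", "b"], dtype={"a": "int32"})` would load only two (made-up) columns from each archive. Whether this helps depends on how many columns are actually needed; decompressing and parsing 30 large files still takes time regardless.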

python

pandas

csv

zip

glob

0 Answers
