python - Caching CSV-read data with pandas for multiple runs


I'm trying to apply machine learning (Python, scikit-learn) to a large dataset stored in a 2.2 GB CSV file.

As this is a partially empirical process, I need to run the script numerous times, which results in pandas.read_csv() being called on the same file over and over again, and that takes a lot of time.

This is obviously time-consuming, so I guess there must be a way to make the process of reading the data faster, e.g. by storing it in a different format or caching it in some way.

A code example in the solution would be great!

I store parsed DataFrames in one of pandas' binary formats, for example pickle (df.to_pickle), HDF5 (df.to_hdf), or Feather/Parquet.

All of them are much faster to read back than the original CSV.
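As a minimal sketch of the caching idea: parse the CSV once, write a binary cache next to it, and load the cache on every later run. The file paths here are hypothetical placeholders; pickle is used for simplicity, but the same pattern works with to_hdf/to_parquet.

```python
import os
import pandas as pd

CSV_PATH = "data.csv"    # hypothetical path to the large CSV
CACHE_PATH = "data.pkl"  # binary cache written on the first run

def load_data():
    # Reuse the fast binary cache if it exists; otherwise parse the
    # CSV once and write the cache for subsequent runs.
    if os.path.exists(CACHE_PATH):
        return pd.read_pickle(CACHE_PATH)
    df = pd.read_csv(CSV_PATH)
    df.to_pickle(CACHE_PATH)
    return df
```

On the first run this pays the full read_csv() cost; every run after that only deserializes the pickle, which is typically far faster for a multi-gigabyte file. Delete the cache file whenever the CSV changes.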

P.S. It's also important to know what kind of data (which dtypes) you are going to store, because that can affect speed dramatically.

