python - Caching CSV-read data with pandas for multiple runs


I'm trying to apply machine learning (Python, scikit-learn) to a large dataset stored in a 2.2-gigabyte CSV file.

As this is a partially empirical process, I need to run the script numerous times, which results in the pandas.read_csv() function being called over and over again, and that takes a lot of time.

Obviously, this is time consuming, so I guess there must be a way to make the process of reading the data faster, such as storing it in a different format or caching it in some way.

A code example in the solution would be great!

Answer

I'd store the parsed DataFrames in one of the following fast binary formats (a caching sketch follows the list):

- Pickle (df.to_pickle() / pd.read_pickle())
- HDF5 (df.to_hdf() / pd.read_hdf())
- Feather (df.to_feather() / pd.read_feather())
- Parquet (df.to_parquet() / pd.read_parquet())

All of them are very fast.
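For instance, here is a minimal sketch of the caching idea using pickle; the file names data.csv and data.pkl are placeholders, not from the original post, and the same pattern works with to_hdf()/read_hdf(), to_feather()/read_feather() or to_parquet()/read_parquet():

```python
import os
import pandas as pd

CSV_PATH = "data.csv"    # hypothetical paths -- adjust to your own files
CACHE_PATH = "data.pkl"

def load_data():
    """Parse the CSV once, then reuse the fast binary cache on later runs."""
    if os.path.exists(CACHE_PATH):
        # Cache hit: loading the pickled frame skips CSV parsing entirely.
        return pd.read_pickle(CACHE_PATH)
    # First run: do the slow CSV parse, then write the cache for next time.
    df = pd.read_csv(CSV_PATH)
    df.to_pickle(CACHE_PATH)
    return df

df = load_data()
```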

PS: it's important to know what kind of data (what dtypes) you're going to store, because it might affect the speed dramatically.
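To illustrate the dtype point, a small sketch (the column names here are invented for the example): passing explicit dtypes to read_csv() avoids pandas' type inference, and storing low-cardinality strings as category can shrink the frame considerably, which makes both caching and reloading faster:

```python
import pandas as pd

# Hypothetical columns -- replace with the ones in your CSV.
df = pd.read_csv(
    "data.csv",
    dtype={"user_id": "int32", "price": "float32", "country": "category"},
)

# Inspect the memory footprint to see the effect of the chosen dtypes.
print(df.memory_usage(deep=True))
```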

