python - Caching CSV-read data with pandas for multiple runs


I'm trying to apply machine learning (Python, scikit-learn) to a large dataset stored in a 2.2 GB CSV file.

As this is a partially empirical process, I need to run the script numerous times, which results in pandas.read_csv() being called on the same file over and over again, and that takes a lot of time.

Obviously, this is time consuming, so I guess there must be a way to make the process of reading the data faster, such as storing it in a different format or caching it in some way.

A code example in the solution would be great!
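One common pattern for this is a minimal sketch like the following: parse the CSV once, serialize the resulting DataFrame to a binary cache file, and load the cache on subsequent runs. The file names (`train.csv`, `train_cache.pkl`) and the helper `load_data` are illustrative, not from the original post.

```python
import os
import pandas as pd

CSV_PATH = "train.csv"          # hypothetical source file
CACHE_PATH = "train_cache.pkl"  # hypothetical cache file

def load_data(csv_path=CSV_PATH, cache_path=CACHE_PATH):
    """Parse the CSV once; serve a fast binary cache on later runs."""
    if os.path.exists(cache_path):
        # Binary deserialization skips CSV parsing entirely
        return pd.read_pickle(cache_path)
    df = pd.read_csv(csv_path)  # slow text parse, first run only
    df.to_pickle(cache_path)    # save for the next run
    return df
```

If the CSV can change between runs, you would also want to invalidate the cache, e.g. by comparing file modification times before trusting the pickle.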

I store the parsed DataFrames in one of the following formats:

All of them are fast.
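The answer's list of formats did not survive the repost, but pandas ships several binary round-trip options; the sketch below demonstrates pickle (which has no extra dependencies) and notes, as assumptions, other commonly used formats and the optional packages they require.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the parsed CSV (shape and dtypes are illustrative)
df = pd.DataFrame({
    "x": np.random.rand(1000),
    "y": np.random.randint(0, 10, 1000),
})

df.to_pickle("df.pkl")          # pickle: built in, preserves dtypes exactly
back = pd.read_pickle("df.pkl")
assert back.equals(df)

# Other fast binary options (each needs its optional dependency installed):
# df.to_parquet("df.parquet")   # needs pyarrow or fastparquet
# df.to_feather("df.feather")   # needs pyarrow
# df.to_hdf("df.h5", key="df")  # needs tables (PyTables)
```

All of these skip text parsing and dtype inference on reload, which is where read_csv() spends most of its time.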

PS: it's important to know what kind of data (which dtypes) you're going to store, because it might affect the speed dramatically.
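As a concrete illustration of the dtype point above (the column values are made up for the example): repeated strings stored as plain `object` dtype are much heavier than the same data stored as `category`, which keeps integer codes plus a small lookup table, so converting before caching shrinks the file and speeds up both writes and reads.

```python
import pandas as pd

# A column of repeated strings: object dtype keeps one Python str per row
s_obj = pd.Series(["red", "green", "blue"] * 10_000)
s_cat = s_obj.astype("category")  # integer codes + 3-entry lookup table

obj_bytes = s_obj.memory_usage(deep=True)
cat_bytes = s_cat.memory_usage(deep=True)
assert cat_bytes < obj_bytes  # category is far smaller in memory (and on disk)
```

The same idea applies at parse time: passing an explicit `dtype=` mapping to `pd.read_csv()` avoids costly type inference on a 2.2 GB file.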

