file io - Python gzip.open .tell() has a linear increasing factor making it slow

- March 15, 2015

Using Python 3.3.5, I have code that looks like this:

    import gzip

    with gzip.open(fname, mode='rb') as fh:
        fh.seek(savedpos)
        for line in fh:
            # work done
            savedpos = fh.tell()

The work being done on each row is quite taxing on the system, so I wasn't hoping for great numbers. I threw in a debug counter and got the following result:

    48 rows/sec
    28 rows/sec
    19 rows/sec
    15 rows/sec
    13 rows/sec
    13 rows/sec
    9 rows/sec
    10 rows/sec
    9 rows/sec
    9 rows/sec
    8 rows/sec
    8 rows/sec
    8 rows/sec
    8 rows/sec
    7 rows/sec
    7 rows/sec
    7 rows/sec
    7 rows/sec
    5 rows/sec
    ...

That told me something was off, so I moved fh.tell() into the debug-counter/timer function, so that fh.tell() is executed only once per second, and got a stable 65 rows/sec.
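The once-per-second workaround described above could be sketched roughly as follows. The names `ThrottledCheckpoint`, `interval` and `clock` are hypothetical and not taken from the original code; the clock is injectable only so the sketch is easy to exercise.

```python
import time


class ThrottledCheckpoint:
    """Call a position getter at most once per `interval` seconds (hypothetical helper)."""

    def __init__(self, get_pos, interval=1.0, clock=time.monotonic):
        self._get_pos = get_pos      # e.g. fh.tell
        self._interval = interval
        self._clock = clock
        self._last = clock()         # time of the last recorded position
        self.savedpos = None         # None until the first update happens

    def maybe_update(self):
        """Record the position only if `interval` seconds have elapsed."""
        now = self._clock()
        if now - self._last >= self._interval:
            self.savedpos = self._get_pos()
            self._last = now
            return True
        return False
```

Inside the loop, creating `checkpoint = ThrottledCheckpoint(fh.tell)` once and calling `checkpoint.maybe_update()` after each row would take the place of the per-row `fh.tell()`. The trade-off is that up to one interval of progress can be lost if the process dies between updates.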

Am I way off, or shouldn't fh.tell() be extremely quick? Or is this a side effect of gzip alone?

I used to store the file position manually, but that bugged out due to different line endings, encoding issues, etc., so I figured fh.tell() would be more accurate.

Are there any alternatives, or can I speed up fh.tell() somehow?
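One possible alternative, sketched under the assumption that the file is opened in binary mode as in the snippet above: each `line` is then a `bytes` object that still includes its line terminator, so adding `len(line)` to the starting offset gives the uncompressed position without calling `fh.tell()` per row. The function name `process_from` and the `handle` callback are hypothetical.

```python
import gzip


def process_from(fname, savedpos, handle):
    """Process lines from uncompressed offset `savedpos`; return the new offset."""
    with gzip.open(fname, mode='rb') as fh:
        fh.seek(savedpos)            # one seek up front; may be slow for large offsets
        for line in fh:
            handle(line)             # stand-in for the real per-row work
            savedpos += len(line)    # raw bytes consumed, terminator included
    return savedpos
```

Because the count is taken on raw bytes before any decoding, `\r\n` versus `\n` endings and multi-byte characters should not throw it off. This only holds in binary mode; in text mode `len(line)` counts characters, not bytes.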

In my experience with zlib (albeit using C rather than Python, but I suspect the issue is the same), seeking is slow. zlib doesn't keep track of where in the file it is, so if you seek, it has to uncompress from the beginning in order to count how many uncompressed bytes forward it should seek to.

In other words, reading or writing sequentially is fine. If you have to seek, you're in a world of hurt.
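To illustrate why reaching a position in a compressed stream costs time proportional to the target offset, here is a rough sketch of what it amounts to: decompress from the start and throw the data away. This is only an illustration of the idea, not the actual code of zlib or of Python's `gzip` module; `skip_to` and `chunk_size` are hypothetical names.

```python
import gzip


def skip_to(fh, offset, chunk_size=64 * 1024):
    """Advance a freshly opened gzip reader to `offset` by reading and discarding."""
    remaining = offset
    while remaining > 0:
        chunk = fh.read(min(chunk_size, remaining))
        if not chunk:                # hit end of file before reaching the offset
            break
        remaining -= len(chunk)      # every skipped byte had to be decompressed
    return offset - remaining        # uncompressed bytes actually skipped


# Example use: with gzip.open(path, 'rb') as fh: skip_to(fh, savedpos)
```

Done once at startup this cost is usually tolerable; repeated for every row, it would grow with the position in the file.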
