file io - Python gzip.open .tell() has a linear increasing factor making it slow
I'm using Python 3.3.5 and have code that looks like this:

    import gzip

    with gzip.open(fname, mode='rb') as fh:
        fh.seek(savedpos)
        for line in fh:
            # work done
            savedpos = fh.tell()
The work being done on each row is quite taxing on the system, so I wasn't hoping for great numbers. I threw in a debug counter and got the following result:

    48 rows/sec
    28 rows/sec
    19 rows/sec
    15 rows/sec
    13 rows/sec
    13 rows/sec
    9 rows/sec
    10 rows/sec
    9 rows/sec
    9 rows/sec
    8 rows/sec
    8 rows/sec
    8 rows/sec
    8 rows/sec
    7 rows/sec
    7 rows/sec
    7 rows/sec
    7 rows/sec
    5 rows/sec
    ...
This tells me something is off, so I moved fh.tell() into the debug-counter/timer function, so that fh.tell() is executed only once per second, and got a stable 65 rows/sec.
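For reference, the throttled version might look roughly like the sketch below. The function name `process_rows` and the `interval` parameter are made up for illustration, and the per-row work is left as a placeholder comment.

```python
import gzip
import time


def process_rows(fname, savedpos=0, interval=1.0):
    """Iterate over rows, calling fh.tell() at most once per `interval` seconds.

    Hypothetical helper: the name and parameters are illustrative only.
    Returns the uncompressed offset reached at the end of the file.
    """
    with gzip.open(fname, mode='rb') as fh:
        fh.seek(savedpos)
        last_check = time.monotonic()
        for line in fh:
            # work done on `line` goes here
            now = time.monotonic()
            if now - last_check >= interval:
                savedpos = fh.tell()  # only called occasionally
                last_check = now
        savedpos = fh.tell()  # record the final position once, after the loop
    return savedpos
```

The trade-off is that the saved position can lag behind by up to `interval` seconds of work if the process is interrupted mid-file.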
Am I way off, or shouldn't fh.tell() be extremely quick? Or is this a side effect of gzip alone?
I used to store the file position manually, but that bugged out due to different line endings, encoding issues, etc., so I figured fh.tell() would be more accurate.
Are there any alternatives, or can fh.tell() be sped up somehow?
In my experience with zlib (albeit using it from C rather than Python, but I suspect the issue is the same), seeking is slow. zlib doesn't keep track of where in the file it is, so if you seek, it has to uncompress from the beginning in order to count how many uncompressed bytes forward it should seek to.
In other words, reading or writing sequentially is fine. If you have to seek, you're in a world of hurt.
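One possible alternative, as a minimal sketch: since the file is opened in binary mode ('rb'), each `line` is a bytes object that still includes its original line ending, so adding up `len(line)` should give the uncompressed offset without calling fh.tell() per row. The generator name `iter_lines_with_offset` is hypothetical, and the initial fh.seek() still pays the decompression cost described above.

```python
import gzip


def iter_lines_with_offset(fname, savedpos=0):
    """Yield (line, next_pos) pairs, tracking the uncompressed offset by hand.

    Hypothetical helper: assumes binary mode, so len(line) counts raw bytes,
    including whatever line ending ("\n" or "\r\n") the file uses.
    """
    with gzip.open(fname, mode='rb') as fh:
        fh.seek(savedpos)  # still decompresses everything up to savedpos
        pos = savedpos
        for line in fh:
            pos += len(line)  # offset of the byte right after this line
            yield line, pos
```

Usage would be something like `for line, savedpos in iter_lines_with_offset(fname, savedpos): ...`, saving `savedpos` whenever a row has been fully processed.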