IndexedFile — File Base class

This is the base class used for (large) file handling in the pylam framework.

It provides random access to a file based on the line index. This is realized by scanning the file once at initialization and building a table of byte offsets. Consequently, to access lines in the middle or at the end of a large file, one does not need to read through all preceding lines first.
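The idea of a byte offset table can be illustrated with a minimal sketch. It is written for Python 3 and is not the pylam implementation; the names build_offsets and read_line are hypothetical and exist only for this illustration.

```python
import os
import tempfile


def build_offsets(path):
    """Scan the file once and return the byte offset of each line start."""
    offsets = []
    position = 0
    with open(path, "rb") as handle:  # binary mode, so positions are bytes
        for raw_line in handle:
            offsets.append(position)
            position += len(raw_line)
    return offsets


def read_line(path, offsets, index):
    """Re-open the file, jump to the line start and read only that line."""
    with open(path, "rb") as handle:
        handle.seek(offsets[index])
        return handle.readline().decode("utf-8")


# Small demonstration on a temporary three-line file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, newline="\n"
) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    demo_path = tmp.name

demo_offsets = build_offsets(demo_path)
# Fetch the last line without reading lines 0 and 1 again.
demo_line = read_line(demo_path, demo_offsets, 2)
os.remove(demo_path)
```

The scan costs one pass over the file, but every later lookup is a single seek followed by a short read, regardless of where the line sits in the file.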

Note

An object must be initialized with an input file, which is scanned once to generate the index. After that, the file is closed and re-opened for each access!

The core methods are IndexedFile.getLines() and IndexedFile.getLine(), which retrieve the requested line(s) from the file as a string, including the newline character(s). The Python built-in len() returns the length of the file as a number of lines.

Example input file test.txt:

  _     _       Line index 0
 (_) __| |_  __ Line index 1
 | |/ _` \ \/ / Line index 2
 | | (_| |>  <  Line index 3
 |_|\__,_/_/\_\ Line index 4
   __ _ _       Line index 5
  / _(_) | ___  Line index 6
 | |_| | |/ _ \ Line index 7
 |  _| | |  __/ Line index 8
 |_| |_|_|\___| Line index 9
*eof*           Line index 10

Example session:

>>> import pylam.base
>>> ifile = pylam.base.IndexedFile('test.txt')
>>> print len(ifile)  # returns the total number of lines
11
>>> lines = ifile.getLines(5, 9)  # get lines as one string (incl. newlines!)
>>> print lines
   __ _ _       Line index 5
  / _(_) | ___  Line index 6
 | |_| | |/ _ \ Line index 7
 |  _| | |  __/ Line index 8
 |_| |_|_|\___| Line index 9
>>> print type(lines)
<type 'str'>

One can also iterate over an IndexedFile object:

>>> for line in ifile:
...     print line[0:16]
  _     _
 (_) __| |_  __
 | |/ _` \ \/ /
 | | (_| |>  <
 |_|\__,_/_/\_\
   __ _ _
  / _(_) | ___
 | |_| | |/ _ \
 |  _| | |  __/
 |_| |_|_|\___|
*eof*

But remember, the file will be re-opened for each access!

The method IndexedFile._indexFile(), which is called in __init__(), calls IndexedFile._parseLine() for each line. In this base class, _parseLine() is a dummy that does nothing. By overriding it in a derived class, one can easily implement further parsing.
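The hook mechanism described above might look roughly like the following Python 3 sketch. SketchIndexedFile is a hypothetical stand-in for the base class, written only so that the example runs without pylam installed; SectionIndexedFile and its sections attribute are likewise assumptions made for illustration.

```python
import os
import tempfile


class SketchIndexedFile(object):
    """Hypothetical stand-in for the base class; not pylam's actual code."""

    def __init__(self, filename):
        self.filename = filename
        self.offsets = []
        self._indexFile()

    def _indexFile(self):
        # Build the offset table and hand every line to the parsing hook.
        position = 0
        with open(self.filename, "rb") as handle:
            for no, raw_line in enumerate(handle):
                self.offsets.append(position)
                position += len(raw_line)
                self._parseLine(raw_line.decode("utf-8"), no)

    def _parseLine(self, line, no):
        pass  # dummy hook, meant to be replaced in derived classes


class SectionIndexedFile(SketchIndexedFile):
    """Derived class that records the line index of each '[section]' header."""

    def __init__(self, filename):
        self.sections = {}  # must exist before the base class starts indexing
        super(SectionIndexedFile, self).__init__(filename)

    def _parseLine(self, line, no):
        text = line.strip()
        if text.startswith("[") and text.endswith("]"):
            self.sections[text[1:-1]] = no


# Small demonstration on a temporary file with two sections.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, newline="\n"
) as tmp:
    tmp.write("[atoms]\n1 2 3\n[bonds]\n4 5\n")
    demo_path = tmp.name

demo_file = SectionIndexedFile(demo_path)
os.remove(demo_path)
```

Because indexing runs inside __init__(), any attribute the hook writes to has to be created before the base class constructor is called, as done for sections here.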


class pylam.base.IndexedFile(filename)

Bases: object

Generic class for pre-indexed file objects.

Parameters: filename (str) – file name
Returns: indexed file object
Return type: IndexedFile

_indexFile()

Generates the byte offset table. Method is called in __init__().

_parseLine(line, no)

Method which is called in _indexFile() for each line of the file.

Parameters:
  • line (str) – line to parse
  • no (int) – number of the line

_getOffsets(startLineIndex, endLineIndex)

Parameters:
  • startLineIndex (int) – index of first line
  • endLineIndex (int) – index of last line (included!)
Returns: start byte offset and byte length
Return type: tuple(int, int)
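
How a start offset and a byte length can be derived for an inclusive line range is sketched below. This is an assumption about the general technique, not the actual _getOffsets() code; the function get_offsets and its file_size parameter are hypothetical.

```python
def get_offsets(offsets, file_size, start_line_index, end_line_index):
    """Return (start byte offset, byte length) for an inclusive line range.

    offsets holds the byte position of each line start, file_size is the
    total size of the file in bytes.
    """
    if not 0 <= start_line_index <= end_line_index < len(offsets):
        raise IndexError("line range out of bounds")
    start = offsets[start_line_index]
    # The range ends where the line after end_line_index begins, or at the
    # end of the file if end_line_index is the last line.
    if end_line_index + 1 < len(offsets):
        stop = offsets[end_line_index + 1]
    else:
        stop = file_size
    return start, stop - start


# Offsets for a 17-byte file containing "alpha\nbeta\ngamma\n".
example_range = get_offsets([0, 6, 11], 17, 1, 2)
```

With such a pair, reading a block of lines reduces to one seek to the start offset followed by one read of the given length.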

getLines(startLineIndex, endLineIndex)

Returns a part of the file as a string.

Parameters:
  • startLineIndex (int) – index of first line
  • endLineIndex (int) – index of last line (included!)
Returns: part of the file
Return type: str

getLine(lineIndex)

Returns the line with the given index (0, 1, ...) as a string.

Parameters: lineIndex (int) – index of line in file
Returns: line
Return type: str

__len__()

Returns the total number of lines in the file (support for len()).

next()

Returns the next line from the file.
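
Support for len() and for iteration could be wired up along the lines of the following Python 3 sketch. LineCursor is a hypothetical class, not part of pylam. Python 3 names the iterator method __next__(), so the sketch aliases it to next() to mirror the Python 2 method documented above.

```python
import os
import tempfile


class LineCursor(object):
    """Hypothetical sketch of len() and iteration over pre-indexed lines."""

    def __init__(self, filename):
        self.filename = filename
        self._offsets = []
        self._current = 0
        position = 0
        with open(filename, "rb") as handle:
            for raw_line in handle:
                self._offsets.append(position)
                position += len(raw_line)

    def __len__(self):
        return len(self._offsets)  # one table entry per line

    def __iter__(self):
        self._current = 0  # every new loop starts at line index 0
        return self

    def __next__(self):
        if self._current >= len(self._offsets):
            raise StopIteration
        with open(self.filename, "rb") as handle:  # re-opened per access
            handle.seek(self._offsets[self._current])
            line = handle.readline().decode("utf-8")
        self._current += 1
        return line

    next = __next__  # Python 2 spelling of the iterator method


# Small demonstration on a temporary three-line file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, newline="\n"
) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    demo_path = tmp.name

demo_cursor = LineCursor(demo_path)
demo_lines = list(demo_cursor)
os.remove(demo_path)
```

Re-opening the file for every line keeps no file handle alive between accesses, at the price of one open and one seek per iteration step.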