IndexedFile — File Base class
This is the base class used for (large) file handling in the pylam framework.
It provides random access to a file by line index. To achieve this, the file is scanned at initialization and a table of byte offsets is built. As a result, accessing line(s) in the middle or at the end of a large file does not require reading all preceding lines first.
Note
An object must be initialized with an input file, which is scanned once to build the index. Afterwards the file is closed and re-opened for each access!
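The indexing idea described above can be illustrated with a minimal sketch. This is not the pylam implementation: the function names build_offsets and read_line are hypothetical, the demo file content is made up, and the file is opened in binary mode on the assumption that the offsets are plain byte positions.

```python
import os
import tempfile


def build_offsets(path):
    """Scan the file once and return the byte offset at which each line starts."""
    offsets = []
    position = 0
    with open(path, "rb") as handle:
        for raw_line in handle:
            offsets.append(position)
            position += len(raw_line)
    return offsets


def read_line(path, offsets, index):
    """Re-open the file, seek straight to the requested line and read it."""
    with open(path, "rb") as handle:
        handle.seek(offsets[index])
        return handle.readline().decode("utf-8")


# Small demonstration on a temporary file (hypothetical content).
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as handle:
    handle.write(b"first\nsecond\nthird\n")

demo_offsets = build_offsets(demo_path)           # one full scan
third_line = read_line(demo_path, demo_offsets, 2)  # no scan, just a seek
os.remove(demo_path)
```

The scan costs one pass over the file, but every later access only needs a seek and a short read, which is the trade-off the note above describes.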
The core methods are IndexedFile.getLines() and IndexedFile.getLine(), which return the requested line(s) of the file as a string, including the newline characters.
The Python built-in len() returns the length of the file as a number of lines.
Example input file test.txt:
_ _ Line index 0
(_) __| |_ __ Line index 1
| |/ _` \ \/ / Line index 2
| | (_| |> < Line index 3
|_|\__,_/_/\_\ Line index 4
__ _ _ Line index 5
/ _(_) | ___ Line index 6
| |_| | |/ _ \ Line index 7
| _| | | __/ Line index 8
|_| |_|_|\___| Line index 9
*eof* Line index 10
>>> import pylam.base
>>> ifile = pylam.base.IndexedFile('test.txt')
>>> print len(ifile) # returns the total number of lines
11
>>> lines = ifile.getLines(5, 9) # get lines as string (incl. new lines!)
>>> print lines
__ _ _ Line index 5
/ _(_) | ___ Line index 6
| |_| | |/ _ \ Line index 7
| _| | | __/ Line index 8
|_| |_|_|\___| Line index 9
>>> print type(lines)
<type 'str'>
One can also iterate over an IndexedFile object:
>>> for line in ifile:
...     print line[0:16]
_ _
(_) __| |_ __
| |/ _` \ \/ /
| | (_| |> <
|_|\__,_/_/\_\
__ _ _
/ _(_) | ___
| |_| | |/ _ \
| _| | | __/
|_| |_|_|\___|
*eof*
But remember, the file will be re-opened for each access!
The method IndexedFile._indexFile() (which is called in __init__()) calls IndexedFile._parseLine() for each line.
In the base class this method is a dummy that does nothing. By overriding it in a derived class, one can easily implement further parsing.
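The hook mechanism can be sketched as follows. Because the internals of pylam are not shown here, the example uses a hypothetical stand-in class, IndexedFileSketch, that only mimics the behaviour described above. The derived class SectionFile, its sections attribute, and the '#' marker convention are likewise assumptions made for illustration.

```python
import os
import tempfile


class IndexedFileSketch(object):
    """Hypothetical stand-in for pylam.base.IndexedFile (not the real class)."""

    def __init__(self, filename):
        self.filename = filename
        self._offsets = []
        self._indexFile()

    def _indexFile(self):
        # Scan the file once, record each line start and call the hook.
        position = 0
        with open(self.filename, "rb") as handle:
            for no, raw_line in enumerate(handle):
                self._offsets.append(position)
                position += len(raw_line)
                self._parseLine(raw_line.decode("utf-8"), no)

    def _parseLine(self, line, no):
        # Dummy in the base class: does nothing.
        pass

    def __len__(self):
        return len(self._offsets)


class SectionFile(IndexedFileSketch):
    """Derived class that remembers which lines start with '#'."""

    def __init__(self, filename):
        # Must exist before the base class starts indexing.
        self.sections = {}
        IndexedFileSketch.__init__(self, filename)

    def _parseLine(self, line, no):
        if line.startswith("#"):
            self.sections[line[1:].strip()] = no


# Small demonstration on a temporary file (hypothetical content).
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as handle:
    handle.write(b"# header\n1 2 3\n# data\n4 5 6\n")

sfile = SectionFile(demo_path)
os.remove(demo_path)
```

Since indexing happens during initialization, any attribute the overridden hook writes to has to be created before the base class initializer runs, as done with sections in this sketch.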
class pylam.base.IndexedFile(filename)
Bases: object
Generic class for pre-indexed file objects.
Parameters: filename (str) – file name
Returns: indexed file object
Return type: IndexedFile
_parseLine(line, no)
Method which is called in _indexFile() for each line of the file.
Parameters:
- line (str) – line to parse
- no (int) – number of the line
_getOffsets(startLineIndex, endLineIndex)
Parameters:
- startLineIndex (int) – index of first line
- endLineIndex (int) – index of last line (included!)
Returns: start byte offset and byte length
Return type: tuple(int, int)
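How a start offset and a byte length could be derived from a table of line starts is sketched below. The function get_offsets and its extra arguments (the offset table and the total file size) are hypothetical, and the real method may store or compute these values differently.

```python
def get_offsets(offsets, file_size, start_line_index, end_line_index):
    """Return (start byte offset, byte length) for an inclusive line range."""
    start = offsets[start_line_index]
    if end_line_index + 1 < len(offsets):
        # The range ends where the following line begins.
        end = offsets[end_line_index + 1]
    else:
        # The range runs to the end of the file.
        end = file_size
    return start, end - start


# Line starts for the hypothetical content "first\nsecond\nthird\n" (19 bytes).
line_starts = [0, 6, 13]
total_size = 19

middle = get_offsets(line_starts, total_size, 1, 1)  # only "second\n"
tail = get_offsets(line_starts, total_size, 1, 2)    # "second\nthird\n"
```

Because the end index is inclusive, the length is measured up to the start of the line after the last requested one.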
getLines(startLineIndex, endLineIndex)
Returns a part of the file as a string.
Parameters:
- startLineIndex (int) – index of first line
- endLineIndex (int) – index of last line (included!)
Returns: part of the file
Return type: str
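Given a start offset and a byte length, a whole range of lines can be fetched with one seek and one read. The sketch below assumes this is roughly what happens on each access. The helper read_block and the demo content are hypothetical and are not part of pylam.

```python
import os
import tempfile


def read_block(path, start, length):
    """Re-open the file and return `length` bytes from `start` as a string."""
    with open(path, "rb") as handle:
        handle.seek(start)
        return handle.read(length).decode("utf-8")


# Small demonstration on a temporary file (hypothetical content).
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as handle:
    handle.write(b"first\nsecond\nthird\n")

# Lines 1 to 2 start at byte 6 and span 13 bytes in this file.
block = read_block(demo_path, 6, 13)
os.remove(demo_path)
```

The returned string keeps the newline characters of every line in the range, matching the behaviour documented for getLines().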