Wednesday, 30 August 2017

python - How to read a large file line by line



I want to iterate over each line of an entire file. One way to do this is to read the entire file, save it to a list, then go over the lines of interest. This method uses a lot of memory, so I am looking for an alternative.



My code so far:



    for each_line in fileinput.input(input_file):
        do_something(each_line)

        for each_line_again in fileinput.input(input_file):
            do_something(each_line_again)


Executing this code gives an error message: device active.



Any suggestions?




The purpose is to calculate pair-wise string similarity, meaning for each line in file, I want to calculate the Levenshtein distance with every other line.


Answer



The correct, fully Pythonic way to read a file is the following:



    with open(...) as f:
        for line in f:
            # Do something with 'line'


The with statement guarantees that the file is closed when the block exits, even if an exception is raised inside it. The for line in f loop treats the file object f as an iterable: lines are read lazily through buffered I/O, so only the current line is held in memory and large files are not a problem.
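For the pair-wise comparison described in the question, the same pattern can be nested. The sketch below is one possible approach, not the only one: it opens a second, independent file object for the inner loop, which sidesteps the fact that fileinput allows only one active input at a time. The helper name pairwise_lines and the sample words are made up for illustration, and the print call stands in for whatever distance function you use.

```python
import os
import tempfile


def pairwise_lines(path):
    """Yield (i, j, line_i, line_j) for every pair of lines with i < j.

    Hypothetical helper: only two lines are held in memory at a time,
    at the cost of re-reading the file once per outer line.
    """
    with open(path) as outer:
        for i, line_i in enumerate(outer):
            # A second, independent file object lets the inner loop
            # start from the top without disturbing the outer one.
            with open(path) as inner:
                for j, line_j in enumerate(inner):
                    if j > i:
                        yield i, j, line_i.rstrip("\n"), line_j.rstrip("\n")


# Demo on a small temporary file (a stand-in for the real input file).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("kitten\nsitting\nmitten\n")
    demo_path = tmp.name

pairs = list(pairwise_lines(demo_path))
for i, j, a, b in pairs:
    print(i, j, a, b)  # replace print with the distance computation

os.remove(demo_path)
```

This keeps memory use flat but reads the file once per line, so the running time grows quadratically with the number of lines. If the file fits in memory after all, loading it into a list once will be considerably faster.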





There should be one -- and preferably only one -- obvious way to do it.


