2 min read · Jan 2, 2024
Python provides various methods for reading files. In this post, we will introduce methods for reading extremely large files that can be chosen according to project requirements.
One common approach is the standard file reading process in Python: open the file with the open() function and then read its contents with readline(), which returns one line per call, or readlines(), which returns all lines at once.
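As a quick illustration of the line-by-line variant, here is a minimal sketch using readline(); the function name and the print() call are placeholders for whatever processing the project needs.

def read_line_by_line(filename):
    with open(filename, 'r') as fp:
        line = fp.readline()
        while line:
            # each iteration handles exactly one line of the file
            print(line.rstrip('\n'))
            line = fp.readline()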
If we want to read all lines at once, we can use the readlines() method. Here is an example using the readlines() method:
def read_from_file(filename):
    with open(filename, 'r') as fp:
        lines = fp.readlines()  # loads every line of the file into memory at once
        for line in lines:
            # process the contents of the file here
            pass
Reading everything at once with readlines() can lead to memory issues, because the entire file has to be loaded into memory. If the file exceeds 100 GB, for example, this approach is not suitable.
If we need to handle extremely large files, we can use the file.read() method with a size argument. Unlike the previous methods, file.read(size) returns at most size characters per call rather than reading the file line by line. This avoids memory issues but requires a bit more code to handle the file content in chunks. Here is an example using the file.read() method:
def read_from_file(filename, block_size=1024 * 8):
    with open(filename, 'r') as fp:
        while True:
            chunk = fp.read(block_size)
            if not chunk:
                break
            # process the content chunk from the file here
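If the processing step does not need text decoding, the file can also be opened in binary mode ('rb'); this is usually faster for very large files and makes block_size an exact byte count. A minimal sketch of that variant (the function name is only illustrative):

def read_from_binary_file(filename, block_size=1024 * 8):
    # in binary mode, read() returns bytes and block_size is a byte count
    with open(filename, 'rb') as fp:
        while True:
            chunk = fp.read(block_size)
            if not chunk:
                break
            # process the raw bytes here, e.g. feed them to a hash or parser
            pass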
To further clean up the code, we can use a generator function to decouple the logic of producing data from the logic of consuming it. Here is an example using a generator function:
def chunked_file_reader(fp, block_size):
    # yield fixed-size chunks until the file is exhausted
    while True:
        chunk = fp.read(block_size)
        if not chunk:
            break
        yield chunk

def read_from_file_v2(filename, block_size=1024 * 8):
    with open(filename, 'r') as fp:
        for chunk in chunked_file_reader(fp, block_size):
            # process the content chunk from the file here
            pass
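To show how the pieces fit together, here is a small usage sketch that consumes chunked_file_reader() to count the characters in a file; the file name data.txt and the counting logic are only illustrative.

def count_characters(filename, block_size=1024 * 8):
    # consume the generator without ever holding the whole file in memory
    count = 0
    with open(filename, 'r') as fp:
        for chunk in chunked_file_reader(fp, block_size):
            count += len(chunk)
    return count

print(count_characters('data.txt'))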