Hello, everyone. Hope you are all doing fine. Today I ran into this question: how do you quickly get the last line of a huge CSV file (48M lines) in Python? Here I explain all the possible solutions.
Without wasting your time, let's start.
How to quickly get the last line of a huge CSV file (48M lines)?
I have a CSV file that grows until it reaches approximately 48M lines.
Before adding new lines to it, I need to read the last line.
I tried the code below, but it got too slow and I need a faster alternative:
    def return_last_line(filepath):
        with open(filepath, 'r') as file:
            for x in file:
                pass
            return x

    return_last_line('lala.csv')
How to solve: quickly getting the last line of a huge CSV file (48M lines)
Answer: Here is my take, in Python. I created a function that lets you choose how many trailing lines to return, because the last lines may be empty.
    def get_last_line(path, how_many_last_lines=1):
        # open your file using with: safety first, kids!
        # binary mode makes seeking to arbitrary byte offsets safe
        with open(path, 'rb') as file:
            # find the position of the end of the file: end of the file stream
            end_of_file = file.seek(0, 2)
            # trace back each byte of your file in a loop
            for num in range(1, end_of_file + 1):
                file.seek(end_of_file - num)
                # save the last bytes of your file: last_line
                last_line = file.read()
                # count how many '\n' you have in your bytes:
                # if you have 1, you are in the last line; if you have 2, you have the two last lines
                if last_line.count(b'\n') == how_many_last_lines:
                    return last_line.decode()
            # fewer '\n' than requested: return the whole file
            file.seek(0)
            return file.read().decode()

    get_last_line('lala.csv', 2)
This lala.csv has 48 million lines, as in your example. It took me about 0 seconds to get the last line.
Here is code for finding the last line of a file using mmap. It should work on Unix-like systems and Windows alike (I've tested this on Linux only, please tell me if it works on Windows too), i.e. pretty much everywhere it matters. Since it uses memory-mapped I/O, it can be expected to perform quite well.
It expects that you can map the entire file into the address space of the process. That should be OK for a 50 MB file everywhere, but for a 5 GB file you'd need a 64-bit system or some extra slicing.
    import mmap

    def iterate_lines_backwards(filename):
        with open(filename, "rb") as f:
            # memory-map the file, size 0 means whole file
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                start = len(mm)
                while start > 0:
                    start, prev = mm.rfind(b"\n", 0, start), start
                    line = mm[start + 1:prev + 1]
                    # if the last character in the file was a '\n',
                    # technically the empty string after that is not a line.
                    if line:
                        yield line.decode()

    def get_last_nonempty_line(filename):
        for line in iterate_lines_backwards(filename):
            if stripped := line.rstrip("\r\n"):
                return stripped

    print(get_last_nonempty_line("datafile.csv"))
As a bonus, there is a generator, iterate_lines_backwards, that efficiently iterates over the lines of a file in reverse for any number of lines:
    print("Iterating the lines of datafile.csv backwards")
    for l in iterate_lines_backwards("datafile.csv"):
        print(l, end="")
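For files that are too large to map in one go, one possible form of the "extra slicing" is to skip mmap and read fixed-size blocks from the end of the file until a complete last line is in the buffer. The following is only a sketch under assumptions: the function name last_nonempty_line_chunked and the block_size and encoding parameters are hypothetical and not part of the answer above, and "non-empty" here only means "not made of newline characters".

```python
import os

def last_nonempty_line_chunked(path, block_size=64 * 1024, encoding="utf-8"):
    """Return the last non-empty line of a file, reading blocks from the end.

    Hypothetical helper: name and parameters are illustrative assumptions.
    """
    with open(path, "rb") as f:
        position = f.seek(0, os.SEEK_END)  # read boundary, starts at end of file
        tail = b""                         # bytes collected so far, in file order
        while position > 0:
            read_size = min(block_size, position)
            position -= read_size
            f.seek(position)
            tail = f.read(read_size) + tail
            # Ignore trailing newline characters (blank lines at the end).
            stripped = tail.rstrip(b"\r\n")
            newline_at = stripped.rfind(b"\n")
            if stripped and newline_at != -1:
                # Everything after this newline is the complete last line.
                return stripped[newline_at + 1:].decode(encoding)
        # Reached the start of the file: it holds at most one line.
        return tail.rstrip(b"\r\n").decode(encoding)
```

Only the blocks needed to find one newline are read, so memory use depends on the length of the last line rather than on the size of the file.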
This is generally a rather tricky thing to do. A very efficient way of getting a chunk that includes the last lines is the following:
    import os

    def get_last_lines(path, offset=500):
        """ An efficient way to get the last lines of a file.

        IMPORTANT:
        1. Choose offset to be greater than
           max_line_length * number of lines that you want to recover.
        2. This will raise an OSError if the file is shorter than the offset.
        """
        with path.open("rb") as f:
            # jump to `offset` bytes before the end of the file
            f.seek(-offset, os.SEEK_END)
            # step backwards one byte at a time until just after a newline
            while f.read(1) != b"\n":
                f.seek(-2, os.SEEK_CUR)
            return f.readlines()
You need to know the maximum line length though and ensure that the file is at least one offset long!
To use it, do the following:
    from pathlib import Path

    n_last_lines = 10
    last_bit_of_file = get_last_lines(Path("/path/to/my/file"))
    real_last_n_lines = last_bit_of_file[-n_last_lines:]
Now finally you need to decode the binary to strings:
real_last_n_lines_non_binary = [x.decode() for x in real_last_n_lines]
Probably all of this could be wrapped in one more convenient function.
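One way such a wrapper might look is sketched below. It is not the original author's code: the name read_last_lines and its parameters are hypothetical, and it assumes that no line is longer than max_line_length bytes. Unlike get_last_lines, it clamps the offset to the file size, so short files do not raise an error, and it decodes the result.

```python
from pathlib import Path

def read_last_lines(path, n_lines=1, max_line_length=500, encoding="utf-8"):
    """Return the last n_lines of a file as a list of decoded strings.

    Hypothetical wrapper; assumes no line exceeds max_line_length bytes.
    """
    if n_lines < 1:
        return []
    path = Path(path)
    file_size = path.stat().st_size
    # Read one extra line's worth so a partial first line can be dropped.
    offset = min(file_size, max_line_length * (n_lines + 1))
    with path.open("rb") as f:
        f.seek(file_size - offset)
        lines = f.read().splitlines()
    if offset < file_size and lines:
        # We started mid-file, so the first entry may be a partial line.
        lines = lines[1:]
    return [line.decode(encoding) for line in lines[-n_lines:]]
```

For example, read_last_lines("/path/to/my/file", n_lines=10) would return up to ten strings. If the file ends with blank lines, those count as lines and come back as empty strings.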
You could additionally store the last line in a separate file, which you update whenever you add new lines to the main file.
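A minimal sketch of this idea follows. The function names append_rows and read_last_row and the ".last" sidecar suffix are hypothetical choices for illustration, and the sketch assumes that every append to the CSV file goes through append_rows.

```python
from pathlib import Path

def append_rows(csv_path, rows, sidecar_suffix=".last"):
    """Append rows (a list of strings without newlines) to the CSV file
    and record the last one in a sidecar file. Hypothetical helper."""
    if not rows:
        return
    csv_path = Path(csv_path)
    sidecar = csv_path.with_name(csv_path.name + sidecar_suffix)
    with csv_path.open("a", encoding="utf-8", newline="") as f:
        for row in rows:
            f.write(row + "\n")
    # Written after the main file: a crash in between leaves the sidecar stale.
    sidecar.write_text(rows[-1], encoding="utf-8")

def read_last_row(csv_path, sidecar_suffix=".last"):
    """Return the recorded last row, or None if no sidecar file exists yet."""
    csv_path = Path(csv_path)
    sidecar = csv_path.with_name(csv_path.name + sidecar_suffix)
    if not sidecar.exists():
        return None
    return sidecar.read_text(encoding="utf-8")
```

Reading the last row then costs the same no matter how large the main file grows. The trade-off is that the two files can drift apart if anything else writes to the CSV.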
If you are running your code in a Unix-based environment, you can execute the tail shell command from Python to read the last line:
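    import subprocess

    # capture the output so the last line is available as a Python string
    result = subprocess.run(['tail', '-n', '1', '/path/to/lala.csv'],
                            capture_output=True, text=True, check=True)
    last_line = result.stdout.rstrip('\n')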
This works well for me (it uses the third-party file_read_backwards package):
    from file_read_backwards import FileReadBackwards

    with FileReadBackwards("/tmp/file", encoding="utf-8") as frb:
        # getting lines one by one, starting from the last line and going up
        for l in frb:
            print(l)
That's all about this issue. I hope these solutions helped you. Comment below with your thoughts and questions, and let us know which solution worked for you.