Pythonic Way Of Processing A File Between Two Previously Known Strings
Solution 1:
You are right that something is off about having a nested loop over the same iterator. File objects are already iterators, and you can use that to your advantage. For example, to find the first line with a START in it:
line = next(l for l in name_of_file if 'START' in l)
This will raise a StopIteration if there is no such line. It also leaves the file positioned right after the START line, so the next line read is the first one you care about.
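A minimal sketch of that behaviour, assuming io.StringIO as a stand-in for an open text file (both yield one line per iteration; the sample text is made up). It also shows the default-argument form of next(), which returns None instead of raising StopIteration:

```python
import io

# io.StringIO stands in for an open text file; both are line iterators.
sample = io.StringIO("header\nSTART\nfirst\nsecond\nEND\n")

# Consume lines up to and including the first one containing 'START'.
start_line = next(l for l in sample if 'START' in l)

# The same iterator now resumes at the line right after START.
following = next(sample)

# With a default value, next() returns None instead of raising StopIteration.
missing = next((l for l in io.StringIO("no markers\n") if 'START' in l), None)

print(repr(start_line), repr(following), missing)  # 'START\n' 'first\n' None
```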
Stopping at the END line without reading anything that comes after it is a bit more complicated, because it is difficult to set external state in a generator expression. Instead, you can write a simple generator function:
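def interesting_lines(file):
    # Skip everything up to and including the first START line;
    # if there is no START at all, yield nothing.
    if not next((line for line in file if 'START' in line), None):
        return
    for line in file:
        if 'END' in line:
            break
        line = line.strip()
        if not line:
            continue  # skip blank lines
        yield line.replace(',', '').split()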
The generator will yield nothing if there is no START line, but it will yield every line through the end of the file if there is no END line, so it differs a little from your implementation. You would use the generator to replace your loop entirely:
with open(name_of_file) as file:
    data = list(interesting_lines(file))
if data:
    ...  # process data
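To check the edge cases described above, here is a small sketch that runs the same generator over in-memory samples; io.StringIO replaces the real file, and the sample strings are hypothetical:

```python
import io

def interesting_lines(file):
    # Skip up to and including the first START line; yield nothing if absent.
    if not next((line for line in file if 'START' in line), None):
        return
    for line in file:
        if 'END' in line:
            break
        line = line.strip()
        if not line:
            continue  # skip blank lines
        yield line.replace(',', '').split()

normal = list(interesting_lines(io.StringIO("x\nSTART\na, b\n\nc d\nEND\ny\n")))
no_start = list(interesting_lines(io.StringIO("a b\nEND\n")))
no_end = list(interesting_lines(io.StringIO("START\na b\nc\n")))

print(normal)    # [['a', 'b'], ['c', 'd']]
print(no_start)  # []
print(no_end)    # [['a', 'b'], ['c']]
```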
Wrapping the generator in list() consumes it immediately, so the lines persist even after you close the file. You can also call the generator repeatedly on the same file, because when each call finishes, the file pointer is just past the END line:
with open(name_of_file) as file:
    for data in iter(lambda: list(interesting_lines(file)), []):
        ...  # Process another data set.
The lesser-known two-argument form of iter, iter(callable, sentinel), turns any callable that takes no arguments into an iterator. Iteration stops when the callable returns the sentinel value, in this case an empty list.
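A sketch of the repeated-use pattern over two START/END blocks, again with io.StringIO and made-up sample text in place of the real file:

```python
import io

def interesting_lines(file):
    # Skip up to and including the next START line; yield nothing if absent.
    if not next((line for line in file if 'START' in line), None):
        return
    for line in file:
        if 'END' in line:
            break
        line = line.strip()
        if not line:
            continue  # skip blank lines
        yield line.replace(',', '').split()

sample = io.StringIO("START\na 1\nEND\nnoise\nSTART\nb, 2\nEND\ntrailer\n")
# Each call resumes where the previous one stopped; [] (no more START) ends the loop.
blocks = list(iter(lambda: list(interesting_lines(sample)), []))
print(blocks)  # [[['a', '1']], [['b', '2']]]

# Caveat: a block holding only blank lines also produces [], which ends the loop early.
tricky = io.StringIO("START\n\nEND\nSTART\nc\nEND\n")
early = list(iter(lambda: list(interesting_lines(tricky)), []))
print(early)  # []
```

Because the sentinel is an empty list, an empty START/END block is indistinguishable from "no more blocks"; if that can happen in your data, a different end-of-file signal would be needed.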
Solution 2:
To do that, you can use iter(callable, sentinel), discussed in this post, which reads until a sentinel value is reached: in your case 'END' (after applying .strip()).
with open(filename) as file:
    start_token = next(l for l in file if l.strip() == 'START')  # Used to read until the start token
    result = [line.replace(',', '').split()
              for line in iter(lambda x=file: next(x).strip(), 'END')
              if line]
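A quick sketch of the same approach on in-memory sample text (io.StringIO stands in for the file; the data is hypothetical):

```python
import io

sample = io.StringIO("noise\nSTART\na, b\n\nc d\nEND\ntrailer\n")

# Consume lines up to and including the line that is exactly 'START'.
next(l for l in sample if l.strip() == 'START')

# iter(callable, sentinel) keeps calling the lambda until it returns 'END';
# the trailing 'if line' drops blank lines.
result = [line.replace(',', '').split()
          for line in iter(lambda x=sample: next(x).strip(), 'END')
          if line]
print(result)  # [['a', 'b'], ['c', 'd']]
```

Note that this version requires the marker lines to equal 'START' and 'END' exactly after stripping, whereas Solution 1 accepts any line that merely contains them.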
Solution 3:
This is a job for regular expressions (the re module), for example:
import re
lines = """ not this line
START
this line
this line too
END
not this one
"""
search_obj = re.search(r'START(.*)END', lines, re.S)
search_obj.group(1)
# '\nthis line\nthis line too\n'
The re.S flag (also available as re.DOTALL) makes . match newlines too, which is necessary for the match to span multiple lines.
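Because .* is greedy, text with several START/END blocks would be matched from the first START to the last END. A sketch, on hypothetical sample text, of the non-greedy .*? with re.findall, which returns one string per block, followed by the same comma-stripping and splitting the other answers use:

```python
import re

text = "x\nSTART\na, b\nEND\ny\nSTART\nc d\nEND\nz\n"

# Greedy: spans from the first START to the last END.
greedy = re.search(r'START(.*)END', text, re.S).group(1)

# Non-greedy (.*?) with findall: one captured string per START/END block.
blocks = re.findall(r'START(.*?)END', text, re.S)

# Split each block into fields, skipping blank lines.
parsed = [[line.replace(',', '').split() for line in block.splitlines() if line.strip()]
          for block in blocks]

print(blocks)  # ['\na, b\n', '\nc d\n']
print(parsed)  # [[['a', 'b']], [['c', 'd']]]
```

Unlike the iterator-based answers, this needs the whole text in memory (for example via file.read()), and the markers are matched anywhere, including inside longer words.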