Extract A Column From A String In Python
Solution 1:
Reading Columns from a File
A text file is inherently row oriented: when you open it in a text editor you see, and can operate on, lines of text.
This inherent structure is reflected in an idiomatic way of slurping the content of a text file in Python:
data = [line for line in open(fname)]
Here data is a list of strings corresponding to the rows of the file, each still ending with its newline.
Sometimes the text is more structured and you can see that there is a columnar organization in it. For the sake of simplicity, say that we have
- an initial line of headers,
- possibly some lines of junk, and
- a number of lines containing the actual data,
moreover we assume that every relevant line contains the same number of columns.
An idiom that you can use is
data = [line.split() for line in open(fname)]
Here data is now a list of lists, one sublist for each row of the file, each sublist holding the strings obtained by splitting that row into columns.
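To see what the row-splitting idiom produces, here is a minimal sketch. The sample text, its header names and the junk line are made up for illustration, and io.StringIO stands in for an opened file so that the snippet runs without touching the disk.

```python
import io

# Hypothetical sample standing in for open(fname); the contents are made up.
sample = io.StringIO(
    "Name Aggregate Count\n"
    "---- --------- -----\n"
    "a 1.5 10\n"
    "b 2.5 20\n"
)

# One sublist per row, each holding that row's whitespace-separated fields.
data = [line.split() for line in sample]

header, rows = data[0], data[2:]   # data[1] is the junk line
print(header)        # ['Name', 'Aggregate', 'Count']
print(rows[1][2])    # 20
```

Every item is still a string at this point, so rows[1][2] is '20' and not the integer 20.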
Reordering in Columns
While you can access every single data item by data[row][column], it may be more convenient to refer to data using the headers, as in data['Aggregate'][5]. In Python, to address data using a string you usually use a dictionary, and you can build a dictionary using what is called a dictionary comprehension:
n = 2  # in your example data
data_by_rows = [line.split() for line in open(fname)]
data_by_cols = {col[0]: list(col[n:]) for col in zip(*data_by_rows)}
This works because the idiom zip(*list_of_rows) returns you a list_of_cols.
>>> a = [[1,2,3],[10,20,30]]
>>> list(zip(*a))
[(1, 10), (2, 20), (3, 30)]
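Putting the pieces together, the following sketch builds the header-keyed dictionary from a small made-up sample. The header names, the single junk line and the values are assumptions chosen for illustration, and io.StringIO again stands in for a real file.

```python
import io

# Hypothetical sample: one header line, one junk line, three data lines.
sample = io.StringIO(
    "Name Aggregate Count\n"
    "---- --------- -----\n"
    "a 1.5 10\n"
    "b 2.5 20\n"
    "c 3.5 30\n"
)

n = 2  # header line plus one junk line in this made-up sample
data_by_rows = [line.split() for line in sample]

# zip(*rows) yields one tuple per column; col[0] is the header,
# col[n:] skips the header and the junk line.
data_by_cols = {col[0]: list(col[n:]) for col in zip(*data_by_rows)}

print(data_by_cols["Aggregate"])   # ['1.5', '2.5', '3.5']
print(data_by_cols["Count"][2])    # 30

# The values are still strings; convert explicitly when numbers are needed.
aggregate = [float(x) for x in data_by_cols["Aggregate"]]
```

Note that zip() stops at the shortest row, so a line with fewer fields than the others silently drops the trailing columns. This is why the assumption that every relevant line has the same number of columns matters.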
Moving On
What we have seen is simple and convenient to use if the file format is simple and the manipulations you want to do are not involved. For more complex formats and/or manipulation requirements, Python offers a number of options, either in the standard library
- the csv module eases the task of reading (and writing as well) comma(/tab) separated values files,
or as optional modules
- the numpy module, aimed at numerical analysis, has facilities for slurping all data from a text file and putting them in an array structure,
- the pandas module, aimed at data analysis and modeling and built on numpy, also has facilities to turn a structured text file into a dataframe structure.
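As a small illustration of the standard-library option, here is a sketch that pulls one column out of comma-separated text with csv.DictReader. The sample data and column names are made up, and only the csv route is shown so that the snippet needs nothing beyond the standard library.

```python
import csv
import io

# Hypothetical comma-separated sample; a real file would be opened with
# open(fname, newline=""), as the csv documentation recommends.
sample = io.StringIO(
    "Name,Aggregate,Count\n"
    "a,1.5,10\n"
    "b,2.5,20\n"
)

reader = csv.DictReader(sample)    # the first row supplies the keys
aggregate = [row["Aggregate"] for row in reader]
print(aggregate)                   # ['1.5', '2.5']
```

Unlike str.split(), the csv module also copes with quoted fields that contain the separator, which is usually the reason to prefer it for real CSV files.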
Solution 2:
There are two handy functions for what you want: readlines() splits a file into lines and str.split() splits a string (by default, using any whitespace as separator).
with open("input.txt") as f:
    lines = f.readlines()
for line in lines[2:]:  # skip the first two lines
    columns = line.split()
    print(columns[1])  # second column
An alternative way to do it without using readlines() would be:
with open("input.txt") as f:
    content = f.read()  # one string; lines are not split yet
lines = content.split("\n")
for line in lines[2:]:
    columns = line.split()
    if columns:  # skip the empty string left after the final "\n"
        print(columns[1])
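Finally, you may be handling files whose line termination is "\n" (GNU/Linux), "\r\n" (Windows) or "\r" (classic Mac OS). Python 3's text mode translates all three to "\n" by default; if you read the content untranslated (newline="" below), you can split it with the re module:
import re

with open("input.txt", newline="") as f:
    content = f.read()  # one string; terminators are left as they are
lines = re.split(r"\r\n|\r|\n", content)
for line in lines[2:]:
    columns = line.split()
    if columns:  # skip the empty string left after the final terminator
        print(columns[1])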
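If the text is already in a string, str.splitlines() may be a simpler alternative to re, since it recognizes all three terminators by itself. This is a swapped-in technique, not part of the answer above, and the sample string below is made up for illustration.

```python
# Made-up text mixing all three terminators: "\r\n", "\r" and "\n".
text = "h1 h2\r\njunk\rx 1\ny 2\r\n"

# str.splitlines() recognizes "\n", "\r\n" and "\r" and drops the terminators.
lines = text.splitlines()

# Skip the first two lines, ignore blank ones, keep the second column.
second_column = [line.split()[1] for line in lines[2:] if line.strip()]
print(second_column)   # ['1', '2']
```

Be aware that splitlines() also treats a few other characters, such as form feed, as line boundaries, so re.split() with an explicit pattern remains the stricter choice.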