Extract A Column From A String In Python

December 12, 2023

I run a command remotely using Python and this is the output I get:

Vserver Volume Aggregate State Type Size Available Used%
--------- ------------ ---------

Solution 1:

Reading Columns from a File

A text file is inherently row oriented: when you open it in a text editor, you see lines of text and you operate on lines of text.

This inherent structure is reflected in an idiomatic way of reading the whole content of a text file in Python:

with open(fname) as f:
    data = [line for line in f]

Here data is a list of strings, one for each row of the file.

Sometimes the text is more structured and you can see that there is a columnar organization in it. For the sake of simplicity, say that we have

an initial line of headers,
possibly some lines of junk and
a number of lines containing the actual data,

moreover we assume that every relevant line contains the same number of columns.

An idiom that you can use is

with open(fname) as f:
    data = [line.split() for line in f]

Here data is a list of lists, one sublist for each row of the file, and each sublist holds the strings obtained by splitting that row into columns on whitespace.
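Since the output in the question arrives as a string rather than a file, the same idiom can be applied to the string directly. The following is a minimal sketch: only the header names come from the question, while the separator widths and the data rows are made up for illustration.

```python
# Hypothetical command output: the header names come from the question,
# the separator widths and the data rows are invented for illustration.
output = """\
Vserver   Volume       Aggregate State  Type Size  Available Used%
--------- ------------ --------- ------ ---- ----- --------- -----
vs1       vol_root     aggr1     online RW   1GB   900MB     10%
vs1       vol_data     aggr2     online RW   10GB  4GB       60%
"""

# One sublist per line, each sublist holding the whitespace-separated fields.
rows = [line.split() for line in output.splitlines()]

header = rows[0]
records = rows[2:]  # skip the header and the dashed separator line

# Look up the column position by name, then pull that field from each record.
idx = header.index("Volume")
volumes = [rec[idx] for rec in records]
print(volumes)  # ['vol_root', 'vol_data']
```

Looking the position up with header.index() avoids hard-coding a column number, which helps if the command ever adds or reorders columns.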

Reordering in Columns

While you can access every single data item as data[row][column], it may be more convenient to refer to the data using the headers, as in data['Aggregate'][5]. In Python, to address data using a string you usually use a dictionary, and you can build one using what is called a dictionary comprehension:


n = 2  # in your example data: the header line plus one junk line
with open(fname) as f:
    data_by_rows = [line.split() for line in f]
data_by_cols = {col[0]: list(col[n:]) for col in zip(*data_by_rows)}

This works because the idiom zip(*list_of_rows) transposes the rows into columns. In Python 3, zip returns an iterator, so wrap it in list() to see the result:

>>> a = [[1, 2, 3], [10, 20, 30]]
>>> list(zip(*a))
[(1, 10), (2, 20), (3, 30)]
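Putting the two idioms together, a small helper can turn whitespace-separated text into a dictionary of columns. This is only a sketch: the function name columns_by_header and the sample text are assumptions made for illustration, and it relies on every non-blank line having the same number of fields, as assumed earlier.

```python
def columns_by_header(text, n=2):
    """Map each header to the list of values below it.

    n is the number of leading lines (header plus junk) to leave out
    of the values; blank lines are ignored.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # zip(*rows) transposes rows into columns; col[0] is the header.
    return {col[0]: list(col[n:]) for col in zip(*rows)}


# Hypothetical sample; the values are invented for illustration.
sample = """\
Vserver Volume Aggregate
------- ------ ---------
vs1     vol0   aggr1
vs2     vol1   aggr2
"""

data_by_cols = columns_by_header(sample)
print(data_by_cols["Aggregate"][1])  # aggr2
```

Note that zip stops at the shortest row, so a line with fewer fields than the others would silently drop the trailing columns.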

Moving On

What we have seen is simple and convenient to use if the file format is simple and the manipulations you want to do are not involved. For more complex formats and/or manipulation requirements, Python offers a number of options, either in the standard library

the csv module eases the task of reading (and writing as well) comma(/tab) separated values files,

or as optional modules

the numpy module, aimed at numerical analysis, has facilities for reading all the data from a text file into an array structure,
the pandas module, aimed at data analysis and modeling, built on numpy, also has facilities to turn a structured text file into a dataframe structure.
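As an illustration of the standard-library route, the sketch below reads tab-separated text with csv.DictReader and pulls one column by its header. The sample data is hypothetical, and it assumes fields separated by a single tab rather than the space-aligned layout of the question.

```python
import csv
import io

# Hypothetical tab-separated data; the values are invented for illustration.
text = "Vserver\tVolume\tAggregate\nvs1\tvol0\taggr1\nvs2\tvol1\taggr2\n"

# DictReader takes the first row as field names and yields one dict per row.
reader = csv.DictReader(io.StringIO(text), delimiter="\t")
volumes = [row["Volume"] for row in reader]
print(volumes)  # ['vol0', 'vol1']
```

io.StringIO lets the reader consume a string as if it were an open file, which suits output captured from a remote command.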

Solution 2:

There are two handy functions for what you want: readlines() splits a file into lines and str.split() splits a string (by default, using any whitespace as separator).

with open("input.txt") as f:
    lines = f.readlines()

for line in lines[2:]:  # skip the header and the dashed line
    columns = line.split()
    print(columns[1])

An alternative way to do it without using readlines() would be:

with open("input.txt") as f:
    content = f.read()  # does not detect lines
lines = content.split("\n")
for line in lines[2:]:
    columns = line.split()
    if len(columns) > 1:  # the split leaves an empty string after a final "\n"
        print(columns[1])

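Finally, you may be handling files whose line termination is "\n" (GNU/Linux), "\r\n" (Windows) or "\r" (classic Mac OS). When a file is opened in text mode, Python 3 already translates all of these to "\n" by default. If you read the raw line endings instead (for example by passing newline="" to open()), you can split on them with the re module: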

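import re

with open("input.txt", newline="") as f:  # newline="" keeps the original line endings
    content = f.read()  # does not detect lines
lines = re.split(r"\r\n|\r|\n", content)  # try "\r\n" first so it counts as one break
for line in lines[2:]:
    columns = line.split()
    if len(columns) > 1:  # skip blank or short lines
        print(columns[1])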
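As a variation on the snippets above, str.splitlines() recognizes "\n", "\r\n" and "\r" on its own, so no regular expression is needed when the text is already in a string. The helper name extract_column and the sample string below are assumptions made for illustration.

```python
def extract_column(text, index, skip=2):
    """Return field number `index` from each line after the first `skip` lines."""
    values = []
    for line in text.splitlines()[skip:]:
        columns = line.split()
        if len(columns) > index:  # ignore blank or short lines
            values.append(columns[index])
    return values


# Hypothetical output mixing "\n", "\r\n" and "\r" line endings.
mixed = "Vserver Volume\n------- ------\r\nvs1 vol0\rvs2 vol1\n"
print(extract_column(mixed, 1))  # ['vol0', 'vol1']
```

Returning a list instead of printing makes the result easy to reuse, and the length check keeps a blank or truncated line from raising an IndexError.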
