Reading Selected Column Only From Csv File, When All Other Columns Are Guaranteed To Be Identical
Solution 1:
This is a one-liner with pandas.read_csv(). And we can even drop the quoting too:
import pandas as pd
csva = pd.read_csv('a.csv', header=0, quotechar="'", delim_whitespace=True)
csva['ratio']
0    0.06
1    0.88
2    0.01
3    0.02
Name: ratio, dtype: float64
A couple of points:
- Your separator is actually comma + whitespace, so in that sense this is not plain-vanilla CSV. See "How to make separator in read_csv more flexible?"
- Note that we dropped the quoting on numeric fields by setting quotechar="'".
- If you really insist on saving memory (don't), you can drop every column of csva other than 'ratio' after you do the read_csv. See the pandas doc.
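As a rough illustration of the last two points, here is a minimal sketch. The sample data and the column names id and name are made up for the example; only ratio comes from the question. It uses skipinitialspace=True to cope with the blank after each comma, then keeps just the ratio column, either after the read or by never loading the others via usecols.

```python
import io

import pandas as pd

# Hypothetical sample data with a comma + space separator.
data = "id, name, ratio\n1, a, 0.06\n2, b, 0.88\n3, c, 0.01\n4, d, 0.02\n"

# skipinitialspace=True ignores the blank that follows each comma.
csva = pd.read_csv(io.StringIO(data), skipinitialspace=True)

# Option 1: read everything, then keep only the 'ratio' column.
ratio_only = csva[['ratio']]

# Option 2: ask read_csv to load only the 'ratio' column.
ratio_direct = pd.read_csv(io.StringIO(data), skipinitialspace=True,
                           usecols=['ratio'])

print(ratio_only)
```

Both options should give the same one-column frame; the second avoids holding the unused columns in memory at all.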
Solution 2:
First put it in English terms.
You have to read all those other fields from somewhere, so it might as well be from the first row.
Then, having done that, you need to read the last column from each subsequent row and pack it onto the end of the new row, while ignoring the rest.
So, to turn that into Python:
import csv

# Python 3: the csv module wants text-mode files opened with newline=''
with open(outpath, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for inpath in paths:
        with open(inpath, newline='') as infile:
            reader = csv.reader(infile)
            # Read all values (including the ratio) from first row
            new_row = next(reader)
            # For every subsequent row...
            for row in reader:
                # ... read the ratio, pack it on, ignore the rest
                new_row.append(row[-1])
            writer.writerow(new_row)
I'm not sure the comments actually add anything; I think my Python is easier to follow than my English. :)
It's worth knowing that what you're trying to do here is called "denormalization". From what I can tell, your data will end up with an arbitrary number of ratio columns per row, all of which have the same "meaning", so each row isn't really a value anymore, but a collection of values.
Denormalization is generally considered bad, for a variety of reasons. There are cases where denormalized data is easier or faster to work with—as long as you know that you're doing it, and why, it can be a useful thing to do. Wikipedia has a nice article on database normalization that explains the issues; you might want to read it so you understand what you're doing here, and can make sure that it's the right thing to do.
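To make the difference concrete, here is a small sketch on made-up in-memory data (the values x, y and the two function names are hypothetical, not from the question). The denormalized form packs every ratio onto one wide row, as the code above does per file; the normalized form keeps one row per ratio, repeating the shared fields.

```python
import csv
import io

# Hypothetical sample: the leading fields are identical on every row,
# only the last field (the ratio) differs.
SAMPLE = "x,y,0.06\nx,y,0.88\nx,y,0.01\n"


def denormalized(text):
    """Return one wide row: shared fields once, then every ratio."""
    reader = csv.reader(io.StringIO(text))
    new_row = next(reader)
    for row in reader:
        new_row.append(row[-1])
    return new_row


def normalized(text):
    """Return one row per ratio, each repeating the shared fields."""
    return list(csv.reader(io.StringIO(text)))


print(denormalized(SAMPLE))
print(normalized(SAMPLE))
```

With the wide form, the number of columns depends on how many rows the input had, which is exactly what makes it awkward to query later.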