Reading And Doing Calculation From .dat File In Python
Solution 1:
After looking at your flash.dat file, it's clear you need to do a little cleanup before you process it. The following code converts it to a CSV file:
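import csv

# read flash.dat into a list of lists, splitting each line on whitespace
with open("./flash.dat") as f:
    datContent = [line.strip().split() for line in f]

# write it as a new CSV file (text mode with newline="" is what the csv module expects in Python 3)
with open("./flash.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(datContent)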
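To see what the conversion does without touching the real file, here is a minimal sketch that runs the same split-and-write steps on in-memory text. The sample rows and column names are made up for illustration and are not taken from flash.dat.

```python
import csv
import io

# Hypothetical stand-in for flash.dat: whitespace-separated columns of uneven width.
sample = io.StringIO(
    "time      mass    x-momentum\n"
    "0.0       2.0     4.0\n"
    "0.5       2.5     10.0\n"
)

# Split every line on runs of whitespace, as in the conversion above.
rows = [line.split() for line in sample]

# Write the rows as CSV into an in-memory buffer instead of a file on disk.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
csv_text = out.getvalue()
print(csv_text)
```

Note that this only works cleanly when no field contains a space; a header name with embedded whitespace would be split into several columns.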
Now use pandas to compute the new column:
import pandas as pd

def your_func(row):
    # called once per row; returns the value for the new column
    return row['x-momentum'] / row['mass']

columns_to_keep = ['#time', 'x-momentum', 'mass']
dataframe = pd.read_csv("./flash.csv", usecols=columns_to_keep)
dataframe['new_column'] = dataframe.apply(your_func, axis=1)
print(dataframe)
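As a quick check of the same steps on data you can inspect, the sketch below reads a small in-memory CSV and computes the ratio both row by row and column by column. The CSV content is hypothetical and only mirrors the column names used above.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for flash.csv.
csv_text = "#time,x-momentum,y-momentum,mass\n0.0,4.0,1.0,2.0\n0.5,10.0,3.0,2.5\n"

# usecols drops y-momentum while keeping the remaining columns in file order.
df = pd.read_csv(io.StringIO(csv_text), usecols=["#time", "x-momentum", "mass"])

# Row-wise version, as in the answer above.
by_row = df.apply(lambda row: row["x-momentum"] / row["mass"], axis=1)

# Column-wise version: pandas divides the two Series element by element.
by_column = df["x-momentum"] / df["mass"]

print(by_column.tolist())
```

Both give the same values; the column-wise form is usually shorter and faster on large files.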
Solution 2:
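import pandas as pd
# a separator longer than one character is treated as a regex, which needs the Python engine
train = pd.read_csv("Path", sep=" ::", header=None, engine="python")
Now you can access the .dat file.
train.columns = ["A", "B", "C"]  # one name per column you can see in the .dat file
Then you can work with it as you would with a CSV file.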
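A minimal sketch of this approach on in-memory text, assuming a file whose fields are separated by "::" with no header row. The sample data and the exact separator are assumptions for illustration; use whatever delimiter your .dat file actually contains.

```python
import io
import pandas as pd

# Hypothetical file contents using "::" between fields and no header row.
raw = "1::10::0.5\n2::20::1.5\n"

# A separator longer than one character is treated as a regular expression,
# which only the Python parsing engine supports.
train = pd.read_csv(io.StringIO(raw), sep="::", header=None, engine="python")

# Without a header row the columns are numbered 0, 1, 2; give them names.
train.columns = ["A", "B", "C"]

print(train["B"].sum())
```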
Solution 3:
Try something like:
datContent = [i.strip().split() for i in open("filename.dat").readlines()]
Then you'll have your data in a list of lists, with one inner list of string fields per line.
If you want something more sophisticated, you can use pandas; see the linked cookbook.
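Since every field comes back as a string, a calculation needs an explicit conversion first. The sketch below assumes the first row holds the column names; the sample values are hypothetical.

```python
# Hypothetical rows, shaped like the output of the list comprehension above:
# a header row followed by rows of string fields.
datContent = [
    ["time", "mass", "x-momentum"],
    ["0.0", "2.0", "4.0"],
    ["0.5", "2.5", "10.0"],
]

# Separate the header from the data and look up the columns by name.
header, *rows = datContent
mass_idx = header.index("mass")
px_idx = header.index("x-momentum")

# Fields are still strings, so convert to float before doing arithmetic.
velocities = [float(r[px_idx]) / float(r[mass_idx]) for r in rows]
print(velocities)
```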
Solution 4:
Consider using read_table() (a sibling of read_csv() that differs mainly in its default separator), which lets pandas import the .dat file directly when you specify a whitespace separator, sep='\s+'. Additionally, no user-defined function with apply() is needed for column-by-column calculations.
Below, NumPy is used to guard against division by zero. Also, the example .dat file's first column is #time, and columns 2, 3, and 4 are x-momentum, y-momentum, and mass (a different expression from the one in your code, so revise as needed).
import pandas as pd
import numpy as np
columns_to_keep = ['#time', 'x-momentum', 'y-momentum', 'mass']
df = pd.read_table("flash.dat", sep=r"\s+", usecols=columns_to_keep)  # raw string avoids an invalid-escape warning
df['mass_per_time'] = np.where(df['#time'] > 0, df['mass']/df['#time'], np.nan)
df['x-momentum_per_time'] = np.where(df['#time'] > 0, df['x-momentum']/df['#time'], np.nan)
df['y-momentum_per_time'] = np.where(df['#time'] > 0, df['y-momentum']/df['#time'], np.nan)
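To see the division-by-zero guard in isolation, here is a small sketch on a hand-built DataFrame. The values are hypothetical; only the column names follow the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical values; the first row has time == 0 to trigger the guard.
df = pd.DataFrame({"#time": [0.0, 2.0, 4.0], "mass": [1.0, 3.0, 10.0]})

# np.where evaluates both branches first, so the division still runs for
# every row; the boolean mask then decides which results are kept.
df["mass_per_time"] = np.where(df["#time"] > 0, df["mass"] / df["#time"], np.nan)

print(df)
```

The row with zero time ends up as NaN rather than infinity, which keeps it out of later sums and means unless you choose to fill it.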
Solution 5:
The problem you face here is that the column header names contain whitespace. You need to fix or ignore that to make pandas.read_csv behave nicely. The following reads the column header names into a list based on the fixed length of the field name strings:
import numpy as np
import pandas

with open('flash.dat') as f:
    # header fields are fixed-width (23 characters each), so slice instead of splitting
    header = f.readline()[2:-1]
    header_fixed = [header[i*23:(i+1)*23].strip() for i in range(26)]
    header_fixed[0] = header_fixed[0][1:]  # remove '#' from time
    # pandas doesn't handle "Infinity" properly: read Infinity as NaN, then convert back to infinity
    df = pandas.read_csv(f, sep=r'\s+', names=header_fixed, na_values="Infinity")
df.fillna(np.inf, inplace=True)  # pandas.np is gone from current pandas, so use numpy directly

# processing
df['new_column'] = df['x-momentum'] / df['mass']
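The reason for slicing by width rather than splitting can be shown on a synthetic header. In this sketch the field width of 12 and the column names are hypothetical, chosen only to keep the example short; the answer above uses 23-character fields.

```python
# Hypothetical header line: names padded to a fixed width.
WIDTH = 12
names = ["time", "x-momentum", "E kinetic"]
header = "".join(name.ljust(WIDTH) for name in names)

# Splitting on whitespace breaks "E kinetic" into two names...
split_names = header.split()

# ...while slicing by field width keeps each name intact.
fixed_names = [header[i * WIDTH:(i + 1) * WIDTH].strip() for i in range(len(names))]

print(split_names)
print(fixed_names)
```

With the whitespace split, the header would have one more name than the data has columns, which is why the column names no longer line up with the values.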