Reading And Doing Calculation From .dat File In Python

March 17, 2024

I need to read a .dat file in Python that has 12 columns in total and millions of rows. I need to divide columns 2, 3 and 4 by column 1 for my calculation. So before I lo

Solution 1:

After looking at your flash.dat file, it's clear you need to do a little clean up before you process it. The following code converts it to a CSV file:

import csv

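# read flash.dat into a list of lists, one list of fields per line
with open("./flash.dat") as f:
    datContent = [line.strip().split() for line in f]

# write it as a new CSV file (in Python 3 the csv module expects text mode with newline="")
with open("./flash.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(datContent)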
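
Since the file has millions of rows, it may be worth converting it line by line instead of building the whole list in memory first. The following is only a sketch of that idea: the helper name `dat_to_csv` is hypothetical, and the sample data is made up so the example runs without flash.dat.

```python
import csv
import io


def dat_to_csv(src, dst):
    """Copy whitespace-separated rows from src to dst as CSV, one line at a time."""
    writer = csv.writer(dst, lineterminator="\n")
    rows_written = 0
    for line in src:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        writer.writerow(fields)
        rows_written += 1
    return rows_written


# Tiny in-memory stand-in for flash.dat (values are made up for illustration).
sample = io.StringIO("time mass x-momentum\n0.5 2.0 4.0\n\n1.0 2.0 6.0\n")
out = io.StringIO()
count = dat_to_csv(sample, out)
print(count)
print(out.getvalue())
```

With real files, the same function can be called with two objects returned by `open()`, with the output file opened using `newline=""`.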

Now use pandas to compute the new column.

import pandas as pd

def your_func(row):
    return row['x-momentum'] / row['mass']

columns_to_keep = ['#time', 'x-momentum', 'mass']
dataframe = pd.read_csv("./flash.csv", usecols=columns_to_keep)
dataframe['new_column'] = dataframe.apply(your_func, axis=1)

print(dataframe)

Solution 2:

import pandas as pd
train = pd.read_csv("Path", sep=" ::", header=None, engine="python")

Now you can access the dat file.

train.columns = ["A", "B", "C"]  # one name per column in the .dat file

Then you can work with it just as you would with a CSV file.

Solution 3:

Try something like:

datContent = [i.strip().split() for i in open("filename.dat").readlines()]

Then you'll have your data in a list.
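
The list holds strings, so the values need converting to numbers before any arithmetic. As a rough sketch of the calculation from the question (columns 2, 3 and 4 divided by column 1), assuming the header line has already been skipped and every field is numeric; the function name `divide_by_first_column` and the sample rows are made up for illustration:

```python
def divide_by_first_column(rows, numerator_indexes=(1, 2, 3)):
    """For each row of strings, divide the selected fields by the first field."""
    results = []
    for row in rows:
        denominator = float(row[0])
        if denominator == 0.0:
            # Avoid ZeroDivisionError: record NaN for every ratio in this row.
            results.append([float("nan")] * len(numerator_indexes))
        else:
            results.append([float(row[i]) / denominator for i in numerator_indexes])
    return results


# Made-up rows in the same shape as datContent (lists of strings).
rows = [["2.0", "4.0", "6.0", "8.0"], ["0.0", "1.0", "1.0", "1.0"]]
ratios = divide_by_first_column(rows)
print(ratios[0])  # [2.0, 3.0, 4.0]
```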

If you want to have something more sophisticated you can use Pandas, see the linked cookbook.

Solution 4:

Consider using the general read_table() function (of which read_csv() is a special type) where pandas can easily import the specific .dat file specifying the space separator, sep='\s+'. Additionally, no defined function with apply() is needed for column by column calculation.

Below, numpy is used to guard against division by zero. Also, in the example .dat file the first column is #time and columns 2, 3, and 4 are x-momentum, y-momentum, and mass (your code uses a different expression, so revise as needed).

import pandas as pd
import numpy as np

columns_to_keep = ['#time', 'x-momentum', 'y-momentum', 'mass']
df = pd.read_table("flash.dat", sep=r"\s+", usecols=columns_to_keep)

df['mass_per_time'] = np.where(df['#time'] > 0, df['mass']/df['#time'], np.nan)
df['x-momentum_per_time'] = np.where(df['#time'] > 0, df['x-momentum']/df['#time'], np.nan)
df['y-momentum_per_time'] = np.where(df['#time'] > 0, df['y-momentum']/df['#time'], np.nan)
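
To see what the np.where guard changes, here is a small self-contained sketch on made-up data (the column names `time` and `mass` and the values are assumptions, not taken from flash.dat). Dividing a pandas column by zero does not raise an error but yields infinity, and the guard replaces that with NaN:

```python
import io

import numpy as np
import pandas as pd

# Made-up stand-in for a whitespace-separated .dat file.
sample = io.StringIO(
    "time mass\n"
    "0.0 3.0\n"
    "2.0 3.0\n"
)
df = pd.read_csv(sample, sep=r"\s+")

# Plain division: the row with time == 0 becomes inf.
unguarded = df["mass"] / df["time"]

# Guarded division: rows where time is not positive become NaN instead.
guarded = np.where(df["time"] > 0, df["mass"] / df["time"], np.nan)

print(unguarded.tolist())
print(guarded.tolist())
```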

Solution 5:

The problem you face here is that the column header names contain whitespace. You need to fix or ignore that to make pandas.read_csv behave nicely. The code below reads the column header names into a list based on the fixed width of the field name strings:

import numpy as np
import pandas

with open('flash.dat') as f:
    header = f.readline()[2:-1]
    # slice the header line into 26 fixed-width fields of 23 characters each
    header_fixed = [header[i*23:(i+1)*23].strip() for i in range(26)]
    header_fixed[0] = header_fixed[0][1:]  # remove '#' from time
    # pandas doesn't handle "Infinity" properly: read Infinity as NaN, then convert back to infinity
    df = pandas.read_csv(f, sep=r'\s+', names=header_fixed, na_values="Infinity")
    df.fillna(np.inf, inplace=True)

# processing
df['new_column'] = df['x-momentum'] / df['mass']
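
The reason for slicing the header by fixed width rather than splitting on whitespace can be shown with a small sketch. The header below is hypothetical (three names padded to 12 characters, one of them containing a space), and `split_fixed_width` is just an illustrative helper name:

```python
def split_fixed_width(header, width, count):
    """Cut header into `count` consecutive slices of `width` characters, stripped."""
    return [header[i * width:(i + 1) * width].strip() for i in range(count)]


# Hypothetical header: three names padded to 12 characters each.
header = "".join(name.ljust(12) for name in ["time", "x-momentum", "E total"])

by_whitespace = header.split()
by_width = split_fixed_width(header, width=12, count=3)

print(by_whitespace)  # ['time', 'x-momentum', 'E', 'total']
print(by_width)       # ['time', 'x-momentum', 'E total']
```

Splitting on whitespace breaks the name with a space into two pieces, so the number of names no longer matches the number of data columns; slicing by width keeps it intact.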
