Keep Pandas Structure With Numpy/scikit Functions

I'm using the excellent read_csv() function from pandas, which gives:

In [31]: data = pandas.read_csv('lala.csv', delimiter=',')

In [32]: data
Out[32]:

Solution 1:

This can be done by wrapping the returned array in a DataFrame and passing the original index and columns back in.

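import pandas as pd
from sklearn import preprocessing
# scale() returns a bare NumPy array; rebuild the frame with the original labels
pd.DataFrame(preprocessing.scale(data), index=data.index, columns=data.columns)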
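To make the effect visible, here is a minimal sketch of the same wrapping on made-up data. The column names, index labels, and values are illustrative assumptions standing in for the contents of 'lala.csv', which the question does not show.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Hypothetical data; not taken from the original question.
data = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 60.0]},
    index=["r1", "r2", "r3"],
)

# scale() standardizes each column and hands back a plain NumPy array,
# so the row and column labels are gone at this point.
scaled_array = preprocessing.scale(data)

# Re-attach the labels by passing the original index and columns.
scaled = pd.DataFrame(scaled_array, index=data.index, columns=data.columns)
print(scaled)
```

The result has the same row and column labels as data, with each column centered on zero.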

Solution 2:

A (slightly naive) way would be to store the structure of your data frame, i.e. its columns and index, separately, and then create a new data frame from your preprocessed results like so:

In [14]: import numpy as np

In [15]: data = np.zeros((2,2))

In [16]: data
Out[16]: 
array([[ 0.,  0.],
       [ 0.,  0.]])

In [17]: from pandas import DataFrame

In [21]: df  = DataFrame(data, index = ['first', 'second'], columns=['c1','c2'])

In [22]: df
Out[22]: 
        c1  c2
first    0   0
second   0   0

In [26]: i = df.index

In [27]: c = df.columns

# generate new data as a numpy array
In [29]: df  = DataFrame(np.random.rand(2,2), index=i, columns=c)

In [30]: df
Out[30]: 
              c1        c2
first   0.821354  0.936703
second  0.138376  0.482180

As you can see in Out[22], we start off with a data frame, and then in In [29] we place some new data inside the frame, leaving the row and column labels unchanged. I am assuming your preprocessing will not shuffle the rows or columns of the data.
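The store-and-restore pattern above could be bundled into a small helper. The sketch below is one possible way to do it, not part of the original answer: the name apply_keep_labels is hypothetical, and the shape check only catches functions that add or drop rows or columns. It cannot detect a function that reorders them, so the no-shuffling assumption still applies.

```python
import numpy as np
import pandas as pd


def apply_keep_labels(df, func):
    """Apply an array-in/array-out function and re-attach df's labels.

    Hypothetical helper for illustration only.
    """
    result = np.asarray(func(df.to_numpy()))
    # Guard against functions that change the number of rows or columns;
    # reordering would pass this check unnoticed.
    if result.shape != df.shape:
        raise ValueError(
            f"expected shape {df.shape}, got {result.shape}"
        )
    return pd.DataFrame(result, index=df.index, columns=df.columns)


df = pd.DataFrame(
    np.zeros((2, 2)), index=["first", "second"], columns=["c1", "c2"]
)

# Any NumPy-style function works here; adding 1.0 is just a placeholder.
shifted = apply_keep_labels(df, lambda a: a + 1.0)
print(shifted)
```

Any function that takes and returns a 2-D array of the same shape can be passed as func, including preprocessing.scale from scikit-learn.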
