Keep Pandas Structure With Numpy/scikit Functions
I'm using the excellent read_csv() function from pandas, which gives:
In [31]: data = pandas.read_csv('lala.csv', delimiter=',')
In [32]: data
Out[32]:
Solution 1:
This can be done by wrapping the returned data in a DataFrame, passing in the original index and columns information.
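import pandas as pd
from sklearn import preprocessing
# scale() returns a plain NumPy array, so re-attach the original row and column labels
pd.DataFrame(preprocessing.scale(data), index=data.index, columns=data.columns)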
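For anyone who wants to try the wrapping idea without a CSV file at hand, here is a minimal self-contained sketch. The sample values and the helper name scale_columns are made up for illustration; scale_columns is only a NumPy stand-in that I assume behaves like preprocessing.scale with its default settings (zero mean and unit variance per column), not the scikit-learn function itself.

```python
import numpy as np
import pandas as pd

# Made-up sample data standing in for the CSV contents.
data = pd.DataFrame(
    {"c1": [1.0, 2.0, 3.0], "c2": [10.0, 20.0, 40.0]},
    index=["first", "second", "third"],
)

def scale_columns(values):
    """Hypothetical stand-in for a NumPy/scikit function: array in, array out."""
    return (values - values.mean(axis=0)) / values.std(axis=0)

# The function only sees a bare array, so the labels are lost here...
scaled_values = scale_columns(data.to_numpy())

# ...and are put back by passing the original index and columns.
scaled = pd.DataFrame(scaled_values, index=data.index, columns=data.columns)
print(scaled)
```

The same pattern should apply to any function that returns an array with the same shape as its input.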
Solution 2:
A (slightly naive) way would be to store the structure of your data frame, i.e. its columns and index, separately, and then create a new data frame from your preprocessed results like so:
In [15]: data = np.zeros((2,2))
In [16]: data
Out[16]:
array([[ 0.,  0.],
       [ 0.,  0.]])
In [17]: from pandas import DataFrame
In [21]: df = DataFrame(data, index = ['first', 'second'], columns=['c1','c2'])
In [22]: df
Out[22]:
        c1  c2
first    0   0
second   0   0
In [26]: i = df.index
In [27]: c = df.columns
# generate new data as a numpy array
In [29]: df = DataFrame(np.random.rand(2,2), index=i, columns=c)
In [30]: df
Out[30]:
              c1        c2
first   0.821354  0.936703
second  0.138376  0.482180
As you can see in Out[22], we start off with a data frame, and then in In[29] we place some new data inside the frame, leaving the rows and columns unchanged. I am assuming your preprocessing will not shuffle the rows/columns of the data.
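Since everything here rests on the assumption that the preprocessing leaves rows and columns where they were, a small guard can at least catch the case where the shape changes. This is only a sketch: rewrap is a hypothetical helper name, not part of pandas, and a shape check cannot detect rows or columns that were merely reordered.

```python
import numpy as np
import pandas as pd

def rewrap(result, like):
    """Hypothetical helper: put `result` back into a frame labelled like `like`."""
    result = np.asarray(result)
    if result.shape != like.shape:
        # Rows or columns were added or dropped, so the old labels no longer fit.
        raise ValueError(
            f"shape changed from {like.shape} to {result.shape}; "
            "labels no longer line up"
        )
    return pd.DataFrame(result, index=like.index, columns=like.columns)

df = pd.DataFrame(np.zeros((2, 2)), index=["first", "second"], columns=["c1", "c2"])

# Same shape: the stored index and columns are reused.
new_df = rewrap(np.random.rand(2, 2), like=df)

# Different shape: the helper refuses instead of mislabelling the data.
try:
    rewrap(np.random.rand(2, 3), like=df)
except ValueError as exc:
    shape_error = str(exc)
```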