
Is It Possible To Directly Rename Pandas Dataframe's Columns Stored In Hdf5 File?

I have a very large pandas DataFrame stored in an HDF5 file, and I need to rename its columns. The straightforward way is to read the dataframe in chunks using HDFSto

Solution 1:

It can be done by editing the file's metadata directly. BIG WARNING: this may corrupt your file, so you do it at your own risk.
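Given that warning, it may be worth copying the file before touching its metadata. The following is a minimal sketch using only the standard library; `backup_file` and the `.bak` suffix are hypothetical names chosen for illustration, not part of pandas or PyTables. For a very large file the copy itself takes time and disk space.

```python
import shutil
from pathlib import Path


def backup_file(path, suffix=".bak"):
    """Copy `path` to a sibling file with `suffix` appended; return the copy's path.

    Hypothetical helper for illustration only.
    """
    src = Path(path)
    dst = src.with_name(src.name + suffix)
    if dst.exists():
        # Never silently overwrite an earlier backup.
        raise FileExistsError(f"refusing to overwrite existing backup: {dst}")
    shutil.copy2(src, dst)  # copies contents and file metadata (timestamps, mode)
    return dst
```

Calling `backup_file('test.h5')` before opening the store would leave a `test.h5.bak` to fall back on if the edit goes wrong.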

Create a store. It must be in table format. I didn't use data_columns here, but renaming those requires only a slight change.

In [1]: import numpy as np
   ...: import pandas as pd
   ...: df = pd.DataFrame(np.random.randn(10, 3), columns=list('abc'))

In [2]: df.to_hdf('test.h5', key='df', format='table')

In [25]: pd.read_hdf('test.h5','df')
Out[25]: 
          a         b         c
0  1.366298  0.844646 -0.470735
1 -1.438387 -1.288432  0.250763
2 -1.290225 -0.390315 -0.138440
3  2.343019  0.632340 -0.539334
4 -1.184943  0.566479  1.977939
5 -1.530772  0.757110 -0.013930
6 -0.300345 -0.951563 -1.013957
7 -0.073975 -0.256521  1.024525
8 -0.179189 -1.767918  0.591720
9  0.641028  0.205522  1.947618

Open the store to get a handle to the table itself:

In [26]: store = pd.HDFStore('test.h5')

You need to change the metadata in two places. First, at the top level:

In [28]: store.get_storer('df').attrs['non_index_axes']
Out[28]: [(1, ['a', 'b', 'c'])]

In [29]: store.get_storer('df').attrs.non_index_axes = [(1, ['new','b','c'])]

Then in the attributes of the table itself, where values_block_0_kind holds the column names:

In [31]: store.get_storer('df').table.attrs
Out[31]: 
/df/table._v_attrs (AttributeSet), 12 attributes:
   [CLASS := 'TABLE',
    FIELD_0_FILL := 0,
    FIELD_0_NAME := 'index',
    FIELD_1_FILL := 0.0,
    FIELD_1_NAME := 'values_block_0',
    NROWS := 10,
    TITLE := '',
    VERSION := '2.7',
    index_kind := 'integer',
    values_block_0_dtype := 'float64',
    values_block_0_kind := ['a', 'b', 'c'],
    values_block_0_meta := None]

In [33]: store.get_storer('df').table.attrs.values_block_0_kind = ['new','b','c']

Close the store to save the changes:

In [34]: store.close()

In [35]: pd.read_hdf('test.h5','df')
Out[35]: 
        new         b         c
0  1.366298  0.844646 -0.470735
1 -1.438387 -1.288432  0.250763
2 -1.290225 -0.390315 -0.138440
3  2.343019  0.632340 -0.539334
4 -1.184943  0.566479  1.977939
5 -1.530772  0.757110 -0.013930
6 -0.300345 -0.951563 -1.013957
7 -0.073975 -0.256521  1.024525
8 -0.179189 -1.767918  0.591720
9  0.641028  0.205522  1.947618
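The two metadata edits in the session above could be wrapped into a single function. This is only a sketch under stated assumptions: the frame was written in table format without data_columns, the attribute layout matches the one printed above (`non_index_axes` on the group, `values_block_N_kind` on the table), and that layout has not changed in your pandas version. The function names are hypothetical, and the same warning about file corruption applies.

```python
def rename_labels(labels, mapping):
    """Return a copy of `labels` with names replaced according to `mapping`."""
    return [mapping.get(name, name) for name in labels]


def rename_non_index_axes(non_index_axes, mapping):
    """Apply `mapping` to every label list in a non_index_axes value."""
    return [(axis, rename_labels(labels, mapping)) for axis, labels in non_index_axes]


def rename_hdf_columns(path, key, mapping):
    """Rewrite column-name metadata of a table-format frame in place.

    Assumption: no data_columns, and the attribute layout shown in the
    session above. Back up the file first; this edits metadata directly.
    """
    import pandas as pd  # imported lazily so the pure helpers work without pandas

    with pd.HDFStore(path, mode="a") as store:
        storer = store.get_storer(key)

        # 1) Top-level metadata on the group.
        storer.attrs.non_index_axes = rename_non_index_axes(
            storer.attrs.non_index_axes, mapping
        )

        # 2) Per-block metadata on the table node. A frame with mixed dtypes
        #    may have several blocks (values_block_0, values_block_1, ...).
        table_attrs = storer.table.attrs
        for name in list(table_attrs._v_attrnames):
            if name.startswith("values_block_") and name.endswith("_kind"):
                old = list(getattr(table_attrs, name))
                setattr(table_attrs, name, rename_labels(old, mapping))
    # Leaving the `with` block closes the store, which saves the changes.
```

With the example file, `rename_hdf_columns('test.h5', 'df', {'a': 'new'})` would be the equivalent of steps In [29] and In [33]; reading the frame back with `pd.read_hdf` is a sensible check afterwards.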
