Skip to content Skip to sidebar Skip to footer

Can't Properly Read Sql Table In Python: Varchar Columns Imported As Comma-separated Characters / Tuples

I'm connecting to a Oracle database using the following code: jar = ojdbc8.jar path jvm_path = jvm.dll path args = '-Djava.class.path=%s' % jar jpype.startJVM(jvm_path, args) con =

Solution 1:

This seems to be a problem when using jaydebeapi with jpype. I can reproduce this when connecting to a Oracle db in the same way that you do (in my case Oracle 11gR2, but since you are using ojdbc8.jar, I guess it also happens with other versions).

There are different ways you can solve this:

Change your connection

Since the error only seems to occur in a specific combination of packages, the most sensible thing to do is to try and avoid these and thus the error altogether.

  1. Alternative 1: Use jaydebeapi without jpype:

    As noted, I only observe this when using jaydebeapi with jpype. However, in my case, jpype is not needed at all. I have the .jar file locally and my connection works fine without it:

    import jaydebeapi as jdba
    import pandas as pd
    import os
    
    db_host = 'db.host.com'
    db_port = 1521
    db_sid = 'YOURSID'
    
    jar=os.getcwd()+'/ojdbc6.jar'
    
    conn = jdba.connect('oracle.jdbc.driver.OracleDriver', 
                    'jdbc:oracle:thin:@' + db_host + ':' + str(db_port) + ':' + db_sid, 
                    {'user': 'USERNAME', 'password': 'PASSWORD'}, 
                    jar
                    )
    
    df_jay = pd.read_sql('SELECT * FROM YOURSID.table1', conn)
    
    conn.close()
    

    In my case, this works fine and creates the dataframes normally.

  2. Alternative 2: Use cx_Oracle instead:

    The issue also does not occur if I use cx_Oracle to connect to the Oracle db:

    import cx_Oracle
    import pandas as pd
    import os
    
    db_host = 'db.host.com'
    db_port = 1521
    db_sid = 'YOURSID'
    
    dsn_tns = cx_Oracle.makedsn(db_host, db_port, db_sid)
    cx_conn = cx_Oracle.connect('USERNAME', 'PASSWORD', dsn_tns)
    
    df_cxo = pd.read_sql('SELECT * FROM YOURSID.table1', con=cx_conn)
    
    cx_conn.close()
    

    Note: For cx_Oracle to work you have to have the Oracle Instant Client installed and properly set up (see e.g. cx_Oracle documentation for Ubuntu).

Fix dataframe after the fact:

If for some reason, you cannot use the above connection alternatives, you can also transform your dataframe.

  1. Alternative 3: join tuple entries:

    You can use ''.join() to convert tuples to strings. You need to do this for the entries and the column names.

    # for all entries that are not None, join the tuplesfor col in df.select_dtypes(include=['object']).columns:
        df[col] = df[col].apply(lambda x: ''.join(x) if x isnotNoneelse x)
    
    # also rename the column headings in the same way
    df.rename(columns=lambda x: ''.join(x) if x isnotNoneelse x, inplace=True)
    
  2. Alternative 4: change dtype of columns:

    By changnig the dtype of an affected column from object to string, all entries will also be converted. Note that this may have unwanted side-effects, like e.g. changing None values to the string <N/A>. Also, you will have to rename the column headings separately, as above.

    for col in df.select_dtypes(include=['object']).columns:
        df[col] = df[col].astype('string')
    
    # again, rename headings
    df.rename(columns=lambda x: ''.join(x) if x isnotNoneelse x, inplace=True)
    

All of these should yield more or less the same df in the end (apart from the dtypes and possible replacement of None values):

+---+---------+---------+---------+
|   | COLUMN1 | COLUMN2 | COLUMN3 |
+---+---------+---------+---------+
| 1 | test    | test2   | 1       |
+---+---------+---------+---------+
| 2 | foo     | bar     | 100     |
+---+---------+---------+---------+

Post a Comment for "Can't Properly Read Sql Table In Python: Varchar Columns Imported As Comma-separated Characters / Tuples"