Can't Access Bluemix Object Store From My Notebook

I'm trying to read a couple of JSON files from my Bluemix object store into a Jupyter notebook using Python. I've followed the examples I've found, but I'm still getting a 'No such

Solution 1:

The error message you get, IOError: [Errno 2] No such file or directory: 'swift://notebooks.spark/Warehousing-data.json', means that no file exists at that path. I think the Hadoop configuration was set up successfully; otherwise you would get a different error message complaining about missing credential settings.
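
For comparison, here is a minimal sketch of one way this exact message can arise. IOError with [Errno 2] is raised by Python itself, and Python's built-in file functions do not understand swift:// URLs, so they treat them as local paths. This is only an assumption about the cause, since the failing code is not shown here, and the behaviour described is for a POSIX-style filesystem. try_local_open is a hypothetical helper name.

```python
import errno

def try_local_open(path):
    """Open `path` with the built-in open(); return the errno on failure, None on success."""
    try:
        with open(path) as handle:
            handle.read(1)
        return None
    except (IOError, OSError) as exc:
        return exc.errno

# The swift:// URL is interpreted as a relative local path, which does not exist
code = try_local_open("swift://notebooks.spark/Warehousing-data.json")
print(code == errno.ENOENT)
```

If the path is instead passed to sc.textFile() or sqlContext.read.json(), the lookup goes through the Hadoop Swift configuration rather than the local filesystem.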

I tested the following code in a Python notebook on Bluemix and it worked for me. I took the sample code from the latest sample notebooks that show how to load data from Bluemix Object Storage V3.

Method for setting the Hadoop configuration:

def set_hadoop_config(credentials):
    """This function sets the Hadoop configuration with given credentials, 
    so it is possible to access data using SparkContext"""

    prefix = "fs.swift.service." + credentials['name']
    hconf = sc._jsc.hadoopConfiguration()
    hconf.set(prefix + ".auth.url", credentials['auth_url']+'/v3/auth/tokens')
    hconf.set(prefix + ".auth.endpoint.prefix", "endpoints")
    hconf.set(prefix + ".tenant", credentials['project_id'])
    hconf.set(prefix + ".username", credentials['user_id'])
    hconf.set(prefix + ".password", credentials['password'])
    hconf.setInt(prefix + ".http.port", 8080)
    hconf.set(prefix + ".region", credentials['region'])
    hconf.setBoolean(prefix + ".public", True)
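
To check which property names this function writes without needing a running SparkContext, the same keys can be collected into a plain dictionary. This is only an illustrative sketch: swift_properties is a hypothetical helper that is not part of the sample notebook, and the credential values are placeholders.

```python
def swift_properties(credentials):
    """Return the Hadoop property names and values that set_hadoop_config would set."""
    prefix = "fs.swift.service." + credentials['name']
    return {
        prefix + ".auth.url": credentials['auth_url'] + '/v3/auth/tokens',
        prefix + ".auth.endpoint.prefix": "endpoints",
        prefix + ".tenant": credentials['project_id'],
        prefix + ".username": credentials['user_id'],
        prefix + ".password": credentials['password'],
        prefix + ".http.port": 8080,
        prefix + ".region": credentials['region'],
        prefix + ".public": True,
    }

# Placeholder credentials, for illustration only
example = {
    'name': 'spark',
    'auth_url': 'https://identity.open.softlayer.com',
    'project_id': 'my-project-id',
    'user_id': 'my-user-id',
    'password': 'my-password',
    'region': 'dallas',
}

for key, value in sorted(swift_properties(example).items()):
    # Avoid printing the secret itself
    shown = '***' if key.endswith('.password') else value
    print(key, '=', shown)
```

Note that the service name ('spark' here) becomes part of every property name, so it has to match the name used later in the swift:// URL.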

Insert credentials for the associated Bluemix Object Storage V3:

credentials_1 = {
  'auth_url':'https://identity.open.softlayer.com',
  'project':'***',
  'project_id':'***',
  'region':'dallas',
  'user_id':'***',
  'domain_id':'***',
  'domain_name':'***',
  'username':'***',
  'password':"""***""",
  'filename':'people.json',
  'container':'notebooks',
  'tenantId':'***'
}

Set the Hadoop configuration with the given credentials:

credentials_1['name'] = 'spark'
set_hadoop_config(credentials_1)

Read the JSON file using sc.textFile() into an RDD and print out the first 3 rows:

data_rdd = sc.textFile("swift://" + credentials_1['container'] + "." + credentials_1['name'] + "/" + credentials_1['filename'])
data_rdd.take(3)

Output:

[u'{"name":"Michael"}',
 u'{"name":"Andy", "age":30}',
 u'{"name":"Justin", "age":19}']
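
The swift:// path used above follows the pattern swift://<container>.<service name>/<object name>, where the service name is the value stored in credentials_1['name']. A small helper makes that explicit; swift_url is a hypothetical name introduced here for illustration.

```python
def swift_url(container, service_name, filename):
    """Build a swift:// URL of the form swift://<container>.<service_name>/<filename>."""
    return "swift://{0}.{1}/{2}".format(container, service_name, filename)

print(swift_url('notebooks', 'spark', 'people.json'))
# prints: swift://notebooks.spark/people.json
```

If the container or file name in the URL does not match what is actually stored, the read fails with a "no such file" style error even when the credentials are correct.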

Read the JSON file using sqlContext.read.json() into a DataFrame and output the first 3 rows:

data_df = sqlContext.read.json("swift://" + credentials_1['container'] + "." + credentials_1['name'] + "/" + credentials_1['filename'])
data_df.take(3)

Output:

[Row(age=None, name=u'Michael'),
 Row(age=30, name=u'Andy'),
 Row(age=19, name=u'Justin')]
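
The two outputs are related: each RDD element is one line of JSON text, and the DataFrame reader infers a common schema across lines, so a record without an "age" field shows up as age=None. Here is a plain-Python sketch of that idea, using the three lines from the RDD output. It only mimics the result and is not how Spark implements schema inference.

```python
import json

lines = [
    '{"name":"Michael"}',
    '{"name":"Andy", "age":30}',
    '{"name":"Justin", "age":19}',
]

records = [json.loads(line) for line in lines]
# Union of all field names seen across the records
columns = sorted({key for record in records for key in record})
# Fill missing fields with None, as the DataFrame output does
rows = [tuple(record.get(col) for col in columns) for record in records]

print(columns)
print(rows)
```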

Solution 2:

I found a better solution at https://developer.ibm.com/recipes/tutorials/using-ibm-object-storage-in-bluemix-with-python/ with sample code at https://github.com/saviosaldanha/IBM_Object_Store_Python_Example/blob/master/storage_recipe_example.py

Here is the revised code:

import swiftclient
from keystoneclient import client

# Object Store credentials (generated by Insert to code)
credentials = {
  'auth_url':'https://identity.open.softlayer.com',
  'project':'***',
  'project_id':'***',
  'region':'dallas',
  'user_id':'***',
  'domain_id':'***',
  'domain_name':'***',
  'username':'***',
  'password':"""***""",
  'filename':'Warehousing-data.json',
  'container':'notebooks',
  'tenantId':'***'
}

# Establish Connection to Bluemix Object Store
connection = swiftclient.Connection(
    key=credentials['password'],
    authurl=credentials['auth_url'],
    auth_version='3',
    os_options={"project_id": credentials['project_id'],
                "user_id": credentials['user_id'],
                "region_name": credentials['region']})

# The data files should now be accessible through calls of the form
# connection.get_object(credentials['container'], fileName)[1]
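
Since get_object returns a (headers, body) tuple, the [1] index selects the raw object contents. Below is a minimal sketch of turning those contents into Python data, assuming the object is UTF-8 encoded JSON. load_json_object and the FakeConnection stand-in are hypothetical names, used here so the example runs without a real Object Storage account.

```python
import json

def load_json_object(connection, container, object_name):
    """Fetch an object via connection.get_object and parse its body as JSON."""
    headers, body = connection.get_object(container, object_name)
    if isinstance(body, bytes):
        body = body.decode('utf-8')
    return json.loads(body)

class FakeConnection(object):
    """Stand-in for swiftclient.Connection, for illustration only."""
    def get_object(self, container, object_name):
        return ({'content-type': 'application/json'},
                b'{"name": "Andy", "age": 30}')

data = load_json_object(FakeConnection(), 'notebooks', 'people.json')
print(data['name'], data['age'])
```

With a real connection, the same helper would be called with the connection object created above and credentials['container'].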

Then the files are accessed as:

Warehousing_data_json = "Warehousing-data.json"
Warehousing_sales_data_nominal_scenario_json = "Warehousing-sales_data-nominal_scenario.json"
resp = client.execute(input= [{'name': "warehousing.mod",
                               'file': StringIO(warehousing_data_dotmod + warehousing_inputs + warehousing_dotmod + warehousing_outputs)},
                              {'name': Warehousing_data_json,
                               'filename': connection.get_object(credentials['container'], Warehousing_data_json)[1]},
                              {'name': Warehousing_sales_data_nominal_scenario_json,
                               'filename': connection.get_object(credentials['container'], Warehousing_sales_data_nominal_scenario_json)[1]}],
                              output= "results.json",
                              load_solution= True,
                              log= "solver.log",
                              gzip= True,
                              waittime= 300,
                              delete_on_completion= True)

The remaining problem is how to load the swiftclient and keystoneclient libraries in Bluemix. Pip doesn't seem to work in the notebook. Does anyone know how to handle this?
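
One thing that may be worth trying, though it was not confirmed in this thread: check which modules are actually importable, then install the missing ones into the user site directory from a notebook cell, for example with !pip install --user python-swiftclient python-keystoneclient (those are the PyPI distribution names for the swiftclient and keystoneclient modules). Whether user-level installs are permitted in a given Bluemix notebook environment is an assumption. The check itself can be sketched as follows, with missing_modules being a hypothetical helper name:

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Lists whichever of the two modules is not yet installed in this environment
missing = missing_modules(['swiftclient', 'keystoneclient'])
print(missing)
```

An empty list means both modules are already available and no installation is needed.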
