Data Science in OCI #
Refs: oracle ref, github ref. Data Science is a fully managed, serverless platform for data science teams to build, train, and manage machine learning models.
The Data Science service #
- Provides data scientists with a collaborative, project-driven workspace.
- Enables self-service, serverless access to infrastructure for data science workloads.
- Includes Python-centric tools, libraries, and packages developed by the open source community, plus the Oracle Accelerated Data Science (ADS) library, which supports the end-to-end lifecycle of predictive models:
  - Data acquisition, profiling, preparation, and visualization.
  - Feature engineering.
  - Model training (including Oracle AutoML).
  - Model evaluation, explanation, and interpretation (including Oracle MLX).
- Integrates with the rest of the Oracle Cloud Infrastructure stack, including Functions, Data Flow, Autonomous Data Warehouse, and Object Storage (see Object Storage in OCI).
- Model deployments are resources for deploying models as web applications (HTTP API endpoints).
- Data Science jobs let you define and run repeatable machine learning tasks on fully managed infrastructure.
- Pipelines let you execute end-to-end machine learning workflows.
- Includes policies and vaults to control access to compartments and resources.
- Includes metrics that provide insight into the health, availability, performance, and utilization of your Data Science resources.
- Helps data scientists concentrate on methodology and domain expertise to deliver models to production.
Setting up to use the Notebooks #
Fast setup: add a policy for the Data Science service inside the root compartment.
Groups can be created to give data scientists access to the platform. Details of this setup are in the ref above.
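As a sketch, a root-compartment policy for a hypothetical `data-scientists` group could contain statements like the following (group name and scope are assumptions; check the OCI Data Science policy docs for the authoritative set):

```
allow group data-scientists to manage data-science-family in tenancy
allow service datascience to use virtual-network-family in tenancy
```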
Deployment of LangChain #
```python
import os

from langchain.llms import Cohere
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

os.environ["COHERE_API_KEY"] = ""

llm = Cohere()
# The template has a single input variable, `subject`
prompt = PromptTemplate.from_template("Tell me a joke about {subject}")
llm_chain = LLMChain(prompt=prompt, llm=llm, verbose=True)

from ads.llm.deploy import ChainDeployment

ChainDeployment(chain=llm_chain).prepare_save_deploy(
    inference_conda_env="pytorch21_p39_gpu_v1",
    deployment_display_name="LangChain Deployment",
    environment_variables={"COHERE_API_KEY": ""},
)
```
```
oci raw-request --http-method POST --target-uri <model-deployment-url>/predict --request-body '{"subject": "animals"}' --auth resource_principal
```
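The request body above is plain JSON; a minimal stdlib sketch of building and serializing the payload the deployed chain expects (the `subject` field matches the prompt template's input variable; the helper name is hypothetical):

```python
import json


def build_predict_payload(subject: str) -> str:
    """Serialize the input expected by the deployed LLMChain.

    The chain's prompt template has a single `subject` variable,
    so the /predict body is a JSON object with that one key.
    """
    return json.dumps({"subject": subject})


print(build_predict_payload("animals"))  # {"subject": "animals"}
```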
LangChain in Data Science notebooks #
Add the API keys to authenticate the calls #
~/.oci/config

```
[DEFAULT]
user=ocid1.user.oc1..aaaaaaaaya3c7ken4btxftmte342m3gnypmilth2ncpq3yypklkrzjimcmka
fingerprint=a6:5a:8c:e5:a1:20:53:94:fb:66:64:b2:31:86:0f:1e
tenancy=ocid1.tenancy.oc1..aaaaaaaazmhsjjrmpkwxxmyctoaoha4klg3jrlqrfc2cofhrtreocrrqie3a
region=me-jeddah-1
key_file=<path to your private keyfile> # TODO
```
~/.oci/private-key.pem

```
-----BEGIN PRIVATE KEY-----
MIIEvwIBADANBgkqhkiG9w0BAQEFAASCBKkwggSlAgEAAoIBAQDedl9plB3pVMgj
F3ySVjbQk9Db5iz9uKvWIY8i0ciO3+qZHX3OVQpWpDKeOyGNHXMGWU4owH+x+OH5
GGCcnDf+MU0E3SE4RFLrIPi8Or6tqMbaIC6smm0gYTfVhhskLs1Fxr4Lb+YEjjnM
/bWYuue8tvK4M/L4sFGnFIYlJw==
-----END PRIVATE KEY-----
```
After adding these two files, create a new notebook and try accessing the Embeddings or LangChain module.
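Before using the config in a notebook, it can be worth sanity-checking that the file parses as expected. A small stdlib sketch (the `load_oci_profile` helper and the sample text are illustrations, not part of the OCI SDK):

```python
import configparser


def load_oci_profile(config_text: str) -> dict:
    """Parse an OCI config file's [DEFAULT] profile into a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # configparser exposes the [DEFAULT] section via defaults()
    return dict(parser.defaults())


sample = """\
[DEFAULT]
user=ocid1.user.oc1..example
region=me-jeddah-1
key_file=~/.oci/private-key.pem
"""
profile = load_oci_profile(sample)
print(profile["region"])  # me-jeddah-1
```

To check your real file: `load_oci_profile(Path("~/.oci/config").expanduser().read_text())`.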
Sample code #

```python
# https://blogs.oracle.com/ai-and-datascience/post/developing-ai-apps-oci-generative-ai-langchain
#
# Available model_ids:
#   cohere.command
#   cohere.command-light
#   meta.llama-2-70b-chat
#   cohere.embed-english-v3.0
#   cohere.embed-english-light-v3.0
#   cohere.embed-multilingual-v3.0
#   cohere.embed-multilingual-light-v3.0
from langchain_community.llms import OCIGenAI  # older versions: from langchain.llms import OCIGenAI

endpoint = "<generative-ai-service-endpoint>"  # the Generative AI inference endpoint for your region
compartment_id = "<compartment-ocid>"

llm = OCIGenAI(
    # model_id="cohere.command",
    model_id="meta.llama-2-70b-chat",
    service_endpoint=endpoint,
    compartment_id=compartment_id,
)
response = llm.invoke("Tell me one fact about earth", temperature=0.7)
print(response)
```
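The cohere.embed models in the list above return embedding vectors, and a common next step is similarity search over them. A minimal stdlib sketch of cosine similarity, using toy vectors rather than real model output:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (real embed model outputs are much longer)
v1 = [0.1, 0.9, 0.2]
v2 = [0.1, 0.8, 0.3]
print(cosine_similarity(v1, v2))
```

Scores close to 1.0 indicate semantically similar inputs; in practice you would compute this between a query embedding and each document embedding.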