Set up AI Proxy Advanced with Llama2 - Plugin

This guide walks you through setting up the AI Proxy Advanced plugin with the Llama2 LLM.

Llama2 is a self-hosted model. As such, it requires setting the model option upstream_url to point to the absolute HTTP(S) endpoint for this model implementation.

There are several hosting and format options for running this LLM. Popular options, listed in the table under Upstream formats, include HuggingFace, OLLAMA, llama.cpp, and self-hosted GGUF.

Upstream formats

The upstream request and response formats differ between implementations of Llama2 and their accompanying web servers.

For this provider, set the llama2_format parameter (under model.options of each entry in config.targets) to one of the following values:

Llama2 Hosting	llama2_format Config Value	Auth Header
HuggingFace	`raw`	`Authorization`
OLLAMA	`ollama`	Not required by default
llama.cpp	`raw`	Not required by default
Self-Hosted GGUF	`openai`	Not required by default
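
As a minimal sketch, the table above can be expressed as a lookup that fills in one entry of the plugin's targets list. The function name build_target and the lowercase hosting keys are hypothetical, chosen only for illustration; the dictionary shape mirrors the plugin configuration used later in this guide.

```python
# Hypothetical lookup: hosting options from the table mapped to llama2_format values.
LLAMA2_FORMATS = {
    "huggingface": "raw",
    "ollama": "ollama",
    "llama.cpp": "raw",
    "self-hosted-gguf": "openai",
}


def build_target(hosting: str, upstream_url: str, route_type: str = "llm/v1/chat") -> dict:
    """Return one entry for config.targets using the format listed for `hosting`."""
    try:
        llama2_format = LLAMA2_FORMATS[hosting.lower()]
    except KeyError:
        raise ValueError(f"unknown Llama2 hosting option: {hosting!r}") from None
    return {
        "route_type": route_type,
        "model": {
            "provider": "llama2",
            "name": "llama2",
            "options": {
                "llama2_format": llama2_format,
                "upstream_url": upstream_url,
            },
        },
    }


target = build_target("ollama", "http://ollama-server.local:11434/api/chat")
print(target["model"]["options"]["llama2_format"])  # ollama
```

Keeping the mapping in one place makes it harder to pair a hosting option with the wrong format value.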

Raw format

The raw format option emits the full Llama2 prompt format, under the JSON field inputs:

{
  "inputs": "<s>[INST] <<SYS>>You are a mathematician. \n <</SYS>> \n\n What is 1 + 1? [/INST]"
}

It expects the response to be in the responses JSON field. If using llama.cpp, it should also be set to RAW mode.
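
To make the shape of the raw request concrete, here is a small sketch that wraps one system message and one user message in the prompt markers from the example above. The helper name to_raw_payload is hypothetical, and the sketch only reproduces the single-turn example shown here; it is not the plugin's own implementation.

```python
import json


def to_raw_payload(system: str, user: str) -> dict:
    """Wrap a system and a user message in the Llama2 prompt markers shown above."""
    prompt = f"<s>[INST] <<SYS>>{system} \n <</SYS>> \n\n {user} [/INST]"
    return {"inputs": prompt}


payload = to_raw_payload("You are a mathematician.", "What is 1 + 1?")
# Serializes to the same JSON body as the example above.
print(json.dumps(payload))
```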

Ollama format

The ollama format option adheres to the chat and chat-completion request formats, as defined in its API documentation.

OpenAI format

The openai format option follows the same upstream formats as the equivalent OpenAI route type operation (that is, llm/v1/chat or llm/v1/completions).

Using the plugin with Llama2

For all providers, the Kong AI Proxy Advanced plugin attaches to route entities.

It can be installed into one route per operation, for example:

OpenAI chat route
Cohere chat route
Cohere completions route

Each of these AI-enabled routes must point to a null service. This service doesn’t need to map to any real upstream URL. It can point somewhere empty (for example, http://localhost:32000), because the plugin overwrites the upstream URL. This requirement will be removed in a later Kong revision.

Prerequisites

You need a service to contain the route for the LLM provider. Create a service first:

  curl -X POST http://localhost:8001/services \
    --data "name=ai-proxy-advanced" \
    --data "url=http://localhost:32000"

Remember that the upstream URL can point anywhere empty, as it won’t be used by the plugin.
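
If you script this step rather than running curl, the same request can be prepared with Python's standard library. This is a sketch only: the helper name create_service_request is hypothetical, and the request is built but not sent, so it can be inspected without a running gateway.

```python
from urllib.parse import urlencode
from urllib.request import Request


def create_service_request(admin_url: str, name: str, url: str) -> Request:
    """Build (but do not send) the form-encoded POST that creates a service."""
    body = urlencode({"name": name, "url": url}).encode()
    return Request(f"{admin_url}/services", data=body, method="POST")


req = create_service_request(
    "http://localhost:8001", "ai-proxy-advanced", "http://localhost:32000"
)
# To send it against a running Kong Admin API: urllib.request.urlopen(req)
print(req.get_method(), req.full_url)
```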

Provider configuration

Set up route and plugin

After installing and starting your Llama2 instance, you can then create an AI Proxy Advanced route and plugin configuration.

Create the route:

curl -X POST http://localhost:8001/services/ai-proxy-advanced/routes \
  --data "name=llama2-chat" \
  --data "paths[]=~/llama2-chat$"
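
The leading ~ in the path marks it as a regular expression, and the trailing $ restricts the route to exactly /llama2-chat. The sketch below approximates that behavior, assuming the expression is evaluated from the start of the request path; Python's re module stands in for Kong's regex engine, and route_matches is a hypothetical helper.

```python
import re

ROUTE_PATH = "~/llama2-chat$"  # value passed as paths[] above


def route_matches(request_path: str) -> bool:
    """Approximate the route match: drop the ~ prefix, then match from the start of the path."""
    pattern = ROUTE_PATH[1:]
    return re.match(pattern, request_path) is not None


print(route_matches("/llama2-chat"))        # True
print(route_matches("/llama2-chat/extra"))  # False
```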

Enable and configure the AI Proxy Advanced plugin for Llama2:

Enable on a route, using one of the following methods:

Kong Admin API

Konnect API

Kubernetes

Declarative (YAML)

Konnect Terraform

With the Kong Admin API, make the following request:

curl -X POST http://localhost:8001/routes/{routeName|Id}/plugins \
    --header "accept: application/json" \
    --header "Content-Type: application/json" \
    --data '
    {
      "name": "ai-proxy-advanced",
      "config": {
        "targets": [
          {
            "route_type": "llm/v1/chat",
            "model": {
              "provider": "llama2",
              "name": "llama2",
              "options": {
                "llama2_format": "ollama",
                "upstream_url": "http://ollama-server.local:11434/api/chat"
              }
            }
          }
        ]
      }
    }
    '

Replace {routeName|Id} with the id or name of the route that this plugin configuration targets (llama2-chat in this guide).

With the Konnect API, substitute your own access token, region, control plane (CP) ID, and route ID, then make the following request:

curl -X POST \
https://{us|eu}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/routes/{routeId}/plugins \
    --header "accept: application/json" \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer TOKEN" \
    --data '{"name":"ai-proxy-advanced","config":{"targets":[{"route_type":"llm/v1/chat","model":{"provider":"llama2","name":"llama2","options":{"llama2_format":"ollama","upstream_url":"http://ollama-server.local:11434/api/chat"}}}]}}'

For details on region-specific URLs and personal access tokens, see the Konnect API reference.
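
The placeholders in the Konnect request can be filled in programmatically. The following sketch assembles the endpoint URL and headers from the pieces named above; the function names and the CP_ID, ROUTE_ID, and TOKEN values are placeholders for illustration, and only the two regions shown in the request (us and eu) are accepted.

```python
def konnect_plugins_url(region: str, control_plane_id: str, route_id: str) -> str:
    """Assemble the plugins endpoint for a route, following the URL pattern above."""
    if region not in ("us", "eu"):
        raise ValueError(f"unsupported region in this sketch: {region!r}")
    return (
        f"https://{region}.api.konghq.com/v2/control-planes/{control_plane_id}"
        f"/core-entities/routes/{route_id}/plugins"
    )


def konnect_headers(token: str) -> dict:
    """Headers used by the request above, with the personal access token as a bearer token."""
    return {
        "accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }


url = konnect_plugins_url("us", "CP_ID", "ROUTE_ID")
headers = konnect_headers("TOKEN")
print(url)
```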

With Kubernetes, first create a KongPlugin resource:

echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: ai-proxy-advanced-example
plugin: ai-proxy-advanced
config:
  targets:
  - route_type: llm/v1/chat
    model:
      provider: llama2
      name: llama2
      options:
        llama2_format: ollama
        upstream_url: http://ollama-server.local:11434/api/chat
" | kubectl apply -f -

Next, apply the KongPlugin resource to an ingress by annotating the ingress as follows:

kubectl annotate ingress INGRESS_NAME konghq.com/plugins=ai-proxy-advanced-example

Replace INGRESS_NAME with the name of the ingress that this plugin configuration targets. You can list your available ingresses by running kubectl get ingress.

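Note: The KongPlugin resource only needs to be defined once and can be applied to any service, consumer, or route in the namespace. If you want the plugin to be available cluster-wide, create the resource as a KongClusterPlugin instead of a KongPlugin.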

With declarative configuration (YAML), add this section to your declarative configuration file:

plugins:
- name: ai-proxy-advanced
  route: ROUTE_NAME|ID
  config:
    targets:
    - route_type: llm/v1/chat
      model:
        provider: llama2
        name: llama2
        options:
          llama2_format: ollama
          upstream_url: http://ollama-server.local:11434/api/chat

Replace ROUTE_NAME|ID with the id or name of the route that this plugin configuration targets.

With Konnect Terraform, first set up your personal access token (prerequisite):

terraform {
  required_providers {
    konnect = {
      source  = "kong/konnect"
    }
  }
}

provider "konnect" {
  personal_access_token = "kpat_YOUR_TOKEN"
  server_url            = "https://us.api.konghq.com/"
}

To create a Kong Konnect gateway plugin, add the following to your Terraform configuration:

resource "konnect_gateway_plugin_ai_proxy_advanced" "my_ai_proxy_advanced" {
  enabled = true

  config = {
    targets = [
      {
        route_type = "llm/v1/chat"

        model = {
          provider = "llama2"
          name = "llama2"
          options = {
            llama2_format = "ollama"
            upstream_url = "http://ollama-server.local:11434/api/chat"
          }
        }
      }
    ]
  }

  control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
  route = {
    id = konnect_gateway_route.my_route.id
  }
}

Test the configuration

Make an llm/v1/chat type request to test your new endpoint:

curl -X POST http://localhost:8000/llama2-chat \
  -H 'Content-Type: application/json' \
  --data-raw '{ "messages": [ { "role": "system", "content": "You are a mathematician" }, { "role": "user", "content": "What is 1+1?"} ] }'
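
The same test request can be prepared from a script. This sketch builds the llm/v1/chat body from a system and a user message using only the standard library; chat_request is a hypothetical helper, and the request is constructed without being sent so it can be inspected before use.

```python
import json
from urllib.request import Request


def chat_request(proxy_url: str, system: str, user: str) -> Request:
    """Build (but do not send) the chat request shown in the curl command above."""
    body = {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ]
    }
    return Request(
        proxy_url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = chat_request(
    "http://localhost:8000/llama2-chat", "You are a mathematician", "What is 1+1?"
)
# To send it through a running Kong proxy: urllib.request.urlopen(req)
print(req.get_method(), req.full_url)
```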
