このページは、まだ日本語ではご利用いただけません。翻訳中です。
構成
このプラグインはDBレスモードに対応しています。
互換性のあるプロトコル
AI Proxy Advancedプラグインは以下のプロトコルに対応しています:
grpc
, grpcs
, http
, https
パラメータ
このプラグインの設定で使用できるすべてのパラメータのリストは次のとおりです。
-
name or plugin
string requiredプラグイン名。この場合は
ai-proxy-advanced
。- Kong Admin API、Kong Konnect API、宣言型構成、または decK ファイルを使用する場合、フィールドは
name
です。 - Kubernetes で KongPlugin オブジェクトを使用する場合、フィールドは
plugin
です。
- Kong Admin API、Kong Konnect API、宣言型構成、または decK ファイルを使用する場合、フィールドは
-
instance_name
stringプラグインのインスタンスを識別するための任意のカスタム名 (例:
ai-proxy-advanced_my-service
。インスタンス名はKong ManagerとKonnectに表示されるので、 例えば複数のサービスで同じプラグインを複数のコンテキストで実行する場合に便利です。また、Kong Admin API経由で特定のプラグインインスタンスに アクセスするためにも使用できます。
インスタンス名は、次のコンテキスト内で一意である必要があります。
- Kong Gateway Enterpriseのワークスペース内
- Konnectのコントロールプレーン(CP)またはコントロールプレーン(CP)グループ内
- Kong Gateway (OSS)の全世界
-
service.name or service.id
stringプラグインが対象とするサービス名または ID。最上位の
/plugins
エンドポイント. からプラグインをサービスに追加する場合は、これらのパラメータのいずれかを設定してください/services/{serviceName|Id}/plugins
を使用する場合は必要ありません。 -
route.name or route.id
stringプラグインがターゲットとするルート名または ID。最上位の
/plugins
エンドポイント. を通るルートにプラグインを追加する場合は、これらのパラメータのいずれかを設定してください/routes/{routeName|Id}/plugins
を使用する場合は必要ありません。 -
consumer.name or consumer.id
stringプラグインがターゲットとするコンシューマーの名前または ID。 最上位の
/plugins
エンドポイント. からコンシューマーにプラグインを追加する場合は、これらのパラメーターのいずれかを設定してください/consumers/{consumerName|Id}/plugins
を使用する場合は必要ありません。 -
consumer_group.name or consumer_group.id
stringプラグインが対象とするコンシューマグループの名前または ID。 設定されている場合、プラグインは指定されたグループが認証されているリクエストに対してのみアクティブになります
/plugins
エンドポイント./consumer_groups/{consumerGroupName|Id}/plugins
を使用する場合は必要ありません。 -
enabled
boolean default:true
このプラグインが適用されるかどうか。
-
config
record required-
balancer
record required-
algorithm
string default:round-robin
Must be one of:round-robin
,lowest-latency
,lowest-usage
,consistent-hashing
,semantic
Which load balancing algorithm to use.
-
tokens_count_strategy
string default:total-tokens
Must be one of:total-tokens
,prompt-tokens
,completion-tokens
What tokens to use for usage calculation. Available values are:
total_tokens
prompt_tokens
, andcompletion_tokens
.
-
latency_strategy
string default:tpot
Must be one of:tpot
,e2e
What metrics to use for latency. Available values are:
tpot
(time-per-output-token) ande2e
.
-
hash_on_header
string default:X-Kong-LLM-Request-ID
The header to use for consistent-hashing.
-
slots
integer default:10000
between:10
65536
The number of slots in the load balancer algorithm.
-
retries
integer default:5
between:0
32767
The number of retries to execute upon failure to proxy.
-
connect_timeout
integer default:60000
between:1
2147483646
-
write_timeout
integer default:60000
between:1
2147483646
-
read_timeout
integer default:60000
between:1
2147483646
-
-
embeddings
record-
auth
record-
header_name
string referenceableIf AI model requires authentication via Authorization or API key header, specify its name here.
-
header_value
string referenceable encryptedSpecify the full auth header value for ‘header_name’, for example ‘Bearer key’ or just ‘key’.
-
param_name
string referenceableIf AI model requires authentication via query parameter, specify its name here.
-
param_value
string referenceable encryptedSpecify the full parameter value for ‘param_name’.
-
param_location
string Must be one of:query
,body
Specify whether the ‘param_name’ and ‘param_value’ options go in a query string, or the POST form/JSON body.
-
azure_use_managed_identity
boolean default:false
Set true to use the Azure Cloud Managed Identity (or user-assigned identity) to authenticate with Azure-provider models.
-
azure_client_id
string referenceableIf azure_use_managed_identity is set to true, and you need to use a different user-assigned identity for this LLM instance, set the client ID.
-
azure_client_secret
string referenceable encryptedIf azure_use_managed_identity is set to true, and you need to use a different user-assigned identity for this LLM instance, set the client secret.
-
azure_tenant_id
string referenceableIf azure_use_managed_identity is set to true, and you need to use a different user-assigned identity for this LLM instance, set the tenant ID.
-
gcp_use_service_account
boolean default:false
Use service account auth for GCP-based providers and models.
-
gcp_service_account_json
string referenceable encryptedSet this field to the full JSON of the GCP service account to authenticate, if required. If null (and gcp_use_service_account is true), Kong will attempt to read from environment variable
GCP_SERVICE_ACCOUNT
.
-
aws_access_key_id
string referenceable encryptedSet this if you are using an AWS provider (Bedrock) and you are authenticating using static IAM User credentials. Setting this will override the AWS_ACCESS_KEY_ID environment variable for this plugin instance.
-
aws_secret_access_key
string referenceable encryptedSet this if you are using an AWS provider (Bedrock) and you are authenticating using static IAM User credentials. Setting this will override the AWS_SECRET_ACCESS_KEY environment variable for this plugin instance.
-
allow_override
boolean default:false
If enabled, the authorization header or parameter can be overridden in the request by the value configured in the plugin.
-
-
model
record required-
provider
string required Must be one of:openai
,mistral
AI provider format to use for embeddings API
-
name
string required Must be one of:text-embedding-3-large
,text-embedding-3-small
,mistral-embed
Model name to execute.
-
options
recordKey/value settings for the model
-
upstream_url
stringupstream url for the embeddings
-
-
-
-
vectordb
record-
strategy
string required Must be one of:redis
which vector database driver to use
-
dimensions
integer requiredthe desired dimensionality for the vectors
-
threshold
number requiredthe default similarity threshold for accepting semantic search results (float)
-
distance_metric
string required Must be one of:cosine
,euclidean
the distance metric to use for vector searches
-
redis
record required-
host
stringA string representing a host name, such as example.com.
-
port
integer between:0
65535
An integer representing a port number between 0 and 65535, inclusive.
-
connect_timeout
integer default:2000
between:0
2147483646
An integer representing a timeout in milliseconds. Must be between 0 and 2^31-2.
-
send_timeout
integer default:2000
between:0
2147483646
An integer representing a timeout in milliseconds. Must be between 0 and 2^31-2.
-
read_timeout
integer default:2000
between:0
2147483646
An integer representing a timeout in milliseconds. Must be between 0 and 2^31-2.
-
username
string referenceableUsername to use for Redis connections. If undefined, ACL authentication won’t be performed. This requires Redis v6.0.0+. To be compatible with Redis v5.x.y, you can set it to
default
.
-
password
string referenceable encryptedPassword to use for Redis connections. If undefined, no AUTH commands are sent to Redis.
-
sentinel_username
string referenceableSentinel username to authenticate with a Redis Sentinel instance. If undefined, ACL authentication won’t be performed. This requires Redis v6.2.0+.
-
sentinel_password
string referenceable encryptedSentinel password to authenticate with a Redis Sentinel instance. If undefined, no AUTH commands are sent to Redis Sentinels.
-
database
integer default:0
Database to use for the Redis connection when using the
redis
strategy
-
keepalive_pool_size
integer default:256
between:1
2147483646
The size limit for every cosocket connection pool associated with every remote server, per worker process. If neither
keepalive_pool_size
norkeepalive_backlog
is specified, no pool is created. Ifkeepalive_pool_size
isn’t specified butkeepalive_backlog
is specified, then the pool uses the default value. Try to increase (e.g. 512) this value if latency is high or throughput is low.
-
keepalive_backlog
integer between:0
2147483646
Limits the total number of opened connections for a pool. If the connection pool is full, connection queues above the limit go into the backlog queue. If the backlog queue is full, subsequent connect operations fail and return
nil
. Queued operations (subject to set timeouts) resume once the number of connections in the pool is less thankeepalive_pool_size
. If latency is high or throughput is low, try increasing this value. Empirically, this value is larger thankeepalive_pool_size
.
-
sentinel_master
stringSentinel master to use for Redis connections. Defining this value implies using Redis Sentinel.
-
sentinel_role
string Must be one of:master
,slave
,any
Sentinel role to use for Redis connections when the
redis
strategy is defined. Defining this value implies using Redis Sentinel.
-
sentinel_nodes
array of typerecord
len_min:1
Sentinel node addresses to use for Redis connections when the
redis
strategy is defined. Defining this field implies using a Redis Sentinel. The minimum length of the array is 1 element.-
host
string required default:127.0.0.1
A string representing a host name, such as example.com.
-
port
integer default:6379
between:0
65535
An integer representing a port number between 0 and 65535, inclusive.
-
-
cluster_nodes
array of typerecord
len_min:1
Cluster addresses to use for Redis connections when the
redis
strategy is defined. Defining this field implies using a Redis Cluster. The minimum length of the array is 1 element.-
ip
string required default:127.0.0.1
A string representing a host name, such as example.com.
-
port
integer default:6379
between:0
65535
An integer representing a port number between 0 and 65535, inclusive.
-
-
ssl
boolean default:false
If set to true, uses SSL to connect to Redis.
-
ssl_verify
boolean default:false
If set to true, verifies the validity of the server SSL certificate. If setting this parameter, also configure
lua_ssl_trusted_certificate
inkong.conf
to specify the CA (or server) certificate used by your Redis server. You may also need to configurelua_ssl_verify_depth
accordingly.
-
server_name
stringA string representing an SNI (server name indication) value for TLS.
-
cluster_max_redirections
integer default:5
Maximum retry attempts for redirection.
-
connection_is_proxied
boolean default:false
If the connection to Redis is proxied (e.g. Envoy), set it
true
. Set thehost
andport
to point to the proxy address.
-
-
-
max_request_body_size
integer default:8192
max allowed body size allowed to be introspected
-
model_name_header
boolean default:true
Display the model name selected in the X-Kong-LLM-Model response header
-
targets
array of typerecord
required-
route_type
string required Must be one of:llm/v1/chat
,llm/v1/completions
,preserve
The model’s operation implementation, for this provider. Set to
preserve
to pass through without transformation.
-
auth
record-
header_name
string referenceableIf AI model requires authentication via Authorization or API key header, specify its name here.
-
header_value
string referenceable encryptedSpecify the full auth header value for ‘header_name’, for example ‘Bearer key’ or just ‘key’.
-
param_name
string referenceableIf AI model requires authentication via query parameter, specify its name here.
-
param_value
string referenceable encryptedSpecify the full parameter value for ‘param_name’.
-
param_location
string Must be one of:query
,body
Specify whether the ‘param_name’ and ‘param_value’ options go in a query string, or the POST form/JSON body.
-
azure_use_managed_identity
boolean default:false
Set true to use the Azure Cloud Managed Identity (or user-assigned identity) to authenticate with Azure-provider models.
-
azure_client_id
string referenceableIf azure_use_managed_identity is set to true, and you need to use a different user-assigned identity for this LLM instance, set the client ID.
-
azure_client_secret
string referenceable encryptedIf azure_use_managed_identity is set to true, and you need to use a different user-assigned identity for this LLM instance, set the client secret.
-
azure_tenant_id
string referenceableIf azure_use_managed_identity is set to true, and you need to use a different user-assigned identity for this LLM instance, set the tenant ID.
-
gcp_use_service_account
boolean default:false
Use service account auth for GCP-based providers and models.
-
gcp_service_account_json
string referenceable encryptedSet this field to the full JSON of the GCP service account to authenticate, if required. If null (and gcp_use_service_account is true), Kong will attempt to read from environment variable
GCP_SERVICE_ACCOUNT
.
-
aws_access_key_id
string referenceable encryptedSet this if you are using an AWS provider (Bedrock) and you are authenticating using static IAM User credentials. Setting this will override the AWS_ACCESS_KEY_ID environment variable for this plugin instance.
-
aws_secret_access_key
string referenceable encryptedSet this if you are using an AWS provider (Bedrock) and you are authenticating using static IAM User credentials. Setting this will override the AWS_SECRET_ACCESS_KEY environment variable for this plugin instance.
-
allow_override
boolean default:false
If enabled, the authorization header or parameter can be overridden in the request by the value configured in the plugin.
-
-
model
record required-
provider
string required Must be one of:openai
,azure
,anthropic
,cohere
,mistral
,llama2
,gemini
,bedrock
AI provider request format - Kong translates requests to and from the specified backend compatible formats.
-
name
stringModel name to execute.
-
options
recordKey/value settings for the model
-
max_tokens
integer default:256
Defines the max_tokens, if using chat or completion models.
-
input_cost
numberDefines the cost per 1M tokens in your prompt.
-
output_cost
numberDefines the cost per 1M tokens in the output of the AI.
-
temperature
number between:0
5
Defines the matching temperature, if using chat or completion models.
-
top_p
number between:0
1
Defines the top-p probability mass, if supported.
-
top_k
integer between:0
500
Defines the top-k most likely tokens, if supported.
-
anthropic_version
stringDefines the schema/API version, if using Anthropic provider.
-
azure_instance
stringInstance name for Azure OpenAI hosted models.
-
azure_api_version
string default:2023-05-15
‘api-version’ for Azure OpenAI instances.
-
azure_deployment_id
stringDeployment ID for Azure OpenAI instances.
-
llama2_format
string Must be one of:raw
,openai
,ollama
If using llama2 provider, select the upstream message format.
-
mistral_format
string Must be one of:openai
,ollama
If using mistral provider, select the upstream message format.
-
upstream_url
stringManually specify or override the full URL to the AI operation endpoints, when calling (self-)hosted models, or for running via a private endpoint.
-
upstream_path
stringManually specify or override the AI operation path, used when e.g. using the ‘preserve’ route_type.
-
gemini
record-
api_endpoint
stringIf running Gemini on Vertex, specify the regional API endpoint (hostname only).
-
project_id
stringIf running Gemini on Vertex, specify the project ID.
-
location_id
stringIf running Gemini on Vertex, specify the location ID.
-
-
bedrock
record-
aws_region
stringIf using AWS providers (Bedrock) you can override the
AWS_REGION
environment variable by setting this option.
-
-
-
-
weight
integer default:100
between:1
65535
The weight this target gets within the upstream loadbalancer (1-65535).
-
description
stringThe semantic description of the target, required if using semantic load balancing.
-
logging
record required-
log_statistics
boolean required default:false
If enabled and supported by the driver, will add model usage and token metrics into the Kong log plugin(s) output.
-
log_payloads
boolean required default:false
If enabled, will log the request and response body into the Kong log plugin(s) output.
-
-
-