Chat
Get chat completions
Creates a chat completion for the provided prompt, taking into account the content you have trained your project on.
Request body
Key | Type | Description |
---|---|---|
messages | ChatCompletionMessageParam[] | A list of user and assistant messages to which the AI should respond. |
projectKey | string | The project key associated with your project. If the code is shared publicly, use the production key from an allowlisted domain, which is set in the project settings. Otherwise, for instance on localhost, use the development key. |
assistantId | string | The assistant ID. |
assistantVersionId | string | The assistant version ID. If not provided, the default version of the assistant will be used. |
model | 'gpt-4' \| 'gpt-4o' \| 'gpt-4-turbo-preview' \| 'gpt-3.5-turbo' | The completions model to use. |
systemPrompt | string | System prompt with custom instructions on how the AI should behave when generating a response. |
context | object | Values to provide to the system prompt when used with templates. |
outputFormat | 'markdown' \| 'slack' \| 'html' | Output format, e.g. Slack-flavored Markdown. Default: |
jsonOutput | boolean | If true, return a JSON object instead of text. Default: |
useAllSourcesForRetrieval | boolean | If true, use all connected sources for retrieval. When sources are added, they will automatically be included. |
retrievalSourceIds | string[] | A list of source IDs to use specifically for retrieval. |
stream | boolean | If true, return the response as a Readable Stream. Otherwise, return as a plain JSON object. Default: |
doNotInjectContext | boolean | If true, do not inject the context in the full prompt unless the context tag is present in the template. Default: |
threadId | string | If provided, the messages will be tracked as part of the same thread in the insights. |
temperature | number | The model temperature. Default: |
topP | number | The model top P. Default: |
frequencyPenalty | number | The model frequency penalty. Default: |
presencePenalty | number | The model presence penalty. Default: |
maxTokens | number | The max number of tokens to include in the response. |
maxContextTokens | number | The max number of tokens to include as part of the context. Note that this value will automatically be adjusted to fit within the context window allowed by the model. Default: |
sectionsMatchCount | number | The number of sections to include in the prompt context. |
sectionsMatchThreshold | number | The similarity threshold between the input question and selected sections. The higher the threshold, the more relevant the selected sections. If it is set too high, some relevant sections may be missed. |
tools | ChatCompletionTool[] | A list of tools the model may call. Currently, only functions are supported. Use this to provide a list of functions the model may generate JSON inputs for. |
toolChoice | ChatCompletionToolChoiceOption | Controls which (if any) function is called by the model. "none" means the model will not call a function and instead generates a message. "auto" means the model can pick between generating a message or calling a function. Specifying a particular function via |
excludeFromInsights | boolean | If true, exclude thread from insights. No message data will be stored. Default: |
redact | boolean | If true, redact sensitive data from messages. Default: |
debug | boolean | If true, the response will contain additional metadata for debug purposes. |
Example request
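The original example is not reproduced here, so the following is only a minimal sketch of what a request might look like. It assumes the endpoint is https://api.markprompt.com/chat, that the body is sent as JSON via POST, and that messages use a { role, content } shape; none of these details are stated in this section, so verify them before relying on the code. buildChatRequest is a hypothetical helper and YOUR-PROJECT-KEY is a placeholder.

```javascript
// Assumed endpoint; it is not stated in this section, so verify it first.
const CHAT_URL = 'https://api.markprompt.com/chat';

// Hypothetical helper: builds the arguments for fetch() without sending anything.
function buildChatRequest(projectKey, messages, options = {}) {
  return {
    url: CHAT_URL,
    init: {
      method: 'POST', // assumption: completions are requested via POST
      headers: { 'Content-Type': 'application/json' },
      // Body keys are taken from the request body table above.
      body: JSON.stringify({ projectKey, messages, ...options }),
    },
  };
}

const { url, init } = buildChatRequest(
  'YOUR-PROJECT-KEY',
  // Assumption: messages use the { role, content } shape.
  [{ role: 'user', content: 'How do I reset my password?' }],
  { model: 'gpt-4o', outputFormat: 'markdown', stream: false },
);

// To send it: const res = await fetch(url, init);
```

Keeping the request construction separate from the network call makes it easy to inspect the body before anything is sent.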
Response
By default, the response is returned as a ReadableStream of the form:
In addition to the stream, the response includes a header named x-markprompt-data, which is an encoded (Uint8Array) JSON object of the form:
It consists of the following:
- The references (see below) used to generate the answer.
- A messageId, which is a unique ID representing the response message. It can be used to subsequently attach metadata to the message, such as a CSAT score, via the /messages API.
- A threadId, which is a unique ID which can be passed to subsequent requests and represents a multi-message thread. It can be used to subsequently attach metadata to the thread, such as user account info, via the /threads API.
- If the debug parameter is set to true, a debugInfo object containing information about the query, such as the full prompt that was built for the query.
The reference object
A reference is an object of the form:
and is meant to provide enough information for the client to be able to generate descriptive links to cited sources, including section slugs.
Parsing the header
Here is some example code in JavaScript to decode the references from the response header:
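The original snippet is not reproduced here. As a rough sketch, the decoding could look like the following. It assumes the header value is a comma-separated list of byte values (for example "123,34,..."), which this section does not spell out, so treat the parsing step as an assumption to check against a real response. decodeMarkpromptData is a hypothetical helper name.

```javascript
// Hypothetical helper. Assumes the header value is a comma-separated list of
// byte values, which is not confirmed by the text above.
function decodeMarkpromptData(headerValue) {
  if (!headerValue) return null;
  // Rebuild the Uint8Array, then decode it as UTF-8 JSON.
  const bytes = new Uint8Array(headerValue.split(',').map(Number));
  const json = new TextDecoder().decode(bytes);
  return JSON.parse(json);
}

// Usage with a fetch Response `res`:
//   const data = decodeMarkpromptData(res.headers.get('x-markprompt-data'));
//   const { references, messageId, threadId } = data ?? {};
```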
If the stream flag is set to false, the response is returned as a plain JSON object with a text field containing the completion, and a references field containing the list of references used to create the completion:
where references are objects of the form described above.
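A minimal sketch of the non-streaming case might look as follows. getCompletion is a hypothetical helper; it takes the fetch implementation as an argument so that it can be exercised without network access, and it assumes a JSON POST request, which is not stated in this section.

```javascript
// Hypothetical helper. `fetchImpl` is injected so the function can be tested
// offline; pass the global fetch in real use.
async function getCompletion(fetchImpl, url, body) {
  const res = await fetchImpl(url, {
    method: 'POST', // assumption: not stated in this section
    headers: { 'Content-Type': 'application/json' },
    // Force the plain JSON response described above.
    body: JSON.stringify({ ...body, stream: false }),
  });
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  // The text and references fields are the ones described above.
  const { text, references = [] } = await res.json();
  return { text, references };
}

// Usage sketch (the URL is an assumption):
//   const { text, references } = await getCompletion(
//     fetch, 'https://api.markprompt.com/chat',
//     { projectKey: 'YOUR-PROJECT-KEY', messages },
//   );
```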
When querying chat completions, do not use the bearer token if the code is exposed publicly, for instance on a public website. Instead, use the project production key, and make the request from an allowlisted domain. Both the production key and the domain allowlist are managed in the project settings.
Here is a working example of how to consume the stream in JavaScript. Note the use of projectKey and no authorization header: this code can be shared publicly, and will work from a domain you have allowlisted in the project settings.
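The original example is not reproduced here; the following is a sketch of one way to read the stream. It assumes the stream carries UTF-8 text chunks, and the endpoint and body shape in the usage comment are assumptions as well. readTextStream is a hypothetical helper.

```javascript
// Hypothetical helper: reads a ReadableStream of bytes (such as res.body),
// calls onChunk for each decoded piece, and resolves with the full text.
async function readTextStream(stream, onChunk = () => {}) {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let fullText = '';
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunk boundaries.
    const text = decoder.decode(value, { stream: true });
    if (text) {
      fullText += text;
      onChunk(text);
    }
  }
  // Flush any bytes still buffered in the decoder.
  const tail = decoder.decode();
  if (tail) {
    fullText += tail;
    onChunk(tail);
  }
  return fullText;
}

// Usage sketch (endpoint and body shape are assumptions). Note projectKey in
// the body and no authorization header:
//   const res = await fetch('https://api.markprompt.com/chat', {
//     method: 'POST',
//     headers: { 'Content-Type': 'application/json' },
//     body: JSON.stringify({ projectKey: 'YOUR-PROJECT-KEY', messages }),
//   });
//   await readTextStream(res.body, (text) => console.log(text));
```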
Using templates
When using assistants, Handlebars templates are supported. They can be used to customize instructions, for instance based on specific user information. Here is an example:
Now we can call the API providing the variables to replace via the context property:
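To make the idea concrete, here is a small sketch with a made-up template and made-up context values; neither comes from the original page. It passes the template through systemPrompt for simplicity, which is an assumption; with an assistant, the template would typically live in the assistant's instructions. The previewTemplate function is a hypothetical local helper that only substitutes simple {{path}} placeholders so the result can be previewed. It is not the Handlebars engine and does not reflect how the substitution is performed server-side.

```javascript
// Hypothetical template and context values, for illustration only.
const systemPrompt =
  'You are a support assistant. The user is {{user.name}} on the {{user.plan}} plan.';
const context = { user: { name: 'Ada', plan: 'Pro' } };

// Request body using field names from the request body table above.
const body = {
  projectKey: 'YOUR-PROJECT-KEY',
  messages: [{ role: 'user', content: 'What is included in my plan?' }],
  systemPrompt,
  context,
};

// Local preview only: substitutes simple {{a.b}} paths. This is NOT a
// Handlebars implementation (no helpers, blocks, or escaping).
function previewTemplate(template, values) {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match, path) => {
    const value = path.split('.').reduce((obj, key) => obj?.[key], values);
    // Leave unknown placeholders untouched.
    return value === undefined ? match : String(value);
  });
}

console.log(previewTemplate(systemPrompt, context));
// prints "You are a support assistant. The user is Ada on the Pro plan."
```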
This will transform the template into the following fully-specified instructions:
To which Markprompt may reply: