This PR allows a single llama-swap to be the central proxy for models served by other inference servers. The peer servers can be another llama-swap or any API that supports the /v1/* inference endpoint.
Updates: #433, #299Closes: #296
Add configuration support for api keys that are enforced by llama-swap. Keys are stripped before sending them to upstream servers.
Updates: #433, #50 and #251
The new logToStdout option controls what is logged to stdout. The
default has been changed to just the proxy logs, which contain swap and
http request logs.
There are four supported settings: none, proxy, upstream, both. The
"both" setting is the legacy setting where everything was spewed to
stdout.