9. Groq
API Key: Look for API Keys in the top right -> Click Create API Keys -> Name it and hit Submit. A critical step: "Your new API key has been created. Copy it now, as we will not display it again." Make sure you save it!
Code:
Endpoint URL: https://api.groq.com/openai/v1/chat/completions
Headers:
{
"Content-Type": "application/json",
"Authorization": "Bearer YOUR_API_KEY"
}
Body:
{
"messages": [
{
"role": "user",
"content": "Your prompt goes here, for example Translate this text to English and only return the translated text: %text% /no_think"
}
],
"model": "qwen/qwen3-32b",
"stream": false,
"include_reasoning": false
}
Text Output Path: .choices[0].message.content
To switch models, just change the model's name in the Body. Models: Find the full list on the GitHub page, Official site, or directly in the playground. You can usually disable "reasoning" (the model's thinking process) for speed using flags like /no_think or "include_reasoning": false. For instance, you'd use "include_reasoning": false for qwen/qwen3-32b. Be aware: Some models (like moonshotai/kimi-k2-instruct-0905) are non-reasoning by default, so you might need to remove "include_reasoning": false if it’s there, just to get them working properly.
Limits/Pricing: Check the GitHub page or the official rate limits documentation.
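The Groq pieces above (endpoint, headers, body, output path) can be wired together in a few lines. This is a minimal sketch using only Python's standard library; the helper names `build_groq_request` and `extract_text` are my own, not part of any SDK:

```python
import json
import urllib.request

GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_groq_request(api_key: str, text: str) -> urllib.request.Request:
    """Assemble (but not send) the POST request described above."""
    body = {
        "messages": [{"role": "user",
                      "content": "Translate this text to English and only "
                                 "return the translated text: " + text + " /no_think"}],
        "model": "qwen/qwen3-32b",
        "stream": False,
        "include_reasoning": False,
    }
    return urllib.request.Request(
        GROQ_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer " + api_key},
        method="POST",
    )

def extract_text(response_json: dict) -> str:
    """Apply the output path .choices[0].message.content."""
    return response_json["choices"][0]["message"]["content"]
```

To actually send the request you would pass it to `urllib.request.urlopen(...)` and feed the parsed JSON to `extract_text`.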
10. Together (Free)
I haven't personally tested this one because it requires adding a credit card and topping up with $5. However, based on the documentation, the setup process should be the same as the others on the list.
The GitHub resource indicates that after the initial $5 payment, you gain access to two free models: https://www.together.ai/models/deepseek-r1-distilled-llama-70b-free and https://www.together.ai/models/llama-3-3-70b-free.
The API structure is very similar for both:
Endpoint URL: https://api.together.xyz/v1/chat/completions
Headers:
{
"Content-Type": "application/json",
"Authorization": "Bearer YOUR_API_KEY"
}
Body: {
"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free",
"messages": [
{
"role": "user",
"content": "Your prompt"
}
]
}
Limits/Pricing: Up to 60 requests/minute
11. Cohere
API Key: The key is automatically generated the moment you sign up. You can find it later under the API keys tab in the left-hand menu.
Code:
Endpoint URL: https://api.cohere.ai/v2/chat
Headers:
{
"Content-Type": "application/json",
"Authorization": "Bearer YOUR_API_KEY"
}
Body:
{
"model": "command-a-translate-08-2025",
"messages": [
{
"role": "user",
"content": "Your prompt goes here, for example Translate this text to English and only return the translated text: %text%"
}
],
"stream": false,
"thinking": {
"type": "disabled"
}
}
Text Output Path: .text
To switch models, just change the model's name in the Body. Models: Check the official documentation or look in the model section of the Playground. You can disable reasoning for models like command-a-reasoning-08-2025 by including the parameter: "thinking": {"type": "disabled"}.
Limits/Pricing: The limits are 20 requests per minute and 1,000 requests per month. For more details, see the official docs.
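The Cohere section differs from the others mainly in its output path (.text instead of .choices[0].message.content) and the "thinking" switch. A small sketch, with hypothetical helper names of my own:

```python
def build_cohere_body(text: str) -> dict:
    """Body from the section above; 'thinking' only matters for reasoning
    models such as command-a-reasoning-08-2025."""
    return {
        "model": "command-a-translate-08-2025",
        "messages": [{"role": "user",
                      "content": "Translate this text to English and only "
                                 "return the translated text: " + text}],
        "stream": False,
        "thinking": {"type": "disabled"},
    }

def extract_cohere_text(response_json: dict) -> str:
    """Apply the output path .text from the section above."""
    return response_json["text"]
```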
12. GitHub
Token Key: Go to your profile Settings (top right) -> In the left menu, scroll to the bottom and select Developer settings -> Choose Personal access tokens and then Tokens (classic) -> Click Generate new token (top right) -> Select Generate new token (classic) -> Give it a name and set the Expiration -> Click Generate token at the bottom. Crucial: "Make sure to copy your personal access token now. You won’t be able to see it again!"
Code:
Endpoint URL: https://models.github.ai/inference/chat/completions
Headers:
{
"Content-Type": "application/json",
"Authorization": "Bearer YOUR_API_KEY"
}
Body:
{
"model": "meta/Meta-Llama-3.1-8B-Instruct",
"messages": [
{
"role": "user",
"content": "Your prompt goes here, for example Translate this text to English and only return the translated text: %text%"
}
],
"stream": false
}
Text Output Path: .choices[0].message.content
To switch models, just change the model's name in the Body. Models: You can browse the models on the GitHub Marketplace. To find the exact model ID: click on the model -> Go to the Playground tab at the top -> Click Code -> Look for the model string, e.g.: "model": "meta/Meta-Llama-3.1-8B-Instruct".
Limits/Pricing: Be aware that the input/output token limits are extremely restrictive. The actual limits depend on your Copilot subscription tier (Free/Pro/Pro+/Business/Enterprise). More details can be found in the documentation.
13. Cloudflare
API Key: Click the small person icon (top right) and select your profile -> In the left-hand menu, API Tokens -> Create Token -> Workers AI -> Name it -> Include - All accounts -> Continue to summary -> Create Token -> "Copy this token to access the Cloudflare API. For security this will not be shown again."
CLOUDFLARE_ACCOUNT_ID: Link -> Account home -> Click the three small dots next to the 'Account' text -> Copy account ID, then paste it into the Endpoint URL in place of CLOUDFLARE_ACCOUNT_ID.
Code:
Endpoint URL: https://api.cloudflare.com/client/v4/accounts/CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/meta/llama-3.1-8b-instruct
Headers:
{
"Content-Type": "application/json",
"Authorization": "Bearer YOUR_API_KEY"
}
Body:
{
"messages": [
{
"role": "user",
"content": "Your prompt goes here, for example Translate this text to English and only return the translated text: %text%"
}
],
"stream": false
}
Text Output Path: .result.response
To switch models, just change the model's name in the Endpoint URL. Models: Browse the list here. When you pick a model (e.g., llama-3.1-8b-instruct), copy the entire Model ID (e.g., "@cf/meta/llama-3.1-8b-instruct") and insert it into your Endpoint URL right after the /run/ segment. Example URL structure: .../ai/run/@cf/meta/llama-3.1-8b-instruct
Limits/Pricing: Your free allocation is 10,000 neurons per day. You can find more details on their pricing page.
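Cloudflare is the odd one out here: the model ID lives in the URL rather than in the Body, and the output path is .result.response. A tiny sketch (helper names are my own, not from any Cloudflare SDK):

```python
def cloudflare_url(account_id: str, model_id: str) -> str:
    """Build the endpoint URL; the model ID goes right after /ai/run/,
    exactly as described above."""
    return ("https://api.cloudflare.com/client/v4/accounts/"
            + account_id + "/ai/run/" + model_id)

def extract_cf_text(response_json: dict) -> str:
    """Apply the output path .result.response."""
    return response_json["result"]["response"]
```

Switching models is then just a matter of passing a different Model ID string, e.g. "@cf/meta/llama-3.1-8b-instruct".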
This one is part of Google Cloud, but I wasn't able to get it working, so I skipped it.
Starting now, I'll be sharing a few extra APIs that weren't on the original list. I thought you might find them useful too!
15. Microsoft Azure Translator
Warning: You will need to provide your credit card details to use this service.
Setup Steps: Use the search bar to find Translators -> Click Create -> Enter your details (name, region) and select the F0 free tier -> Hit Create again to finish.
API Key: Navigate to the new Translator service you just created -> In the left menu, you'll find everything you need under Keys and Endpoint.
Code:
Endpoint URL: https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&to=YOUR_LANGUAGE_CODE (e.g., to=en)
Headers:
{
"Content-Type": "application/json",
"Ocp-Apim-Subscription-Key": "YOUR_API_KEY",
"Ocp-Apim-Subscription-Region": "YOUR_REGION (under Keys and Endpoint)"
}
Body:
[
{
"Text": "%text%"
}
]
Text Output Path: .translations[0].text
Limits/Pricing: The F0 tier gives you a generous 2 million characters per hour. The limit is enforced automatically: once you reach 2 million characters within an hour, the service simply stops working until the next hour begins. Link, Link.
16. Z.AI
API Key: Click your profile (top right) -> Go to API Keys -> Select Create a new API key -> Give it a name and hit Confirm. That's all there is to it.
Based on what I've seen, it looks like there's only one free model available, which is the flash version.
Code:
Endpoint URL: https://api.z.ai/api/paas/v4/chat/completions
Headers:
{
"Content-Type": "application/json",
"Authorization": "Bearer YOUR_API_KEY"
}
Body:
{
"model": "glm-4.5-flash",
"messages": [
{
"role": "user",
"content": "Your prompt goes here, for example Translate this text to English and only return the translated text: %text% /nothink"
}
],
"stream": false,
"extra_body": {
"chat_template_kwargs": {
"enable_thinking": false
}
}
}
Text Output Path: .choices[0].message.content
Limits/Pricing: The free model, GLM-4.5-Flash, has a concurrency limit of 2. "Explanation of Rate Limits: To ensure stable access to GLM-4-Flash during the free trial, requests with context lengths over 8K will be throttled to 1% of the standard concurrency limit." Pricing for GLM-4.5-Flash: Input, Cached Input, Cached Input Storage, and Output are all free. You can find pricing and limit details in their documentation.
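Z.AI disables reasoning with both the /nothink tag in the prompt and the nested extra_body flag, which is easy to get wrong by hand. A small sketch that serializes the Body from the section above (the `zai_body` helper name is my own):

```python
import json

def zai_body(text: str) -> str:
    """Serialize the Z.AI request body shown above; both the /nothink tag
    and the extra_body flag switch the model's reasoning off."""
    return json.dumps({
        "model": "glm-4.5-flash",
        "messages": [{"role": "user",
                      "content": "Translate this text to English and only "
                                 "return the translated text: " + text + " /nothink"}],
        "stream": False,
        "extra_body": {"chat_template_kwargs": {"enable_thinking": False}},
    })
```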