9. Groq
API Key: Look for API Keys in the top right -> Click Create API Keys -> Name it and hit Submit. A critical step: "Your new API key has been created. Copy it now, as we will not display it again." Make sure you save it!
Code:
Endpoint URL: https://api.groq.com/openai/v1/chat/completions
Headers:
{
"Content-Type": "application/json",
"Authorization": "Bearer YOUR_API_KEY"
}
Body:
{
"messages": [
{
"role": "user",
"content": "Your prompt goes here, for example Translate this text to English and only return the translated text: %text% /no_think"
}
],
"model": "qwen/qwen3-32b",
"stream": false,
"include_reasoning": false
}
Text Output Path: .choices[0].message.content
To switch models, just change the model's name in the Body. Models: Find the full list on the GitHub page, Official site, or directly in the playground. You can usually disable "reasoning" (the model's thinking process) for speed using flags like /no_think or "include_reasoning": false. For instance, you'd use "include_reasoning": false for qwen/qwen3-32b. Be aware: Some models (like moonshotai/kimi-k2-instruct-0905) are non-reasoning by default, so you might need to remove "include_reasoning": false if it’s there, just to get them working properly.
Limits/Pricing: Check the GitHub page or the official rate limits documentation.
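The Groq pieces above (endpoint, headers, body, output path) can be wired together in a few lines. This is a minimal sketch using only Python's standard library; the helper names `build_groq_request` and `extract_text` are my own, not part of any SDK:

```python
import json
import urllib.request

GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_groq_request(api_key: str, text: str) -> urllib.request.Request:
    """Assemble (but not send) the POST request described above."""
    body = {
        "messages": [{"role": "user",
                      "content": "Translate this text to English and only "
                                 "return the translated text: " + text + " /no_think"}],
        "model": "qwen/qwen3-32b",
        "stream": False,
        "include_reasoning": False,
    }
    return urllib.request.Request(
        GROQ_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer " + api_key},
        method="POST",
    )

def extract_text(response_json: dict) -> str:
    """Apply the output path .choices[0].message.content."""
    return response_json["choices"][0]["message"]["content"]
```

To actually send the request you would pass it to `urllib.request.urlopen(...)` and feed the parsed JSON to `extract_text`.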
10. Together (Free)
I haven't personally tested this one because it requires adding a credit card and topping up with $5. However, based on the documentation, the setup process should be the same as the others on the list.
The GitHub resource indicates that after the initial $5 payment, you gain access to two free models: https://www.together.ai/models/deepseek-r1-distilled-llama-70b-free and https://www.together.ai/models/llama-3-3-70b-free.
The API structure is very similar for both:
Endpoint URL: https://api.together.xyz/v1/chat/completions
Headers:
{
"Content-Type": "application/json",
"Authorization": "Bearer YOUR_API_KEY"
}
Body: {
"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free",
"messages": [
{
"role": "user",
"content": "Your prompt"
}
]
}
Limits/Pricing: Up to 60 requests/minute
11. Cohere
API Key: The key is automatically generated the moment you sign up. You can find it later under the API keys tab in the left-hand menu.
Code:
Endpoint URL: https://api.cohere.ai/v2/chat
Headers:
{
"Content-Type": "application/json",
"Authorization": "Bearer YOUR_API_KEY"
}
Body:
{
"model": "command-a-translate-08-2025",
"messages": [
{
"role": "user",
"content": "Your prompt goes here, for example Translate this text to English and only return the translated text: %text%"
}
],
"stream": false,
"thinking": {
"type": "disabled"
}
}
Text Output Path: .text
To switch models, just change the model's name in the Body. Models: Check the official documentation or look in the model section of the Playground. You can disable reasoning for models like command-a-reasoning-08-2025 by including the parameter: "thinking": {"type": "disabled"}.
Limits/Pricing: The limits are 20 requests per minute and 1,000 requests per month. For more details, see the official docs.
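The Cohere section differs from the others mainly in its output path (.text instead of .choices[0].message.content) and the "thinking" switch. A small sketch, with hypothetical helper names of my own:

```python
def build_cohere_body(text: str) -> dict:
    """Body from the section above; 'thinking' only matters for reasoning
    models such as command-a-reasoning-08-2025."""
    return {
        "model": "command-a-translate-08-2025",
        "messages": [{"role": "user",
                      "content": "Translate this text to English and only "
                                 "return the translated text: " + text}],
        "stream": False,
        "thinking": {"type": "disabled"},
    }

def extract_cohere_text(response_json: dict) -> str:
    """Apply the output path .text from the section above."""
    return response_json["text"]
```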
12. GitHub
Token Key: Go to your profile Settings (top right) -> In the left menu, scroll to the bottom and select Developer settings -> Choose Personal access tokens and then Tokens (classic) -> Click Generate new token (top right) -> Select Generate new token (classic) -> Give it a name and set the Expiration -> Click Generate token at the bottom. Crucial: "Make sure to copy your personal access token now. You won’t be able to see it again!"
Code:
Endpoint URL: https://models.github.ai/inference/chat/completions
Headers:
{
"Content-Type": "application/json",
"Authorization": "Bearer YOUR_API_KEY"
}
Body:
{
"model": "meta/Meta-Llama-3.1-8B-Instruct",
"messages": [
{
"role": "user",
"content": "Your prompt goes here, for example Translate this text to English and only return the translated text: %text%"
}
],
"stream": false
}
Text Output Path: .choices[0].message.content
To switch models, just change the model's name in the Body. Models: You can browse the models on the GitHub Marketplace. To find the exact model ID: click on the model -> Go to the Playground tab at the top -> Click Code -> Look for the model string, e.g.: "model": "meta/Meta-Llama-3.1-8B-Instruct".
Limits/Pricing: Be aware that the input/output token limits are extremely restrictive. The actual limits depend on your Copilot subscription tier (Free/Pro/Pro+/Business/Enterprise). More details can be found in the documentation.
13. Cloudflare
API Key: Click the small person icon (top right) and select your profile -> In the left-hand menu, API Tokens -> Create Token -> Workers AI -> Name it -> Include - All accounts -> Continue to summary -> Create Token -> "Copy this token to access the Cloudflare API. For security this will not be shown again."
CLOUDFLARE_ACCOUNT_ID: Link -> Account home -> Click the three small dots next to the 'Account' text -> Copy account ID, then paste it into the Endpoint URL in place of CLOUDFLARE_ACCOUNT_ID.
Code:
Endpoint URL: https://api.cloudflare.com/client/v4/accounts/CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/meta/llama-3.1-8b-instruct
Headers:
{
"Content-Type": "application/json",
"Authorization": "Bearer YOUR_API_KEY"
}
Body:
{
"messages": [
{
"role": "user",
"content": "Your prompt goes here, for example Translate this text to English and only return the translated text: %text%"
}
],
"stream": false
}
Text Output Path: .result.response
To switch models, just change the model's name in the Endpoint URL. Models: Browse the list here. When you pick a model (e.g., llama-3.1-8b-instruct), copy the entire Model ID (e.g., "@cf/meta/llama-3.1-8b-instruct") and insert it into your Endpoint URL right after the /run/ segment. Example URL structure: .../ai/run/@cf/meta/llama-3.1-8b-instruct
Limits/Pricing: Your free allocation is 10,000 neurons per day. You can find more details on their pricing page.
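Cloudflare is the odd one out here: the model ID lives in the URL rather than in the Body, and the output path is .result.response. A tiny sketch (helper names are my own, not from any Cloudflare SDK):

```python
def cloudflare_url(account_id: str, model_id: str) -> str:
    """Build the endpoint URL; the model ID goes right after /ai/run/,
    exactly as described above."""
    return ("https://api.cloudflare.com/client/v4/accounts/"
            + account_id + "/ai/run/" + model_id)

def extract_cf_text(response_json: dict) -> str:
    """Apply the output path .result.response."""
    return response_json["result"]["response"]
```

Switching models is then just a matter of passing a different Model ID string, e.g. "@cf/meta/llama-3.1-8b-instruct".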
This one is part of Google Cloud, but I wasn't able to get it working, so I skipped it.
Starting now, I'll be sharing a few extra APIs that weren't on the original list. I thought you might find them useful too!
15. Microsoft Azure Translator
Warning: You will need to provide your credit card details to use this service.
Setup Steps: Use the search bar to find Translators -> Click Create -> Enter your details (name, region) and select the F0 free tier -> Hit Create again to finish.
API Key: Navigate to the new Translator service you just created -> In the left menu, you'll find everything you need under Keys and Endpoint.
Code:
Endpoint URL: https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&to=YOUR_LANGUAGE_CODE (e.g., to=en)
Headers:
{
"Content-Type": "application/json",
"Ocp-Apim-Subscription-Key": "YOUR_API_KEY",
"Ocp-Apim-Subscription-Region": "YOUR_REGION (under Keys and Endpoint)"
}
Body:
[
{
"Text": "%text%"
}
]
Text Output Path: .translations[0].text
Limits/Pricing: The F0 tier gives you a generous 2 million characters per hour. The limit is enforced automatically: once you reach 2 million characters within an hour, the service simply stops working until the next hour begins. Link, Link.
16. Z.AI
API Key: Click your profile (top right) -> Go to API Keys -> Select Create a new API key -> Give it a name and hit Confirm. That's all there is to it.
Based on what I've seen, it looks like there's only one free model available, which is the flash version.
Code:
Endpoint URL: https://api.z.ai/api/paas/v4/chat/completions
Headers:
{
"Content-Type": "application/json",
"Authorization": "Bearer YOUR_API_KEY"
}
Body:
{
"model": "glm-4.5-flash",
"messages": [
{
"role": "user",
"content": "Your prompt goes here, for example Translate this text to English and only return the translated text: %text% /nothink"
}
],
"stream": false,
"extra_body": {
"chat_template_kwargs": {
"enable_thinking": false
}
}
}
Text Output Path: .choices[0].message.content
Limits/Pricing: The free model, GLM-4.5-Flash, has a concurrency limit of 2. "Explanation of Rate Limits: To ensure stable access to GLM-4-Flash during the free trial, requests with context lengths over 8K will be throttled to 1% of the standard concurrency limit." Pricing for GLM-4.5-Flash: Input, Cached Input, Cached Input Storage, and Output are all free. You can find pricing and limit details in their documentation.
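Z.AI disables reasoning with both the /nothink tag in the prompt and the nested extra_body flag, which is easy to get wrong by hand. A small sketch that serializes the Body from the section above (the `zai_body` helper name is my own):

```python
import json

def zai_body(text: str) -> str:
    """Serialize the Z.AI request body shown above; both the /nothink tag
    and the extra_body flag switch the model's reasoning off."""
    return json.dumps({
        "model": "glm-4.5-flash",
        "messages": [{"role": "user",
                      "content": "Translate this text to English and only "
                                 "return the translated text: " + text + " /nothink"}],
        "stream": False,
        "extra_body": {"chat_template_kwargs": {"enable_thinking": False}},
    })
```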