Claude API Tutorial for Beginners: Complete Getting Started Guide

2026-06-15 · FreeClaude

TL;DR: The Claude API lets you integrate Anthropic's AI models into your own applications. This guide takes you from zero to working code — covering authentication, your first API call, streaming responses, managing conversations, tool use, and production best practices — with copy-paste Python and JavaScript examples throughout.

什么是Claude API？

The Claude API is Anthropic's programmatic interface for integrating Claude's language models directly into your applications, scripts, workflows, and products. Instead of using the Claude.ai web interface, the API gives you full programmatic control — you send text in, you get text (or structured data) back, and you decide exactly how your application uses it.

The API powers everything from simple chatbots to sophisticated multi-agent systems. Developers use it to build customer service automation, document analysis pipelines, code generation tools, content moderation systems, data extraction workflows, and much more. Any task Claude can do in a browser, it can do through the API — embedded inside your own product.

As of 2026, the API provides access to three main model families: Claude Opus 4.7 (most capable, 1M token context, ideal for complex reasoning), Claude Sonnet 4.6 (balanced performance and speed, best for most production workloads), and Claude Haiku 4.5 (fastest and cheapest, perfect for high-volume, latency-sensitive tasks). Each model is available via a single unified API endpoint with a consistent request/response format.

Pricing is token-based — roughly 750 words equals about 1,000 tokens. Input tokens (what you send) and output tokens (what Claude generates) are priced separately, with input being cheaper. A typical API call might use 500 input tokens and generate 300 output tokens, costing fractions of a cent. The Claude API is significantly more cost-effective for production workloads than per-seat subscription pricing at scale.

If you want to experiment with Claude's capabilities before committing to API costs, FreeClaude provides free access to Claude Max x20 — the same underlying model intelligence — through a referral program requiring no credit card.

搭建开发环境

Before writing any code, you need three things: an Anthropic account, an API key, and the SDK installed in your project.

Step 1: Create an Anthropic Account

Visit console.anthropic.com and sign up. New accounts receive a small credit balance for initial testing — typically enough for hundreds of test calls. Once credits are consumed, add a payment method. API pricing is pay-as-you-go with no minimum commitment.

Step 2: Generate an API Key

In the Anthropic Console, navigate to Settings → API Keys → Create Key. Give it a descriptive name (e.g., "dev-local"). Copy the key immediately — it is shown only once. Store it in a password manager or secrets manager like AWS Secrets Manager. Never hardcode an API key in source code.

Step 3: Install the SDK

Anthropic provides official SDKs for Python and TypeScript/JavaScript. Both are actively maintained and kept in sync with new model releases.

# Python
pip install anthropic

# Node.js / TypeScript
npm install @anthropic-ai/sdk

Step 4: Set Your API Key as an Environment Variable

The SDK automatically reads the ANTHROPIC_API_KEY environment variable:

export ANTHROPIC_API_KEY="sk-ant-..."

For projects using a .env file, install python-dotenv (Python) or dotenv (Node) and load it at startup. Add .env to .gitignore immediately — never commit credentials to version control.

发起你的第一个API请求

With your environment set up, here is the minimal code to make a working API call — the "Hello World" of Claude API development.

Python

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain what an API is in two sentences."}
    ]
)

print(message.content[0].text)

JavaScript / Node.js

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const message = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Explain what an API is in two sentences.' }
  ]
});

console.log(message.content[0].text);

Breaking down the key parameters: model specifies which Claude model to use. max_tokens caps response length — too low truncates responses, too high wastes nothing unless Claude uses those tokens. messages is an array of conversation turns, each with a role (user or assistant) and content.

The response object contains an array of content blocks. For standard text responses, the text is at message.content[0].text. The response also includes usage data: message.usage.input_tokens and message.usage.output_tokens — useful for monitoring costs from day one.

管理多轮对话

The Claude API is stateless — it does not store conversation history on the server. Your application must track the conversation and send the full history with each request.

import anthropic

client = anthropic.Anthropic()
conversation = []

def chat(user_message):
    conversation.append({"role": "user", "content": user_message})
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=conversation
    )
    assistant_message = response.content[0].text
    conversation.append({"role": "assistant", "content": assistant_message})
    return assistant_message

print(chat("My name is Alex and I'm learning Python."))
print(chat("What's a good first project for someone like me?"))
print(chat("How long will that take to build?"))

The conversation list grows with each turn, and the full list is sent with every request. Claude receives the complete context and can reference anything said earlier. For very long sessions, implement summarization: periodically ask Claude to summarize the conversation, then replace history with that summary to stay within context window limits.

实时流式响应

By default, the API waits until the entire response is generated before sending it. Streaming solves this — you receive tokens as they are generated, enabling the typewriter effect you see on Claude.ai and dramatically improving perceived performance.

Python Streaming

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a detailed explanation of machine learning."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
print()

JavaScript Streaming

const stream = await client.messages.stream({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Write a detailed explanation of machine learning.' }]
});

for await (const chunk of stream.textStream) {
  process.stdout.write(chunk);
}
console.log();

A 500-word response takes roughly 5–8 seconds to generate. Without streaming, users see a blank screen the entire time. With streaming, they start reading within the first second. Total generation time is identical, but user experience is transformed. Streaming is essential for any user-facing application.

工具使用与函数调用

Tool use allows Claude to request data from external systems mid-conversation — databases, APIs, file systems — enabling it to work with real-time information and take actions in the world.

import anthropic, json

client = anthropic.Anthropic()

tools = [{
    "name": "get_weather",
    "description": "Get current weather for a city",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
        },
        "required": ["city"]
    }
}]

def get_weather(city, unit="celsius"):
    return {"temperature": 22, "condition": "sunny", "city": city}

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
response = client.messages.create(
    model="claude-sonnet-4-6", max_tokens=1024, tools=tools, messages=messages
)

if response.stop_reason == "tool_use":
    tool_use = next(b for b in response.content if b.type == "tool_use")
    result = get_weather(**tool_use.input)
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": tool_use.id, "content": json.dumps(result)}
    ]})
    final = client.messages.create(
        model="claude-sonnet-4-6", max_tokens=1024, tools=tools, messages=messages
    )
    print(final.content[0].text)

Tool use enables querying databases, calling external APIs, reading and writing files, executing code, and integrating with business systems. Claude decides when to call a tool based on the user's request and the tool's description — the description text is crucial, as Claude reads it to determine relevance.

选择合适的Claude模型

Model selection significantly impacts both cost and quality. Each model has a distinct performance profile suited to specific use cases.

Claude Haiku 4.5 — Fastest and cheapest. Best for: classification, simple Q&A, moderation, data extraction from structured text, high-volume batch processing. Response time under 1 second for short outputs.

Claude Sonnet 4.6 — Best balance of capability and cost. Handles: complex writing, code generation, detailed analysis, multi-step reasoning, customer-facing chat. The right default for most production applications — near-Opus quality at significantly lower cost.

Claude Opus 4.7 — Most capable, 1M token context. Use for: research synthesis across very long documents, complex code architecture, high-stakes writing, and tasks where output quality matters more than cost or latency. Costs roughly 15x more than Haiku — reserve for tasks that genuinely need it.

A practical production strategy: default to Sonnet for all requests, implement a routing layer that upgrades to Opus for requests above a complexity threshold (prompt length, task type, explicit user request). This optimizes cost while ensuring quality where it matters most.

生产环境最佳实践

Handle Errors with Exponential Backoff

import anthropic, time

client = anthropic.Anthropic()

def call_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                messages=messages
            )
        except anthropic.RateLimitError:
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)
            else:
                raise
        except anthropic.APIError as e:
            if e.status_code >= 500:
                time.sleep(1)
            else:
                raise

Use System Prompts via the system Parameter

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a concise technical assistant. Respond in plain text, no markdown unless explicitly asked. Keep answers under 200 words.",
    messages=[{"role": "user", "content": user_input}]
)

Track Token Usage from Day One

Log usage.input_tokens and usage.output_tokens from every response. This lets you identify expensive requests, detect prompt injection attempts, and forecast monthly API spend accurately. Token monitoring is much easier to implement from the start than to retrofit later.

Enable Prompt Caching for Repeated Context

For workloads where a large system prompt or context document is reused across many requests, enable prompt caching by adding "cache_control": {"type": "ephemeral"} to the relevant content blocks. Cached tokens cost significantly less than reprocessing the same content repeatedly — a major cost saver for applications with long system prompts.

Implement Rate Limit Handling at the Application Level

Even with retry logic, sustained high-throughput applications need queue-based rate limit management. Implement a token bucket or sliding window rate limiter in your application layer so you never hit the API's rate limits in the first place, rather than relying entirely on retry logic after being rejected.

常见问题解答

Do I need a paid plan to use the Claude API?

New Anthropic accounts receive free credits for initial testing. Beyond those credits, the API is pay-as-you-go — add a payment method and pay only for what you use. There is no required monthly subscription for API access.

What is the difference between the Claude API and Claude.ai?

Claude.ai is the consumer web and mobile interface. The API is for developers building their own applications. They have separate billing. If you want Claude without writing code, FreeClaude provides free Claude Max x20 access through referrals.

How do I handle documents that exceed context limits?

For documents within the context window (up to 1M tokens with Opus 4.7), include the full text in the prompt. For larger collections, use retrieval-augmented generation (RAG): chunk documents, embed them with a vector database, and retrieve only relevant sections for each query.

Can I use the Claude API for commercial applications?

Yes. Anthropic's usage policies permit commercial use subject to their acceptable use guidelines. You may build and sell products powered by the Claude API as long as your application complies with Anthropic's policies and applicable laws.

What programming languages does the Claude API support?

Official SDKs exist for Python and TypeScript/JavaScript. The underlying API is a standard REST API with JSON, so any language that can make HTTP requests works — Ruby, Go, Java, PHP, Rust, and more.

How do I prevent prompt injection attacks?

Defenses include: clear separation between instructions and user content using XML tags, explicit instructions in the system prompt to ignore conflicting content in user input, output validation to detect unexpected format changes, and rate limiting to detect anomalous patterns.

How should I set max_tokens?

Set it based on expected output length plus a safety margin. For chatbot responses, 512–1024 is usually sufficient. For document generation, 4096 or higher. Setting it too low truncates responses; setting it too high costs nothing extra unless Claude actually uses those tokens.

What are the most effective ways to reduce API costs?

Use Haiku for simple tasks, enable prompt caching for repeated context, keep system prompts concise, set max_tokens appropriately, implement request deduplication, cache API responses for identical inputs, and batch non-real-time workloads during off-peak hours.

立即开始构建

The Claude API opens up virtually unlimited possibilities for AI-powered applications. Start with the simple examples in this guide, progressively add complexity — streaming, tool use, multi-turn conversations — and you will have a production-ready integration within days. The key is to build incrementally, measure token usage from the start, and design your prompts thoughtfully.

For hands-on Claude exploration without API overhead, FreeClaude's free access program lets you test Claude Max x20 capabilities directly — invaluable for crafting and testing prompts before moving them into API code.

Get Claude Max x20 for free

Join thousands of users accessing Claude's most powerful tier at no cost through FreeClaude.

Get Started Free →

初学者使用Claude API的常见错误

After working with the Claude API across dozens of projects, these are the mistakes that consistently trip up new developers. Avoiding them from the start saves hours of debugging later.

Not Handling the stop_reason Field

Every API response includes a stop_reason field. The possible values are end_turn (Claude finished naturally), max_tokens (the response was cut off), stop_sequence (a stop sequence was hit), and tool_use (Claude wants to call a tool). Many beginners only handle the happy path and are surprised when responses appear truncated. Always check stop_reason and handle each case explicitly in production code.

Using Synchronous Calls in Async Applications

The Claude API involves network latency and generation time ranging from under a second to tens of seconds. In web applications built with async frameworks (FastAPI, Express, Next.js), blocking the event loop on a synchronous API call freezes request handling for your entire application. Always use the async client in async contexts: anthropic.AsyncAnthropic() in Python, or the await-based methods in JavaScript.

Ignoring the messages Array Role Order

The messages array must alternate between user and assistant roles. Two consecutive user messages or two consecutive assistant messages will cause a 400 error. When reconstructing a conversation from a database, always validate role alternation before sending. If you need to represent multiple pieces of information from the user, combine them into a single user message.

Hardcoding Model Names Throughout the Codebase

Anthropic periodically updates models and deprecates older versions. Centralizing model selection in a configuration file or environment variable means a model version upgrade is a one-line change rather than a search-and-replace across your entire codebase.

团队正在用Claude API构建什么

The Claude API powers a remarkably wide range of applications. Understanding what other teams build helps spark ideas and illustrates the scope of what is possible beyond basic chatbots.

Automated code review systems — Engineering teams integrate Claude into CI/CD pipelines to perform automated code review before human review. Claude checks for security vulnerabilities, identifies potential bugs, ensures code style consistency, and flags missing test coverage. These systems reduce the burden on senior engineers and catch issues that slip through in high-volume PR queues.

Document intelligence platforms — Legal, financial, and compliance teams build tools that process large volumes of documents — contracts, regulatory filings, research reports — and extract structured information, identify key clauses, flag issues, and generate summaries. Claude's large context window combined with structured JSON output makes this category particularly strong.

Customer communication assistants — Support teams deploy Claude as a first-response system that handles routine inquiries automatically, drafts responses for human review, and escalates complex cases. Unlike rigid rule-based bots, Claude handles the natural variation in how customers phrase questions.

Personalized learning platforms — EdTech applications use Claude to build adaptive tutoring systems that respond to each student's specific misconceptions, generate practice problems at the right difficulty, and explain concepts in multiple ways until the student demonstrates understanding.

Research and analysis pipelines — Data teams use Claude in automated pipelines that pull data from various sources, generate analysis, and produce structured reports. Claude's ability to reason about data and explain findings in plain language closes the gap between raw data and actionable insights that business stakeholders can consume directly.