Claude 3 vs GPT-4: a practitioner's comparison for business use

Anthropic released the Claude 3 model family (Haiku, Sonnet, and Opus) on March 4th, 2024, with each model positioned at a different capability and cost tier. Opus is the most capable of the three and benchmarks competitively with GPT-4 on a range of standard evaluations.

I've been running both alongside each other for real work tasks for the past few weeks. Here's how it breaks down.

Quick context on where I'm coming from

I'm not an AI researcher. I'm an IT professional who uses these tools for: writing and editing, PowerShell and Python scripting, technical research and summarisation, customer communication drafting, and general problem-solving. My comparison is practical, not academic.

Where Claude 3 Opus has an edge

Nuance in writing tasks

For anything requiring careful tone (a difficult email, a proposal that needs to land exactly right, communication that needs to balance honesty with diplomacy), I've found Opus marginally better than GPT-4. It seems to pick up better on the implicit requirements in how I describe a task.

Long document handling

Claude 3 Opus has a 200,000-token context window; GPT-4 Turbo has 128,000 tokens. In practice, both are more than sufficient for most tasks, but for genuinely long documents (full product specifications, lengthy contracts, extended code reviews), Claude's additional headroom has mattered a few times.
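
If you want a quick sanity check before pasting a long document into either model, something like the sketch below is roughly what I'd reach for. It's an estimate only: the four-characters-per-token figure is a common rule of thumb for English text rather than either vendor's real tokenizer, the model names are my own labels rather than official API identifiers, and the reply budget is an arbitrary number I picked for illustration.

```python
# Rough check of whether a document fits in a model's context window.
# Assumption: ~4 characters per token, a rule of thumb for English text.
# Real tokenizers differ, so treat the result as an estimate only.

# Window sizes in tokens, as quoted in this post.
# The keys are informal labels, not official API model identifiers.
CONTEXT_WINDOWS = {
    "claude-3-opus": 200_000,
    "gpt-4-turbo": 128_000,
}

CHARS_PER_TOKEN = 4  # heuristic, not an exact figure


def estimate_tokens(text: str) -> int:
    """Estimate the token count from character length."""
    return len(text) // CHARS_PER_TOKEN + 1


def fits(text: str, model: str, reserve_for_reply: int = 4_000) -> bool:
    """Return True if the text plus a reply budget fits in the model's window."""
    return estimate_tokens(text) + reserve_for_reply <= CONTEXT_WINDOWS[model]


if __name__ == "__main__":
    document = "word " * 120_000  # about 600,000 characters
    for model in CONTEXT_WINDOWS:
        print(model, fits(document, model))
```

With that 600,000-character stand-in document, the estimate lands at around 150,000 tokens, which fits within the Opus window but not within GPT-4 Turbo's. That is the kind of case where the extra headroom shows up.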

Following complex instructions

When I give Claude a prompt with multiple constraints ("write this at this length, in this tone, for this audience, avoiding these topics, structured like this"), it tends to adhere to the full set of constraints more consistently than GPT-4, which sometimes drops one of several requirements.
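
Some of those constraints can be checked mechanically, which makes it easier to spot a dropped requirement when comparing the two side by side. The sketch below is a hypothetical helper of my own, not part of either vendor's tooling: it only covers length, banned topics, and required section headings, and the function name, parameters, and sample text are all made up for illustration. Tone and audience fit still need a human read.

```python
# Check a model's reply against a few mechanical constraints.
# All names, limits, and sample text here are hypothetical illustrations.
import re


def check_constraints(reply: str, max_words: int, banned_topics: list[str],
                      required_headings: list[str]) -> list[str]:
    """Return a list of human-readable constraint violations (empty if none)."""
    problems = []

    # Length constraint: count whitespace-separated words.
    word_count = len(reply.split())
    if word_count > max_words:
        problems.append(f"too long: {word_count} words (limit {max_words})")

    # Topic constraint: case-insensitive whole-word match on each banned term.
    lowered = reply.lower()
    for topic in banned_topics:
        if re.search(r"\b" + re.escape(topic.lower()) + r"\b", lowered):
            problems.append(f"mentions banned topic: {topic}")

    # Structure constraint: each required heading must appear somewhere.
    for heading in required_headings:
        if heading.lower() not in lowered:
            problems.append(f"missing section: {heading}")

    return problems


if __name__ == "__main__":
    reply = "Summary\nWe will migrate mailboxes next week. Pricing is unchanged."
    for problem in check_constraints(
        reply,
        max_words=50,
        banned_topics=["pricing"],
        required_headings=["Summary", "Next steps"],
    ):
        print(problem)
```

Running the same prompt through both models and feeding each reply to a checker like this gives a rough tally of how often one of several requirements gets dropped.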

Where GPT-4 has an edge

Code generation

For scripting tasks (PowerShell, Python, Graph API calls), GPT-4 remains my preference. The output tends to be closer to idiomatic, production-ready code. Claude's code output is good, but occasionally over-verbose or structured in ways that feel slightly academic.

Plugin and tool ecosystem

ChatGPT with GPT-4 has a broader plugin ecosystem and code interpreter, which adds useful capability for certain tasks. Claude.ai's interface is clean but more limited in these integrations.

Familiarity and predictability

I've been using GPT-4 for over a year. I know how to prompt it. I know where it struggles. That familiarity has real value in a working context.

Where I've ended up

These models are genuinely close in capability for the kind of work I do. The gap between them is smaller than the gap between either of them and GPT-3.5. Choosing between them for a specific use case is a matter of marginal differences, not transformative ones.

My current practice: I use GPT-4 as my default and Claude 3 Sonnet for writing-heavy tasks and long-document work. For scripting, GPT-4 still has my preference.

If you haven't tried Claude 3 and you're a regular GPT-4 user, it's worth at least a few weeks of parallel testing. The differences are real, even if they're not dramatic. And the model landscape is moving fast enough that "I'm settled on X" is a position you should revisit regularly.
