Probabilistic Systems: Why LLMs Do Not Act Like Code

Section 1

LLMs Are Samplers

Compare the options

Parameter	Effect
temperature	Divides the logits before softmax. 0 = greedy (always the most likely token); higher = more random
top_p (nucleus)	Keep the smallest set of tokens whose cumulative probability reaches p
top_k	Keep only the k most likely tokens
repetition_penalty	Down-weight tokens that already appear in the context
seed	Pin the pseudo-random generator for reproducibility, where the API exposes one
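
To make the table concrete, here is a minimal sketch of how temperature, top_k, top_p, and seed could interact on a toy four-token vocabulary. It is an illustration under stated assumptions, not any provider's actual decoder: the function names (softmax, filter_top_k, filter_top_p, sample), the tokens, and the logits are all hypothetical, and real implementations may apply the filters in a different order.

```python
import math
import random
from collections import Counter

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def filter_top_k(probs, k):
    """Keep only the k most likely tokens, then renormalize."""
    keep = set(sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k])
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]

def filter_top_p(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]

def sample(tokens, logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Draw one token. Temperature 0 is treated as greedy (argmax) decoding."""
    if temperature == 0:
        return tokens[logits.index(max(logits))]
    probs = softmax(logits, temperature)
    if top_k is not None:
        probs = filter_top_k(probs, top_k)
    if top_p is not None:
        probs = filter_top_p(probs, top_p)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical vocabulary and logits, chosen only for illustration.
tokens = ["positive", "negative", "neutral", "banana"]
logits = [3.0, 1.0, 0.5, -2.0]

rng = random.Random(0)  # a fixed seed makes this toy sampler repeatable
for temperature in (0, 0.7, 1.5):
    draws = Counter(sample(tokens, logits, temperature, rng=rng) for _ in range(1000))
    print(temperature, dict(draws))
```

In this sketch, raising the temperature flattens the distribution so unlikely tokens appear more often, while top_k and top_p remove the tail entirely before the draw.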

Sample multiple times and reason statistically. One call is an anecdote; ten calls are data.

python

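# Robust evaluation of a probabilistic system: sample n times, then summarize
from collections import Counter

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def evaluate(prompt, n=10):
    """Send the same prompt n times and collect the text of each response."""
    outputs = []
    for _ in range(n):
        resp = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=200,
            messages=[{"role": "user", "content": prompt}],
        )
        outputs.append(resp.content[0].text.strip())
    return outputs

results = evaluate("Classify: 'great product'", n=10)
counts = Counter(results)
top_output, top_count = counts.most_common(1)[0]
print(f"Unique outputs: {len(counts)}")
print(f"Agreement with the most common output: {top_count / len(results):.0%}")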
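
Once the outputs are collected, "reasoning statistically" can be as simple as a pass rate with an uncertainty band. The sketch below runs offline on canned outputs so it needs no API key. The helper names (wilson_interval, summarize) and the sample data are hypothetical, and the Wilson score interval is one reasonable choice for small samples, not the only one.

```python
import math
from collections import Counter

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n == 0:
        raise ValueError("need at least one sample")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

def summarize(outputs, expected):
    """Pass rate, its 95% interval, and majority agreement for a list of outputs."""
    normalized = [o.strip().lower() for o in outputs]
    n = len(normalized)
    passes = sum(o == expected for o in normalized)
    majority, votes = Counter(normalized).most_common(1)[0]
    return {
        "pass_rate": passes / n,
        "ci95": wilson_interval(passes, n),
        "majority": majority,
        "agreement": votes / n,
    }

# Canned outputs standing in for ten real model responses (illustrative data).
sample_outputs = ["positive"] * 7 + ["Positive "] + ["neutral"] * 2
report = summarize(sample_outputs, expected="positive")
print(report)
```

With 8 passes out of 10, the interval in this sketch spans roughly 0.49 to 0.94, so ten samples are enough to spot gross failures but not to separate an 80% system from a 90% one. Increasing n narrows the band.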

Key terms in this lesson

Probabilistic Systems: Why LLMs Do Not Act Like Code

LLMs Are Samplers

The sampling knobs

Why this breaks classical software instincts

Strategies for taming randomness

Testing probabilistic systems
