
How modern AI actually works.

Five units. One real workflow. You'll use AI Agent before you finish reading.

Free · Audience: Curious newcomers · Prereq: None · Time: 25–35 min · Units: 5
By the end of this path

You will have:

  • A working mental model for how modern AI systems differ from classic software
  • Used AI Agent once, with intent, on a real task of your own
  • Learned to read evidence tiers (measured, estimated, modeled, roadmap)
  • Saved an output you actually care about

No certificates. No quizzes. Just a rhythm you can use tomorrow.

Unit 01 · Concept

What "AI" actually means in 2026

Time: 4 min · Objective: Distinguish chat, assistant, and agent

Modern AI is not search. It's not a database. It's not a script that runs the same way every time. That's the thing most people miss on the first try.

When you ask Google for the weather, Google retrieves the weather. Same question, same answer. When you ask an AI system to draft an email, it generates one — and if you ask the exact same question again tomorrow, you might get a different draft. Not because something broke. Because that's how these systems work.

The shift is from retrieval (looking up what already exists) to generation (producing something new in the moment).

There are three patterns people run into, and they are not the same thing.

Chat is the simplest pattern. You ask a question, the system replies. The whole interaction fits in one exchange. Nothing persists. Nothing happens in the world beyond the reply. Most people's first experience with AI is a chat interface, which is why most people think "AI" and "chatbot" are the same thing. They aren't.

Assistant is chat with memory. The system remembers what you talked about earlier in the session, sometimes across sessions. It can reference things you've said before. It can hold context about a longer project. The same prompt sent to a chat and to an assistant can produce different outputs, because the assistant has more to work with.

Agent is an assistant that can take actions. Not just reply — actually do things. Run a search. Read a file. Call an API. Draft an email and then send it if you approve. An agent has tools, not just words.

One more thing that's easy to miss: these systems are probabilistic. Classic software is deterministic — given the same inputs, you get the same outputs, every time. Modern AI systems don't work that way. They sample from a distribution. They make choices. Two runs of the same prompt can return different results. This is not a bug. It's the design. It's also why guardrails, governance, and verification matter in a way they didn't matter for classic software.
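
If you want to see the shape of that difference, here is a toy sketch in Python. It is not how any real model works; it just contrasts looking something up with sampling from weighted options. The cities, draft openings, and weights are invented for illustration.

    import random

    def deterministic_lookup(city):
        # Classic software: same input, same output, every time.
        weather = {"lisbon": "sunny", "oslo": "snow"}
        return weather[city]

    def probabilistic_draft(prompt):
        # Toy stand-in for generation: sample one opening line from a
        # weighted distribution instead of retrieving a stored answer.
        options = [
            "Hi team, quick update:",
            "Hello all, a short note:",
            "Team, two things today:",
        ]
        return random.choices(options, weights=[0.5, 0.3, 0.2], k=1)[0]

    print(deterministic_lookup("lisbon"))         # always "sunny"
    print(probabilistic_draft("draft an email"))  # can differ run to run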

If you remember one thing from Unit 1: modern AI generates, it doesn't retrieve, and it's probabilistic, not deterministic. Everything else in this path follows from that.

Action · Unit 1

Pick the statement that describes modern AI.

Read the three statements below. Pick the one that best describes modern AI, and briefly say why the other two are wrong.

  1. AI is a smarter version of search — it finds the right answer faster.
  2. AI is a script that runs predictably when you give it the same input.
  3. AI is a system that generates new responses probabilistically from patterns in training data.
Common wrong turns
  • Thinking "chat" and "AI" are synonyms. Chat is the simplest mode; AI is the broader family.
  • Expecting deterministic output. When the same prompt returns different text, people assume something is broken. It isn't.
  • Confusing tools with knowledge. An agent with tools is more capable than an assistant without — but only for tasks where the tools actually help.
Unit 02 · Mental model

Memory, context, and tools

Time: 5 min · Objective: Match tasks to the right combination

Once you accept that modern AI generates rather than retrieves, the next question is: what makes one system more useful than another? The answer turns out to be surprisingly concrete. Three things.

Memory is what the system remembers. Some systems forget everything the moment you close the tab. Others hold onto what you said earlier in the conversation. Better ones remember you across sessions — your preferences, your projects, the tone you like, the work you're doing this week. The more memory, the more the system starts to feel like a collaborator instead of a search box. But memory also means trust. Everything the system remembers is something you're implicitly giving it, and something you might want to take back later. Good systems make memory visible, editable, and deletable.

Context is what the system knows about the current task. It's different from memory. Memory is persistent. Context is what you pack into the current interaction: the document you pasted in, the file you attached, the instructions you set at the start of the session, the role you asked the system to play. Context is how you tell the system what this particular task is about. Two users with identical memory can get wildly different results from the same prompt if one of them provides rich context and the other doesn't.

Tools are what the system can do beyond generating text. Running a search. Reading a file on your computer. Executing code. Calling an API. Sending a message. Creating a calendar event. A tool-less system can only talk. A tool-equipped system can take actions in the world. This is the single biggest difference between a chatbot and an agent — chatbots talk, agents act.
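
To make "chatbots talk, agents act" concrete, here is a minimal, hypothetical Python sketch. The tool name, the keyword check, and the fake search results are all invented for illustration; the point is only that an agent's turn can include an action, and the action only happens if a tool exists for it.

    def chatbot(message):
        # A tool-less system can only produce text.
        return f"Here's a reply about: {message}"

    def agent(message, tools):
        # A tool-equipped system can decide to act, then report back.
        # The keyword check is a toy stand-in for the model's own decision.
        if "search" in message and "web_search" in tools:
            results = tools["web_search"](message)
            return f"I searched and found: {results}"
        return chatbot(message)

    # Hypothetical tool: a real agent would call an actual search API here.
    tools = {"web_search": lambda query: ["result 1", "result 2"]}

    print(chatbot("search for local water quality rules"))       # talks
    print(agent("search for local water quality rules", tools))  # acts, then talks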

The trap people fall into is assuming more is always better. It isn't. A personal reminder doesn't need tools — it just needs memory. A one-off research question doesn't need memory — it just needs context. An API integration doesn't need rich context — it needs the right tool. Matching the right combination to the task is the skill.

Task                             Memory?  Context?  Tools?
Quick factual question           No       Light     No
Multi-session research project   Yes      Yes       Sometimes
Personal reminder system         Yes      Light     No
"Send this email if X is true"   Light    Yes       Yes
Recurring automation             Yes      Yes       Yes

Memorize the table. Or don't — you'll internalize it by Unit 5.

Action · Unit 2

Match the tasks to the right combination.

For each task, mark which of the three elements (memory / context / tools) it actually needs.

  1. "Summarize this 40-page document I just pasted in."
  2. "Remember I prefer bullet points and never start summaries with 'In summary.'"
  3. "Check if any of the three APIs I use are reporting errors and send me a Slack message if one is."
Common wrong turns
  • Assuming memory and context are the same thing. Memory persists. Context is packed per task.
  • Thinking tools are always better. Tools add capability and surface area — both useful and risky.
  • Skipping context because memory exists. Memory tells the system about you. Context tells the system about the task. You need both.
Unit 03 · Failure modes

What can go wrong

Time: 5 min · Objective: Identify three failure modes and their fixes

If modern AI is probabilistic and it acts in the world, things will go wrong. Not "maybe." Will. The question is not whether to trust these systems — it's whether you can tell what kind of failure you're looking at when one happens, because different failures need different fixes.

Three failure modes cover almost everything.

Hallucination is when the system confidently produces something that isn't true. It will cite a paper that doesn't exist, quote a person who never said that, or invent a function name that isn't in the library. Hallucinations are not lies — the system isn't trying to deceive you. They're the natural result of a system that's generating rather than retrieving, running up against the edge of what it actually knows. The fix for hallucination is verification: cross-check, require citations, use tools that can ground the answer in a real source, and — critically — never trust a confident-sounding answer just because it sounds confident.

Drift is when the system's behavior changes over time in ways you didn't expect. Maybe the model was updated. Maybe your context window filled up and the system started forgetting things. Maybe the tool you gave it stopped working the way it used to. Drift is insidious because the system doesn't announce it — the output just slowly gets worse, and one day you notice. The fix for drift is observability: watching what the system does, keeping a record, and catching the change before it becomes a crisis.

Governance failure is when the system does something it shouldn't have been allowed to do in the first place. It sends an email without approval. It deletes a file because the user's prompt was ambiguous. It spends money on an API call nobody authorized. Governance failures are rarely about the model being "wrong" — the model did exactly what it was told. The failure is upstream, in the permissions, the boundaries, the approval gates, the kill switches. The fix for governance failure is policy: clear rules about what the system can and can't do, enforced before it runs, not after.
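
Here is what "enforced before it runs, not after" can look like in miniature. This is a toy Python sketch with made-up action names and rules, not a real governance layer; the shape to notice is that the check happens before anything executes, and anything not explicitly listed is denied by default.

    # Hypothetical policy: which actions may run, and which need a human first.
    POLICY = {
        "draft_email": "allow",
        "send_email": "require_approval",
        "delete_file": "deny",
    }

    def run_action(action, human_approved=False):
        decision = POLICY.get(action, "deny")  # default deny: unlisted actions never run
        if decision == "deny":
            return f"blocked: {action} is not permitted"
        if decision == "require_approval" and not human_approved:
            return f"paused: {action} is waiting for approval"
        return f"executed: {action}"

    print(run_action("draft_email"))                      # executed
    print(run_action("send_email"))                       # paused for approval
    print(run_action("send_email", human_approved=True))  # executed
    print(run_action("spend_money"))                      # blocked by default deny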

Here's the question most people get wrong: "Is this AI safe?" The honest answer is that safety is not a property of the model. It's a property of how the whole system around the model is designed. A capable model with no governance is dangerous. A weak model with strong governance can be fine for most things. The model is one part of a larger system, and that larger system is what ALCUB3 cares about.

One last thing. When a good source says a claim is measured, it means someone actually checked it against ground truth. When a claim is estimated, it means we have reasonable math but we haven't verified every case. When a claim is modeled, it means a model produced the answer and we can't fully defend every specific instance. When a claim is roadmap, it means we're telling you what we intend to build, not what exists today. That's the vocabulary of Unit 4. It's how ALCUB3 tells you how much to trust any given claim.

Action · Unit 3

Classify the failure, then name the fix.

A user asks their AI assistant to draft a brief summary of a 2023 paper on water quality monitoring. The assistant replies with a confident paragraph, complete with author names, publication year, and a quote. The user searches for the paper and can't find it anywhere. No such paper exists. The authors don't exist either.

Pick one: hallucination, drift, or governance failure. Then explain in one sentence what the right safeguard would be.

Common wrong turns
  • Calling all failures "hallucinations." Drift and governance failures are distinct and need different fixes.
  • Assuming the model is "wrong" when the real failure is governance. If the user told the system to do it, the model is doing its job. The policy layer is what was missing.
  • Treating safety as a model property. Safety lives in the whole system.
Unit 04 · Trust vocabulary

Reading evidence tiers

Time: 6 min · Objective: Label claims as measured, estimated, modeled, or roadmap

Most AI products make claims. "Our model is 95% accurate." "Our system saves you X hours per week." "Our platform supports Y." Some of those claims are solid. Some are defensible but unverified. Some are model output presented as fact. And some are roadmap items written in the present tense.

Those four kinds of claims do not belong in the same sentence, and the difference matters enormously when you're trying to decide whether to trust something.

ALCUB3 uses four tiers. They're short on purpose — any longer and nobody uses them.

measured: Someone collected real data, compared it against ground truth, and recorded the result. If a water quality score is measured, it means the input data came from an instrument or a lab test. If a benchmark is measured, it means the model ran on real test cases and we recorded the scores. Measured claims are the strongest, but they're also the narrowest — you can only measure what you can actually observe.

estimated: Reasonable math produced this number, but nobody ran the ground-truth check on every case. A national AI water footprint figure is estimated: we have models for energy use and water usage per model, and we multiply them together. The number is probably close, but it's not the result of measuring every query. Estimated claims are what good engineering looks like when measurement isn't feasible.

modeled: A machine learning model produced this output. The model was trained, it was validated, it performs within a known error range — but any specific instance is still a prediction, not a fact. A satellite-based water body segmentation is modeled. A climate prediction is modeled. An AI's draft summary is modeled. You can trust modeled claims in aggregate, but you should be careful about them individually.

roadmap: We're going to build this. It doesn't exist today. If a product page says "supports groundwater health scoring" and the tier is roadmap, that means the feature is planned, not shipped. This is the tier startups abuse most, which is exactly why having it labeled explicitly is useful.

Conflating these tiers is a problem for a simple reason: a reader who assumes every claim is measured will make decisions based on claims that aren't. That is not pedantic. It's how trust gets eroded. The whole point of labeling tiers explicitly is that you don't have to guess — the tier is printed next to the claim.
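
As a sketch of what "printed next to the claim" can mean, here is a hypothetical Python snippet. The card format is invented, and the example pairings simply restate the tier definitions above; ALCUB3's actual claim cards may look different.

    # Hypothetical claim cards: the tier travels with the claim itself.
    claims = [
        {"claim": "Water quality score from a lab test",     "tier": "measured"},
        {"claim": "National AI water footprint figure",      "tier": "estimated"},
        {"claim": "Satellite-based water body segmentation", "tier": "modeled"},
        {"claim": "Groundwater health scoring",               "tier": "roadmap"},
    ]

    for card in claims:
        print(f"[{card['tier']}] {card['claim']}")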

If you ever see an ALCUB3 product card, methodology note, or research report that mentions a tier, you now know what it means. You also know that claims without a tier label should be treated with skepticism — the absence of a tier is itself a signal.

Action · Unit 4

Label the four claims.

Below are four real claim cards drawn from the ALCUB3 Impact methodology page. For each, label it with the correct tier.

  1. "Your ZIP code has 3 EPA water quality violations on record in the last 12 months."
  2. "AI queries to GPT-4 in California consume approximately 500mL of water per 10 medium-length queries."
  3. "Your area's PFAS contamination risk is low."
  4. "We will support portfolio-level water risk scoring for institutional users."
Common wrong turns
  • Labeling everything measured because it sounds authoritative. Most claims are not measured. That's fine — as long as they're labeled honestly.
  • Treating modeled as "the same as measured, just from a model." A modeled claim is a prediction. Predictions have error bars.
  • Ignoring the roadmap tier when it shows up. Roadmap is a promise, not a product. Treat it accordingly.
Unit 05 · Hands-on

Your first real workflow with AI Agent

Time: 10 min · Objective: Use AI Agent once, with intent, on a real task

Everything up to now has been preparation. This unit is the point of the path.

You're going to use AI Agent once. Not for a toy question. For something you actually need done. Pick a small, real task — something you would have done yourself in the next day or two anyway. A research summary, a draft email, a comparison between two options, a rewrite of something you wrote that isn't quite right. Small is fine. Real is the part that matters.

Here's why. People who start with toy prompts never learn what AI is good for. They learn how AI handles toy questions, which is not a useful skill. The only way to find out whether AI Agent is actually useful for your life is to give it a real task and see what happens.

Before you open AI Agent, do one minute of prep. You're going to apply the three things from Unit 2 on purpose.

Context — What does AI Agent need to know about this specific task? Paste in the source material, state the goal, specify the audience, describe the constraints. Context is the difference between a mediocre response and a useful one, and most people skip this step.

Memory — Are you setting up a pattern you'll use again? Tell AI Agent what you prefer (format, length, tone). Memory pays off later if you use this again.

Tools — Does this task need a tool beyond text? If you need the system to search, read a file, or do math, name the tool explicitly in the prompt.

Now write the prompt. Keep it specific. "Help me with this" is not a prompt. "Summarize this document in three bullet points for a technical audience, keeping the original numbers intact, and flag anything that looks uncertain" is a prompt.

Send it. Read the response. Here's the part that matters: the response is a modeled claim. You learned this vocabulary in Unit 4. AI Agent is producing a prediction. It's not retrieving. It's not infallible. Read it the way you'd read a first draft from a colleague who's smart but sometimes gets things wrong — interested, skeptical, willing to use the good parts.

If the response is good, save it to your workspace. If it's not, revise the prompt and try again. Revising is part of the workflow, not a failure. Two or three iterations is normal. Ten is a sign that the task wasn't the right fit for a single AI Agent run.

You now have one real piece of AI-assisted work to show for this path. That's the artifact. That's the point.

Action · Unit 5 · Hands-on

Open AI Agent and run one real task.

  1. Pick a real task from your own work — something small, something you would have done in the next 48 hours anyway.
  2. Apply context, memory, and tools deliberately.
  3. Write the prompt. Keep it specific.
  4. Read the response knowing it is a modeled output.
  5. Iterate if needed. Two or three revisions is normal.
  6. Save the final output to your workspace.

Launch AI Agent

Common wrong turns
  • Picking a toy task to "keep it safe." You learn nothing.
  • Skipping context because typing feels like work. Context is the single biggest determinant of output quality.
  • Accepting the first response without reading it critically. Modeled outputs need critical reading.
  • Iterating forever. If three revisions don't get you there, the task probably needs to be decomposed or delegated to a different mode.

You just used an AI system with intent.

Ready to keep going? The next path teaches mode selection, governance cost, and how to choose the right ALCUB3 product for the job.

Start Agents vs Workflows · Keep using AI Agent