Measuring and Reducing the Carbon Footprint of AI Interactions
Every word you read from an AI has an environmental cost measured in electricity, water, and carbon emissions. By making these invisible costs visible, we can empower users to make more sustainable choices.
How One Token Becomes Electricity
A single token is roughly four English characters. A single NVIDIA A100 GPU draws about 0.4 kW under load. At 1,400 tokens per second that is roughly 0.29 J per token, so 1,000 tokens cost about 0.08 Wh, the energy a 10 W LED needs for 29 seconds (anuragsridharan.substack.com).
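A minimal sketch of that arithmetic, using only the figures quoted above (the 0.4 kW draw and 1,400 tokens/s are assumptions, not measurements):

```python
# Back-of-envelope: energy per token from power draw and throughput.
GPU_POWER_W = 400.0        # assumed A100 draw under load
TOKENS_PER_SEC = 1400.0    # assumed serving throughput
LED_W = 10.0               # comparison bulb

joules_per_token = GPU_POWER_W / TOKENS_PER_SEC          # ~0.29 J
wh_per_1k_tokens = joules_per_token * 1000.0 / 3600.0    # ~0.08 Wh
led_seconds = joules_per_token * 1000.0 / LED_W          # ~29 s

print(f"{joules_per_token:.2f} J/token, "
      f"{wh_per_1k_tokens:.2f} Wh per 1k tokens, "
      f"= {led_seconds:.0f} s of a 10 W LED")
```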
Multiply by the 100-plus billion tokens served each day across the big providers and the total stops looking trivial.
Cooling adds another ten percent. Google’s data centres run at a Power Usage Effectiveness of 1.1, so every kilowatt-hour of compute needs an extra 0.1 kWh for chillers and fans (anuragsridharan.substack.com). Water enters the picture here: evaporative cooling can gulp four litres per kilowatt-hour on a hot day, roughly 0.35 mL per thousand tokens at the figures above. A million tokens, a few weeks of heavy chatting, evaporate about as much water as the can of soda beside your keyboard.
Carbon follows the same math. The U.S. grid still emits about 0.39 kg CO₂ per kWh, so one thousand tokens on an A100 release roughly 0.03 g of CO₂, about the same as charging a Model 3 for half a metre of driving.
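Chaining the overheads gives the water and carbon lines too. The constants below restate the figures from the last three paragraphs, plus an assumed Model 3 consumption of about 0.155 kWh/km:

```python
# Per-1,000-token footprint with cooling overhead, water, and carbon.
WH_PER_1K = 0.08            # GPU energy, from the sketch above
PUE = 1.1                   # data-centre overhead factor
LITRES_PER_KWH = 4.0        # evaporative cooling, hot day
KG_CO2_PER_KWH = 0.39       # average U.S. grid intensity
EV_KWH_PER_KM = 0.155       # assumed Model 3 consumption

kwh = WH_PER_1K / 1000.0 * PUE           # facility energy
litres = kwh * LITRES_PER_KWH            # cooling water
g_co2 = kwh * KG_CO2_PER_KWH * 1000.0    # emissions in grams

grams_per_metre = EV_KWH_PER_KM * KG_CO2_PER_KWH  # kg/km == g/m numerically
ev_metres = g_co2 / grams_per_metre

print(f"{litres * 1000:.2f} mL water, {g_co2:.3f} g CO2, "
      f"~{ev_metres:.1f} m of Model 3 driving")
```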
From Guesswork to a Pocket Calculator
Researchers at several universities released the HCI GenAI CO₂ST Calculator in April 2025. You enter model size, token count, and data-centre region; the tool returns kilowatt-hours, litres of water, and grams of CO₂ (arxiv.org). It works either before or after you run an experiment, so labs can redesign studies in advance or at least publish honest footprints.
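The published calculator ships with its own data and interface; the sketch below only mimics the input-to-output shape of the idea. Per-token costs come from the table later in this piece, while the region factors and function name are invented for illustration:

```python
# Toy CO2ST-style estimator: model, token count, region in;
# kWh, litres, grams CO2 out. Region factors are placeholders.
J_PER_TOKEN = {            # from the table below
    "gpt3-class": 0.29, "llama3-70b": 0.36,
    "distilled-1.3b": 0.05, "on-device-0.3b": 0.03,
}
GRID_KG_CO2_PER_KWH = {"us": 0.39, "eu": 0.28, "nordics": 0.05}  # invented

def footprint(model: str, tokens: int, region: str,
              pue: float = 1.1, litres_per_kwh: float = 4.0) -> dict:
    kwh = J_PER_TOKEN[model] * tokens / 3.6e6 * pue   # J -> kWh, + cooling
    return {"kwh": round(kwh, 6),
            "litres": round(kwh * litres_per_kwh, 6),
            "g_co2": round(kwh * GRID_KG_CO2_PER_KWH[region] * 1000, 4)}

print(footprint("llama3-70b", tokens=10_000, region="us"))
```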
Ellydee mirrors the idea inside every chat window. Our impact badge shows:
- kWh used
- litres of water
- km you would have driven in a 2020 combustion-engine car
The numbers update as you type. Users who switch the assistant to eco-mode watch the litres drop in real time because we route short prompts to a 1.3B-parameter distilled model that needs one-sixth the GPU time. Early trials cut energy per conversation by 52% with no drop in answer quality.
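A sketch of how such routing could work; the word-count threshold and model names here are ours for illustration, not Ellydee’s actual logic:

```python
# Hypothetical eco-mode router: short prompts go to the distilled
# 1.3B model, everything else to the full-size model.
J_PER_TOKEN = {"distilled-1.3b": 0.05, "gpt3-class": 0.29}

def pick_model(prompt: str, eco_mode: bool, max_words: int = 40) -> str:
    """Route short prompts to the small model when eco-mode is on."""
    if eco_mode and len(prompt.split()) <= max_words:
        return "distilled-1.3b"     # ~1/6 the GPU time per token
    return "gpt3-class"

model = pick_model("What does PUE 1.1 mean?", eco_mode=True)
saving = 1.0 - J_PER_TOKEN[model] / J_PER_TOKEN["gpt3-class"]
print(f"{model}: ~{saving:.0%} less energy per token")  # ~83%
```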
Which Models Cost What?
Model family | Params (B) | Tokens/s (1×A100) | J per token
---|---|---|---
GPT-3 class | 175 | 1,400 | 0.29
Llama-3-70B | 70 | 1,100 | 0.36
Distilled-1.3B | 1.3 | 8,000 | 0.05
On-device 0.3B | 0.3 | 12,000 | 0.03
Water tracks the same curve: the 70-billion-parameter model sheds roughly seven times the heat per token of its 1.3-billion sibling (0.36 J versus 0.05 J), so it needs roughly seven times the cooling flow.
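That ratio falls straight out of the table, since cooling demand tracks the heat each model dissipates:

```python
# Cooling water scales with dissipated energy, so the table's
# J-per-token column fixes the ratio directly.
ratio = 0.36 / 0.05   # Llama-3-70B vs Distilled-1.3B, J per token
print(f"~{ratio:.0f}x the cooling flow")  # ~7x
```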
What You Can Do Today
- Ask once, read twice. Re-prompting burns the same energy as a fresh prompt. Edit your question until it is tight.
- Choose lighter models when speed beats depth. The badge lets you flip to the small model for quick lookups.
- Prefer providers that publish hourly grid carbon intensity and schedule heavy jobs when renewables peak.
- Run repeated tasks on-device if your phone or laptop can load a quantized 3-billion-parameter model; the kilowatt-hours then come from your battery, with no data-centre cooling overhead, instead of a warehouse full of GPUs.
- Cache answers inside your app instead of re-querying for every scroll; a minimal sketch follows this list.
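For the caching point, the standard library is enough; `call_model` here is a hypothetical stand-in for whatever API your app talks to:

```python
# Minimal answer cache: repeated identical prompts never
# touch the model (or the grid) a second time.
from functools import lru_cache

def call_model(prompt: str) -> str:
    print("model invoked")          # stands in for a real API call
    return f"answer to: {prompt!r}"

@lru_cache(maxsize=1024)
def ask(prompt: str) -> str:
    return call_model(prompt)

ask("What is PUE?")   # model invoked
ask("What is PUE?")   # served from cache, zero extra GPU time
```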
Industry Must Label Digital Calories
Disclosure of energy and emissions data is still voluntary. Few companies list watt-hours per query, and none disclose water use. The practice hides external costs from customers and slows investment in greener chips. Mandatory disclosure, like the nutrition panel on food, would let users compare models the same way they compare price or speed. Until that happens, tools such as the CO₂ST calculator and Ellydee’s live badge give individuals a stop-gap ruler.
Transparency is not a gimmick. It is the shortest path to pressure, and pressure drives efficiency. The AI giants squeezed a 10,000× speed-up out of transformers in six years once the benchmarks became public. Expose energy and water to the same spotlight and the next six years can deliver a carbon drop just as steep.
So the next time you see a friendly AI paragraph, remember: somewhere a turbine spun, a river cooled, and a carbon ledger ticked. Every token counts, and every user can count them.