
Apple Health Data for AI: Why Clean JSON Beats CSV and XML

If you want an AI agent to actually understand your body, the file format you feed it matters as much as the data itself. Here is how XML, CSV, and JSON compare for Apple Health — and why clean, typed JSON over MCP is the format that wins.

Health Export AI · Apple Health · AI agents · MCP · 9 min read

TL;DR: Apple Health exports as one giant export.xml file. XML is too verbose and too HealthKit-specific for LLMs; CSV is compact but loses units and nesting; clean JSON is self-describing, typed, and token-efficient, so an AI model is far more likely to read it correctly the first time. The best setup is JSON delivered live over MCP so the agent queries only the metrics it needs.

The short answer: the best format for Apple Health + AI is JSON

For feeding Apple Health data to an AI agent — Claude, Cursor, opencode, or any LLM — JSON is the best format. It is self-describing (each value ships with its key and unit), it maps directly to the structured objects models are trained on, and it stays compact enough to avoid burning your context window on syntax. XML is the format Apple actually exports, but it is verbose and awkward for models. CSV is fine for one flat table but falls apart the moment your data has units, nesting, or many metric types — which Apple Health always does.

The rest of this article shows exactly why, with side-by-side snippets of the same heart-rate variability reading in all three formats, a full comparison table, and how health-export-mcp turns 190 metrics into clean JSON your agent can query on demand.

What Apple Health actually gives you: export.xml

When you open the Health app, tap your profile picture, and choose Export All Health Data, iOS hands you a zip containing a single export.xml (plus a clinical export_cda.xml). That one file holds everything: every step count, every heart-rate sample, every sleep stage, every workout, all interleaved in HealthKit's XML schema. For an active Apple Watch user it can run to hundreds of megabytes and millions of records.

Here is a single HRV reading as it appears inside that file:

Apple Health export.xml — one HRV sample

<Record type="HKQuantityTypeIdentifierHeartRateVariabilitySDNN"
        sourceName="Apple Watch" sourceVersion="10.4"
        unit="ms" creationDate="2026-06-26 07:14:02 +0100"
        startDate="2026-06-26 07:14:02 +0100"
        endDate="2026-06-26 07:14:02 +0100" value="61"/>

Every single reading repeats HKQuantityTypeIdentifierHeartRateVariabilitySDNN and the full attribute set. Multiply that by millions of records and you have a file that is enormous, slow to parse, and — critically — full of HealthKit-specific tag names a language model was never explicitly taught. You can drop a chunk of it into a chat, but you will waste thousands of tokens on angle brackets and identifier strings before the model reaches a single useful number.
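As a rough illustration of what "converting export.xml" involves, here is a minimal Python sketch that turns the record above into a compact, typed dictionary. The short name hrv_sdnn, the SHORT_NAMES mapping, and the record_to_dict helper are assumptions made for this example, not part of Apple's schema or of any particular tool. A real export would be streamed (for instance with ElementTree's iterparse) rather than loaded from a string.

```python
import json
import xml.etree.ElementTree as ET
from datetime import datetime

# One record, wrapped in a root element so it parses as a document.
SAMPLE = """<HealthData>
  <Record type="HKQuantityTypeIdentifierHeartRateVariabilitySDNN"
          sourceName="Apple Watch" sourceVersion="10.4"
          unit="ms" creationDate="2026-06-26 07:14:02 +0100"
          startDate="2026-06-26 07:14:02 +0100"
          endDate="2026-06-26 07:14:02 +0100" value="61"/>
</HealthData>"""

# Hypothetical short names chosen for this sketch.
SHORT_NAMES = {"HKQuantityTypeIdentifierHeartRateVariabilitySDNN": "hrv_sdnn"}


def record_to_dict(elem):
    """Convert one <Record> element into a compact dict with typed values."""
    # The export writes dates as "YYYY-MM-DD HH:MM:SS +ZZZZ"; normalise to ISO 8601.
    start = datetime.strptime(elem.get("startDate"), "%Y-%m-%d %H:%M:%S %z")
    raw_type = elem.get("type")
    return {
        "metric": SHORT_NAMES.get(raw_type, raw_type),
        "unit": elem.get("unit"),
        "value": float(elem.get("value")),  # assumes a numeric quantity record
        "start": start.isoformat(),
        "source": elem.get("sourceName"),
    }


records = [record_to_dict(e) for e in ET.fromstring(SAMPLE).iter("Record")]
print(json.dumps(records[0], indent=2))
```

The repeated identifier and the three near-identical date attributes collapse into five short keys, which is the saving the rest of this article relies on.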

The same reading in CSV and JSON

Most people convert export.xml into something friendlier. CSV is the classic choice for spreadsheets; JSON is the choice for code and AI. Here is that identical HRV reading in each.

CSV — flat, compact, but context lives in the header row

metric,unit,value,start,source
hrv_sdnn,ms,61,2026-06-26T07:14:02+01:00,Apple Watch

JSON — self-describing: the unit and meaning travel with the value

{
  "metric": "hrv_sdnn",
  "unit": "ms",
  "value": 61,
  "start": "2026-06-26T07:14:02+01:00",
  "source": "Apple Watch"
}

CSV is smaller, but the value 61 only means anything if the header row is still attached and the model lines up the columns correctly. Paste 200 rows without the header — easy to do mid-conversation — and 61 becomes a naked number with no unit. JSON carries "unit": "ms" on every record, so an agent never has to guess. When you have 190 different metrics, some scalar and some nested (sleep stages, workout splits, blood-pressure pairs), that self-describing property is the whole ballgame.
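The "typed" half of that argument can be checked with Python's standard library. This is a small sketch using the two snippets above: the CSV reader hands back every field as a string, while the JSON parser preserves the number.

```python
import csv
import io
import json

CSV_TEXT = (
    "metric,unit,value,start,source\n"
    "hrv_sdnn,ms,61,2026-06-26T07:14:02+01:00,Apple Watch\n"
)
JSON_TEXT = (
    '{"metric": "hrv_sdnn", "unit": "ms", "value": 61, '
    '"start": "2026-06-26T07:14:02+01:00", "source": "Apple Watch"}'
)

csv_row = next(csv.DictReader(io.StringIO(CSV_TEXT)))
json_row = json.loads(JSON_TEXT)

# CSV has no type information: the reading comes back as the text "61".
print(type(csv_row["value"]).__name__)   # str
# JSON distinguishes numbers from strings, so the reading stays numeric.
print(type(json_row["value"]).__name__)  # int
```

With CSV, whoever consumes the file has to decide which columns are numbers; with JSON that decision is already encoded in the data.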

XML vs CSV vs JSON for Apple Health and AI — full comparison

Here is the head-to-head across the five dimensions that decide whether an AI agent can actually use your health data.

Apple Health export formats compared for AI / LLM use
| Criterion | XML (export.xml) | CSV | JSON |
| --- | --- | --- | --- |
| LLM-readability | Poor. Verbose tags, HealthKit-specific schema, huge token cost. | OK. Clean for one table, but meaning depends on the header row staying attached. | Best. Maps to the structured objects models are trained on; keys explain themselves. |
| Schema & structure | Rigid, nested, but undocumented to the model. One flat record type repeated. | Flat only. No native nesting: sleep stages or workout splits don't fit. | Nesting + arrays are native. Scalars and complex metrics coexist cleanly. |
| File size / tokens | Largest. Attribute names repeat on every record; hundreds of MB possible. | Smallest. No syntax overhead beyond commas. | Compact when keys are short; far smaller than XML, near CSV with structure intact. |
| Tooling & APIs | Needs an XML parser; awkward in most AI/code pipelines. | Universal in spreadsheets; weak for typed/nested data. | Native to every language, every web API, and MCP tool calls. |
| Privacy / control | One giant blob, all-or-nothing; hard to share just one metric. | Per-file, but you hand over whole tables at a time. | Field-level. Easy to scope to a date range or single metric, ideal over local MCP. |

The pattern is consistent: XML is what you're given, CSV is what spreadsheets want, and JSON is what an AI agent wants. The only criterion where CSV edges ahead is raw file size, and that gap matters far less once you stop dumping whole files and start querying only the metrics you need.
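To get a feel for the file-size row, the following Python sketch serialises the same HRV reading 1,000 times in each format and compares UTF-8 byte counts. Bytes are only a rough stand-in for tokens, since real token counts depend on the model's tokenizer, and the record count N is an arbitrary choice for this example.

```python
import json

N = 1000  # arbitrary number of repeated readings for the comparison

xml_record = (
    '<Record type="HKQuantityTypeIdentifierHeartRateVariabilitySDNN" '
    'sourceName="Apple Watch" sourceVersion="10.4" unit="ms" '
    'creationDate="2026-06-26 07:14:02 +0100" '
    'startDate="2026-06-26 07:14:02 +0100" '
    'endDate="2026-06-26 07:14:02 +0100" value="61"/>'
)
row = {
    "metric": "hrv_sdnn",
    "unit": "ms",
    "value": 61,
    "start": "2026-06-26T07:14:02+01:00",
    "source": "Apple Watch",
}

csv_header = ",".join(row)                         # joins the dict keys
csv_line = ",".join(str(v) for v in row.values())
json_line = json.dumps(row, separators=(",", ":"))  # compact, no extra spaces


def size(lines):
    """UTF-8 size in bytes of the lines joined with newlines."""
    return len("\n".join(lines).encode("utf-8"))


sizes = {
    "xml": size([xml_record] * N),
    "csv": size([csv_header] + [csv_line] * N),
    "json": size([json_line] * N),
}
for name, n_bytes in sorted(sizes.items(), key=lambda kv: kv[1]):
    print(f"{name:>4}: {n_bytes:,} bytes")
```

In this sketch CSV comes out smallest, XML largest, and JSON in between, because JSON repeats its short keys on every record while XML repeats the long HealthKit identifier and three date attributes.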

Why self-describing data matters so much for LLMs

Large language models reason over tokens, and they reason best over data whose meaning is explicit. Three properties of clean JSON line up with how models work: it is self-describing, typed, and token-efficient.

CSV can fake the first property with a header row, but loses it the instant rows get separated from that header — which happens constantly when data is pasted, truncated, or windowed. XML has all the structure but pays for it in verbosity and an unfamiliar schema. JSON is the sweet spot: structured enough to be unambiguous, compact enough to be cheap, and familiar enough that models handle it natively.
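The header-separation problem is easy to reproduce. This Python sketch windows the same readings two ways, once as CSV and once as JSON Lines (one JSON object per line), and keeps only the last two lines, as a truncated paste might. The three values are the daily averages used elsewhere in this article; the JSON Lines layout is an assumption for illustration.

```python
import json

rows = [{"metric": "hrv_sdnn", "unit": "ms", "value": v} for v in (55, 57, 61)]

csv_lines = ["metric,unit,value"] + [
    f'{r["metric"]},{r["unit"]},{r["value"]}' for r in rows
]
jsonl_lines = [json.dumps(r) for r in rows]

# Keep only the tail, the way a truncated paste or a sliding window would.
csv_window = csv_lines[-2:]
jsonl_window = jsonl_lines[-2:]

# The CSV window no longer contains the header, so column meaning is gone.
csv_has_header = csv_window[0].startswith("metric,")
# Every JSON line still names its own unit.
jsonl_units = [json.loads(line)["unit"] for line in jsonl_window]

print(csv_has_header)
print(jsonl_units)
```

The CSV window keeps the values but not the column names that explain them, while each JSON line remains interpretable on its own.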

The real upgrade: don't ship files at all — query JSON over MCP

Even perfect JSON has a ceiling if you paste whole files into a chat. The better pattern is to let the agent pull exactly the JSON it needs, when it needs it. That is what the Model Context Protocol (MCP) is for, and it is how health-export-mcp works.

Instead of a 200 MB blob, your agent gets typed tools. Ask Claude "How did my HRV trend this week versus last?" and it calls a tool that returns a small, clean JSON object scoped to exactly that window:

What the agent receives over MCP — scoped, typed, ready to reason about

{
  "metric": "hrv_sdnn",
  "unit": "ms",
  "window": "7d",
  "summary": { "avg": 58.4, "min": 41, "max": 72, "trend": "+4.4%" },
  "daily": [
    { "date": "2026-06-20", "avg": 55 },
    { "date": "2026-06-21", "avg": 57 },
    { "date": "2026-06-22", "avg": 61 }
  ]
}

That is the whole argument for clean JSON, taken to its conclusion: self-describing, typed, scoped to the question, and delivered live. No manual file conversion, no header-row roulette, no token budget blown on XML tags. Health Export AI has read-only access to 190 HealthKit metrics on your iPhone, writes clean JSON to a destination you choose (iCloud Drive, a local folder, your LAN, or a webhook), and exposes it to your agent over MCP. The data never passes through a server the developer controls. For a step-by-step setup with Claude specifically, see connecting Apple Health to Claude via MCP, and for the full picture of getting your data out, see the pillar guide on exporting Apple Health data to JSON.
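To make the "scoped" idea concrete, here is a minimal Python sketch of the kind of aggregation that could sit behind such a tool. It is not health-export-mcp's actual implementation: the summarize function, its parameters, and the previous_avg value of 55.0 are all hypothetical, and the window label is simply derived from the number of days passed in.

```python
import json
from statistics import mean


def summarize(metric, unit, daily, previous_avg=None):
    """Build a small, self-describing summary object from daily averages.

    Hypothetical helper for illustration only.
    """
    values = [d["avg"] for d in daily]
    summary = {
        "avg": round(mean(values), 1),
        "min": min(values),
        "max": max(values),
    }
    if previous_avg:
        # Percentage change against the previous window's average.
        change = (summary["avg"] - previous_avg) / previous_avg * 100
        summary["trend"] = f"{change:+.1f}%"
    return {
        "metric": metric,
        "unit": unit,
        "window": f"{len(daily)}d",
        "summary": summary,
        "daily": daily,
    }


daily = [
    {"date": "2026-06-20", "avg": 55},
    {"date": "2026-06-21", "avg": 57},
    {"date": "2026-06-22", "avg": 61},
]
# previous_avg is an invented figure used only to exercise the trend field.
result = summarize("hrv_sdnn", "ms", daily, previous_avg=55.0)
print(json.dumps(result, indent=2))
```

The agent receives a few hundred bytes that already answer the question, rather than the raw samples it would otherwise have to aggregate inside its context window.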

When CSV or XML still make sense

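JSON winning for AI doesn't make the others useless. CSV remains the natural fit for a single flat table in a spreadsheet, and export.xml is still the complete export that Apple's Health app gives you.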

Give your agent clean Apple Health JSON

190 HealthKit metrics, exported as clean JSON your AI agent can query live over MCP. Read-only, local-first, no accounts.

Frequently asked questions

What format is Apple Health data exported in by default?

The built-in Health app exports a zip containing a single export.xml file in Apple's HealthKit XML schema, plus an export_cda.xml clinical document. It is one large XML file that interleaves every record type with no per-metric structure, so most people convert it to CSV or JSON before doing anything useful with it.

Is JSON or CSV better for feeding Apple Health data to an AI?

JSON, because it is self-describing — keys, units, and nesting travel with every value, so a model knows that 61 means HRV in milliseconds without a separate header row. CSV is compact and great for a single flat table, but it loses units and structure and breaks when a metric has nested or repeated fields. For multi-metric health data going to an LLM, clean JSON wins.

Why is Apple Health export.xml hard for AI models to use?

It is a single file that can run to hundreds of megabytes, mixes every record type together, repeats verbose attribute names on every record, and uses a HealthKit-specific schema the model was never explicitly told about. The verbosity wastes tokens, and the lack of clean per-metric structure makes extraction error-prone. Converting to compact, typed JSON addresses each of these problems.

What is the most private way to send Apple Health data to an AI agent?

Keep it local and read-only. Health Export AI has read-only access to HealthKit on your iPhone, writes clean JSON to a destination you choose (iCloud Drive, a local folder, your LAN, or a webhook), and exposes it to your agent over MCP. Your health data never passes through a server operated by the developer.
