Anatomy of a WhatsApp chat export: every file, what it contains, what to do with it | ThreadRecap
When you tap "Export chat" in WhatsApp and choose to include media, the app hands you a single ZIP file. Most people open it, see a wall of unfamiliar filenames, and close it again. That is a shame, because each file type in that archive represents a distinct layer of information: the written record, the spoken word, the visual context, and the shared documents. This guide walks through every file you are likely to find, explains what it contains, and shows you which parts ThreadRecap can turn into structured output.
_chat.txt: the conversation log
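The centrepiece of any export is `_chat.txt`. It is a plain-text file in which every message occupies one or more lines. The layout below is typical, though the exact date and time formatting depends on the platform and the device's locale settings: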
```
[DD/MM/YYYY, HH:MM:SS] Sender Name: message body
```
A few things are worth knowing about this file:
Timestamps reflect the local device time at the moment of sending. If participants are in different time zones, the log will mix offsets unless WhatsApp normalises them on export (behaviour varies by platform version).
Media references appear as inline placeholders, for example `IMG-20240315-WA0002.jpg (file attached)`, rather than embedded data. The actual file sits separately in the ZIP.
System events such as missed calls, group membership changes, and encryption notices appear as timestamped lines with no sender name.
Message edits and deletions may appear as `<This message was edited>` or `<This message was deleted>`, depending on the WhatsApp version that produced the export.
For analysis purposes, `_chat.txt` is the backbone. Every ThreadRecap output, from meeting recaps to conflict timelines, is anchored to the timestamps and attribution in this file.
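To make the line format concrete, here is a minimal parsing sketch. It assumes the bracketed `[DD/MM/YYYY, HH:MM:SS]` header shown above; exports that use a different layout would need a different pattern. The names `Message`, `HEADER`, and `parse_chat` are illustrative and not part of any ThreadRecap or WhatsApp API.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Assumed header layout: [DD/MM/YYYY, HH:MM:SS] followed by the rest of the line.
HEADER = re.compile(r"^\[(\d{2}/\d{2}/\d{4}), (\d{2}:\d{2}:\d{2})\] (.*)$")


@dataclass
class Message:
    timestamp: datetime
    sender: Optional[str]  # None for system events, which carry no sender name
    body: str


def parse_chat(text: str) -> list[Message]:
    messages: list[Message] = []
    for raw in text.splitlines():
        # Assumption: tolerate a leading BOM or direction mark if one is present.
        line = raw.lstrip("\ufeff\u200e\u200f")
        match = HEADER.match(line)
        if match is None:
            # No timestamp header, so this line continues the previous message.
            if messages:
                messages[-1].body += "\n" + raw
            continue
        date_part, time_part, rest = match.groups()
        timestamp = datetime.strptime(f"{date_part} {time_part}", "%d/%m/%Y %H:%M:%S")
        # Heuristic: the first ": " separates the sender from the message body.
        sender, separator, body = rest.partition(": ")
        if separator:
            messages.append(Message(timestamp, sender, body))
        else:
            messages.append(Message(timestamp, None, rest))
    return messages
```

The sender split is only a heuristic: a system event whose text happens to contain `": "` would be misread as a regular message, so a production parser would need extra checks.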
How big can _chat.txt get?
A busy group chat running for a year can easily produce tens of thousands of lines. ThreadRecap processes exports containing 60,000 or more messages, so even the most active team channels or long-running family groups fall within capacity.
.opus and .m4a: voice notes
Voice notes are often the most information-dense content in a WhatsApp chat, and also the most easily overlooked in manual reviews. WhatsApp encodes them differently depending on the platform:
| Platform | Container | Typical codec |
| --- | --- | --- |
| Android | .opus | Opus |
| iOS | .m4a | AAC |
Both formats are compressed audio. The filenames follow WhatsApp's media naming convention: `PTT-YYYYMMDD-WA000X.opus` or `PTT-YYYYMMDD-WA000X.m4a`, where PTT stands for push-to-talk.
ThreadRecap transcribes every voice note in an export with Whisper, aiming for high accuracy on clear audio. The transcripts are time-linked to the surrounding chat messages, so you can read a conversation as a continuous thread rather than switching between text and audio players.
Background noise, overlapping speakers, heavy accents, and very short clips (under two seconds) are the main factors that reduce transcription quality. Whisper handles multilingual audio, so switching languages mid-conversation does not break the pipeline, though accuracy varies by language.
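One way to picture the time-linking is as a substitution: each voice note placeholder in the log is swapped for its transcript, so the thread reads in order. The sketch below is purely illustrative and assumes the transcripts already exist as a filename-to-text mapping. The function `inline_transcripts` and its input shapes are hypothetical, and this is not a description of how ThreadRecap implements the step.

```python
def inline_transcripts(messages, transcripts):
    """Replace voice note placeholders with their transcripts.

    messages: list of (timestamp, sender, body) tuples in chat order.
    transcripts: dict mapping a voice note filename to its transcribed text.
    """
    thread = []
    for timestamp, sender, body in messages:
        # A placeholder body looks like "PTT-...-WA0001.opus (file attached)".
        filename = body.split(" (file attached)")[0]
        if filename in transcripts:
            body = f"[voice note] {transcripts[filename]}"
        thread.append((timestamp, sender, body))
    return thread
```

Because the placeholder already sits at the right position in the log, no separate timestamp matching is needed in this simplified version.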
.jpg, .png, .mp4: shared media
Images and videos in a WhatsApp export follow the naming pattern `IMG-YYYYMMDD-WA000X.jpg` or `VID-YYYYMMDD-WA000X.mp4`. The date component reflects when the file was created or sent, and the trailing index distinguishes multiple files from the same day.
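The naming convention is regular enough to parse mechanically. The following sketch splits a filename into its kind, date, and index, assuming the `PTT-`, `IMG-`, and `VID-` prefixes described in this guide. `MediaFile` and `parse_media_name` are illustrative names, and files that do not follow the convention simply return `None`.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional

# PREFIX-YYYYMMDD-WA<index>.<extension>, as described in the text.
MEDIA_NAME = re.compile(r"^(PTT|IMG|VID)-(\d{8})-WA(\d+)\.(\w+)$")
KINDS = {"PTT": "voice note", "IMG": "image", "VID": "video"}


class MediaFile(NamedTuple):
    kind: str
    day: date
    index: int
    extension: str


def parse_media_name(filename: str) -> Optional[MediaFile]:
    match = MEDIA_NAME.match(filename)
    if match is None:
        return None
    prefix, day, index, extension = match.groups()
    try:
        parsed_day = datetime.strptime(day, "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a real calendar date.
        return None
    return MediaFile(KINDS[prefix], parsed_day, int(index), extension.lower())
```

Sorting the parsed results by `(day, index)` gives a rough chronological inventory of the media in an export without opening any of the files.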
These files can carry more metadata than is visible at first glance. EXIF data embedded in `.jpg` files, and less commonly in `.png` files, can include GPS coordinates, device model, and the original capture timestamp, information that can be significant in dispute or compliance contexts. Whether that data survives depends on how the file was shared, since messaging apps often strip metadata when they recompress images.
ThreadRecap does not process photos, videos, or documents. They are referenced in `_chat.txt` by filename, which ThreadRecap records in the conversation timeline, but the files themselves are never uploaded. This is a deliberate privacy boundary covered in more detail under "Why photos and videos never leave your device" below.
.pdf, .vcf, .docx: documents and contacts
Documents (.pdf, .docx, and other formats)
Any file shared as an attachment in WhatsApp, including PDFs, Word documents, spreadsheets, and presentations, appears in the export ZIP under its original filename. These files are referenced in `_chat.txt` the same way images are: as a placeholder line noting the filename and the phrase "file attached."
ThreadRecap does not upload or parse document attachments. Their presence in the timeline is noted, but their contents are not extracted.
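Because attachments show up in the log only as placeholder lines, listing them is a matter of pattern matching. This is a small sketch that assumes the placeholder has the form `<filename> (file attached)` at the start of the message body, as in the examples in this guide. `attached_filename` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumption: the body starts with "<filename>.<extension> (file attached)".
ATTACHMENT = re.compile(r"^(?P<name>.+\.\w+) \(file attached\)")


def attached_filename(body: str) -> Optional[str]:
    """Return the referenced filename, or None if the body is ordinary text."""
    match = ATTACHMENT.match(body.strip())
    return match.group("name") if match else None
```

Matching only at the start of the body keeps ordinary sentences that mention a "file attached" from being counted, and a caption on the following line does not interfere with the match.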
Contact cards (.vcf)
When a WhatsApp user shares a contact, the export includes a `.vcf` file (vCard format). vCard is a standard format for contact information, containing fields such as name, phone number, email address, and organisation. The filename is typically the contact's display name with a `.vcf` extension.
Contact cards are not processed by ThreadRecap. Like documents and media, they remain on your device.
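If you want to inspect a contact card yourself, the vCard format is simple enough to read with a few lines of code. The sketch below is deliberately minimal: it collects property values by name, drops parameters such as `;type=CELL`, and unfolds continuation lines, but it does not decode escaped characters or encoded values. `parse_vcard` is an illustrative name, not a library function.

```python
from collections import defaultdict


def parse_vcard(text: str) -> dict[str, list[str]]:
    fields: dict[str, list[str]] = defaultdict(list)
    last_name = None
    for line in text.splitlines():
        # A line starting with a space or tab continues the previous value.
        if line[:1] in (" ", "\t") and last_name is not None:
            fields[last_name][-1] += line[1:]
            continue
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        # Drop parameters (";type=CELL") and any group prefix ("item1.").
        name = key.split(";", 1)[0].split(".")[-1].upper()
        if name in {"BEGIN", "END", "VERSION"}:
            last_name = None
            continue
        fields[name].append(value.strip())
        last_name = name
    return dict(fields)
```

For a card containing `FN:Jane Doe`, the result maps `"FN"` to `["Jane Doe"]`; properties that repeat, such as multiple phone numbers, accumulate in the same list.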
What ThreadRecap reads vs ignores by default
The table below summarises the processing boundary clearly:
| File type | ThreadRecap processes? | Where it stays |
| --- | --- | --- |
| `_chat.txt` | Yes, fully | Encrypted in your account |
| `.opus` / `.m4a` (voice notes) | Yes, transcribed via Whisper | Encrypted in your account |
| `.jpg` / `.png` (images) | No | Your device only |
| `.mp4` / `.mov` (video) | No | Your device only |
| `.pdf` / `.docx` (documents) | No | Your device only |
| `.vcf` (contact cards) | No | Your device only |
The outputs ThreadRecap generates from the processed data include:
Meeting Recap: a structured summary of what was discussed and agreed
Action Items: tasks extracted with assignee and deadline where stated
Decisions: explicit or implied decisions logged with context
Conflict Resolution: a timeline of disputed exchanges with attributed statements
Relationship Insights: communication pattern analysis across participants
All of these are grounded in the text and transcribed voice content. Nothing is inferred from images or documents.
Why photos and videos never leave your device
This is not a limitation; it is a design choice rooted in the sensitivity of media files.
Photos and videos shared in personal or professional chats often contain information that goes far beyond the image itself: location data, faces, documents photographed on desks, and timestamps tied to specific events. Uploading this material to any cloud service, even an encrypted one, creates exposure that many users, and many legal and compliance teams, are not comfortable with.
ThreadRecap's architecture keeps a hard boundary here. The export-and-upload workflow means you own the ZIP file before anything is sent. When you upload to ThreadRecap via /upload, only `_chat.txt` and voice note audio cross the network. Everything else stays in the ZIP on your device.
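You can preview this boundary on your own export before uploading anything. The sketch below partitions the entries of a ZIP into the files that would cross the network and the files that would stay put, using only the rule stated above (`_chat.txt` plus `.opus` and `.m4a` audio). It is an illustration of that rule, not ThreadRecap's actual upload code, and the names `split_export` and `split_zip` are hypothetical.

```python
import zipfile
from pathlib import PurePosixPath

# Assumption: voice notes are identified by extension alone, per the table above.
VOICE_EXTENSIONS = {".opus", ".m4a"}


def split_export(names):
    """Partition archive entry names into (would upload, stays on device)."""
    upload, stay = [], []
    for name in names:
        path = PurePosixPath(name)
        if path.name == "_chat.txt" or path.suffix.lower() in VOICE_EXTENSIONS:
            upload.append(name)
        else:
            stay.append(name)
    return upload, stay


def split_zip(zip_file):
    """Apply split_export to a ZIP given as a path or a file-like object."""
    with zipfile.ZipFile(zip_file) as archive:
        names = [info.filename for info in archive.infolist() if not info.is_dir()]
    return split_export(names)
```

Only the archive's directory listing is read here; no file contents are extracted, so running the check does not unpack any media.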
Chat text and voice note audio are stored encrypted in your account. You can delete them at any time through the dashboard. There is no retention period that overrides your choice.
The structured nature of `_chat.txt` makes WhatsApp exports useful in legal, HR, and compliance contexts. Each message carries a timestamp and a sender attribution that is hard to alter without disrupting the surrounding log. Voice note transcripts add a layer of spoken evidence that is often absent from message-only records.
ThreadRecap's evidence-ready output formats present this material as a chronological, attributed record with clear separation between what was written and what was spoken. If you are preparing for a dispute, an internal investigation, or a regulatory review, the structured report gives you a starting point that is far easier to navigate than a raw text file with thousands of lines.
A note on admissibility: the raw export and ThreadRecap's structured output can support legal work, but formal admissibility depends on jurisdiction and authentication procedures. Always consult a qualified legal professional before relying on any chat export in proceedings.
Understanding the full picture
A WhatsApp export ZIP is not just a backup. It is a layered archive where each file type captures a different dimension of communication: the written record in `_chat.txt`, the spoken word in voice note audio, the visual context in images and video, and the shared materials in documents and contact cards.
ThreadRecap works with the layers that can be analysed at scale without compromising the privacy of the layers that cannot. If you want to understand what your export contains before you do anything with it, the file breakdown above is your map. If you are ready to turn it into structured output, the /upload page is the next step.