Text Chunker
Drop in a .txt, .md, or .docx file (or paste text) and split it into clean, overlapping chunks sized for any LLM context window, then download them all as a zip. Everything runs in your browser — your file never leaves your device.
What is the Text Chunker?
The Text Chunker is a free, in-browser tool that takes one large piece of writing and breaks it into a series of smaller, evenly sized blocks. You can drop in a file or paste raw text, choose how the split should work, and download every block at once. The main reason people reach for a text chunker today is large language models: tools like ChatGPT, Claude, and embedding APIs only accept a limited amount of text at a time, so you often need to split a long document into smaller blocks sized to fit a language model context window before you can feed it in. This page does exactly that, without sending your content anywhere.
How to use it
- Add your text. Drag and drop a file onto the box, click to browse, or simply paste text straight in. Supported files are plain text (.txt), Markdown (.md), and Word (.docx).
- Pick a splitting mode. Choose by character count to fix the size of each chunk, or by number of chunks to fix how many pieces you get.
- Tune the options. Optionally set an overlap (how much text repeats between neighbouring chunks) and switch on "don't cut sentences/paragraphs" so blocks break on natural boundaries.
- Review and download. The tool shows you how many chunks were produced and how big each one is. Click download and every chunk is packaged into a single .zip file on your device.
The whole flow takes a couple of seconds, and you are done — no account, no waiting on a server.
The two splitting methods explained
Split by character count. You set a fixed size, say 2,000 characters, and the tool walks through the text producing as many chunks as needed until it runs out. This is the right choice when you have a hard budget — for example a model that accepts roughly N tokens, where you translate that budget into a character size and let the number of chunks fall out naturally.
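As a rough illustration of the character-count mode, here is a minimal Python sketch. It is not the tool's own code. The function names are hypothetical, and the figure of about 4 characters per token is only a common rule of thumb for English text; the real ratio depends on the tokenizer and the language.

```python
def chars_for_token_budget(tokens: int, chars_per_token: int = 4) -> int:
    """Translate a token budget into an approximate character size.

    The default of 4 characters per token is an assumption, not a guarantee.
    """
    return tokens * chars_per_token


def split_by_size(text: str, size: int) -> list[str]:
    """Cut text into consecutive chunks of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


size = chars_for_token_budget(500)           # 2000 characters
chunks = split_by_size("a" * 4500, size)
print([len(chunk) for chunk in chunks])      # [2000, 2000, 500]
```

The last chunk is simply whatever text remains, so it is usually shorter than the others.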
Split by target number of chunks. Here you decide how many pieces you want — say exactly 10 — and the tool divides the text as evenly as it can into that many blocks. This is handy when you plan to paste or process the parts one at a time and want a predictable count, regardless of the document's total length.
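One plausible way to divide a text "as evenly as it can" is to give every chunk the same base length and hand out the leftover characters one at a time to the first few chunks. The sketch below assumes that approach; the tool may distribute the remainder differently, and `split_into_n` is a hypothetical name.

```python
def split_into_n(text: str, n: int) -> list[str]:
    """Divide text into exactly n consecutive chunks of near-equal length."""
    if n <= 0:
        raise ValueError("n must be positive")
    base, extra = divmod(len(text), n)
    chunks = []
    start = 0
    for i in range(n):
        # The first `extra` chunks each take one leftover character.
        end = start + base + (1 if i < extra else 0)
        chunks.append(text[start:end])
        start = end
    return chunks


chunks = split_into_n("x" * 103, 10)
print([len(chunk) for chunk in chunks])  # [11, 11, 11, 10, 10, 10, 10, 10, 10, 10]
```

If the text has fewer characters than the requested count, some chunks in this sketch come back empty.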
Why overlap matters. Overlap means each chunk repeats a little of the text from the end of the previous one. When the chunks will be embedded for retrieval-augmented generation (RAG) or fed to an LLM, a sentence or idea that happens to fall on a boundary would otherwise be split across two blocks, with neither block carrying the full meaning. A small overlap (commonly 10–20% of the chunk size) keeps that straddling context whole in at least one chunk, which tends to improve retrieval quality. The sentence/paragraph-boundary option complements this by nudging each cut to the nearest natural break so you do not slice a word or a clause in half.
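Overlap can be pictured as a sliding window: each chunk starts `size - overlap` characters after the previous one. The following Python sketch works under that assumption, with the overlap given in characters (10% of a 2,000-character chunk would be 200). `split_with_overlap` is a hypothetical name, not the tool's API.

```python
def split_with_overlap(text: str, size: int, overlap: int) -> list[str]:
    """Sliding-window split: consecutive chunks share `overlap` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks


print(split_with_overlap("abcdefghij", size=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk after the first begins with the last character of the one before it, which is the repeated context described above.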
Supported formats
- .txt: read directly as UTF-8 text.
- .md: Markdown is treated as text; your headings and lists are preserved as written.
- .docx: Word documents are converted to plain text in the browser using mammoth.js, so you can chunk a report or manuscript without exporting it first.
- Paste: no file at all; just type or paste into the box.
Examples
- Feeding a long article to ChatGPT. A 24,000-character blog post, split by character count at 6,000 with 10% overlap, yields five overlapping chunks (each new chunk starts 5,400 characters after the previous one, so the last one is shorter) that you can paste in sequence so the model keeps continuity across them.
- Preparing a manuscript for embeddings. A 90-page .docx novel chapter, split into a target of 50 chunks with sentence boundaries on, produces roughly evenly sized passages that end on sentence boundaries, ready to embed for a search index.
- Splitting a transcript for translation. A meeting transcript split by number of chunks gives you a fixed set of segments you can hand to a translator or a translation model one block at a time.
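To estimate ahead of time how many chunks a character-count split will produce, the arithmetic can be sketched as below. This assumes a plain sliding window in which each chunk starts `size - overlap` characters after the previous one, with boundary snapping turned off; the tool's actual counts may differ slightly. `estimate_chunk_count` is a hypothetical helper.

```python
def estimate_chunk_count(total_chars: int, size: int, overlap: int = 0) -> int:
    """Estimate the number of chunks for a sliding-window split."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if total_chars <= 0:
        return 0
    if total_chars <= size:
        return 1
    step = size - overlap
    remaining = total_chars - overlap
    # Integer ceiling division: how many steps are needed to cover the text.
    return (remaining + step - 1) // step


print(estimate_chunk_count(10_000, size=2_000, overlap=200))  # 6
print(estimate_chunk_count(4_500, size=2_000))                # 3
```

Note that adding overlap increases the chunk count, because each chunk advances through the text by less than its full size.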
Common use cases
- Feeding long text to an LLM (ChatGPT, Claude, Gemini) that has a limited context window.
- Building a RAG pipeline where documents must be chunked before embedding.
- Segmented translation or summarisation, where each block is processed independently and stitched back together.
- Working with sensitive material — drafts under NDA, internal documentation, unpublished research — that must not be uploaded to a third-party service.
Why use this one
Most online text splitters either only accept text you paste in, or they require you to upload your file to a server. This tool does neither: it reads .docx, .md, and .txt files and packages the output .zip entirely in your browser. Nothing is uploaded, there is no sign-up, there is no character cap, and you get two splitting modes plus overlap and boundary controls in one place. For anyone chunking confidential text to feed an LLM, the privacy guarantee (your file never leaves your device) is the whole point.
It belongs to a small, focused text toolkit. Once a chunk is ready, the Tokenizer tells you its GPT and Claude token count so you can confirm it fits the context window; the Character Counter checks a block against a strict character budget; and the Text Formatter tidies a messy document before you split it.
Frequently asked questions
Are my files or text uploaded anywhere?
No. Everything runs locally in your browser. Word files are parsed with mammoth.js and the chunks are packaged with JSZip entirely on your device, so your document is never sent to or stored on any server. That makes the tool safe for unpublished manuscripts, internal docs, contracts, and other sensitive material you want to feed an LLM.
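The tool does this packaging with JSZip in the browser. As a loose analogue only, the same idea of building the archive in memory without touching a server can be sketched with Python's standard `zipfile` module. The `chunk_001.txt` naming scheme is an assumption made for this example, not necessarily what the tool produces.

```python
import io
import zipfile


def zip_chunks(chunks: list[str]) -> bytes:
    """Pack each chunk into its own UTF-8 text file inside an in-memory zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for index, chunk in enumerate(chunks, start=1):
            # Hypothetical file names: chunk_001.txt, chunk_002.txt, ...
            archive.writestr(f"chunk_{index:03d}.txt", chunk.encode("utf-8"))
    return buffer.getvalue()


data = zip_chunks(["first chunk", "second chunk"])
with zipfile.ZipFile(io.BytesIO(data)) as archive:
    print(archive.namelist())  # ['chunk_001.txt', 'chunk_002.txt']
```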
What is the difference between splitting by character count and by number of chunks?
Split by character count fixes the size of each chunk (for example 2,000 characters each) and produces as many chunks as needed — useful when you have a hard context-window budget. Split by target number of chunks fixes how many pieces you get (for example exactly 10) and the tool sizes each one evenly — useful when you want a predictable number of parts to paste or process one by one.
Why would I want overlap between chunks?
Overlap repeats a small amount of text at the boundary between consecutive chunks. For retrieval-augmented generation (RAG) and embeddings, overlap keeps a sentence or idea that straddles a boundary intact in at least one chunk, so the model does not lose context that was cut in half. A common setting is 10 to 20 percent overlap.
Which file formats can I chunk?
Plain text (.txt), Markdown (.md), and Microsoft Word (.docx). You can also paste text directly into the box without any file at all. Word documents are converted to text in the browser before chunking.
Can it avoid cutting a sentence in half?
Yes. Turn on the sentence or paragraph boundary option and the tool will extend or trim each chunk to the nearest sentence or paragraph break instead of slicing mid-word. Chunk sizes then vary slightly around your target, which is usually preferable when the chunks will be read or sent to an LLM.
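A simplified picture of boundary snapping: find candidate sentence breaks, then move each cut to the candidate nearest the target size. The Python sketch below assumes a naive rule (whitespace following a period, exclamation mark, or question mark ends a sentence), which will misfire on abbreviations such as "e.g."; the tool's own detection may be more careful. `split_on_sentences` is a hypothetical name.

```python
import re


def split_on_sentences(text: str, size: int) -> list[str]:
    """Cut near every `size` characters, snapping to the nearest sentence break."""
    if size <= 0:
        raise ValueError("size must be positive")
    # Candidate cut points: just after the whitespace that follows . ! or ?
    boundaries = [m.end() for m in re.finditer(r"(?<=[.!?])\s+", text)]
    chunks = []
    start = 0
    while len(text) - start > size:
        target = start + size
        candidates = [b for b in boundaries if b > start]
        # Fall back to a hard cut at the target if no sentence break remains.
        cut = min(candidates, key=lambda b: abs(b - target), default=target)
        chunks.append(text[start:cut])
        start = cut
    if start < len(text):
        chunks.append(text[start:])
    return chunks


sample = "One two. Three four. Five six. Seven eight."
print(split_on_sentences(sample, size=20))
# ['One two. Three four. ', 'Five six. ', 'Seven eight.']
```

The first chunk runs one character past the 20-character target in order to finish its sentence, which illustrates why sizes vary slightly around the target when this option is on.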