Unique Word Ratio Analyzer
Paste your text to analyze unique word ratio and vocabulary richness.
How Does the Formula Work?
The unique word ratio analyzer examines your text to measure vocabulary diversity and detect repetitive content. It calculates the ratio of distinct words to total words, identifies hapax legomena (words appearing exactly once), ranks the most repeated words, and classifies overall vocabulary richness. Writers, editors, content creators, and SEO professionals use this to evaluate the linguistic quality of their text before publishing.
Unique Word Ratio = Unique words ÷ Total words
Hapax Ratio = Words appearing once ÷ Total unique words
Richness: Very High (≥80%) | High (≥60%) | Medium (≥40%) | Low (<40%)
Case-insensitive, punctuation stripped, Unicode supported
Example: "the quick brown fox jumps over the lazy dog" → 8 unique words ÷ 9 total words = 88.9% ("the" appears twice)
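The rules above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual source: the function names `tokenize` and `analyze` are hypothetical, the `\w+` pattern is one simple way to strip punctuation (it splits contractions such as "don't" into two tokens), and the richness label is assumed to be based on the unique word ratio.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    # \w is Unicode-aware in Python 3, so accented and non-Latin letters are kept.
    return re.findall(r"\w+", text.lower())


def analyze(text: str) -> dict:
    """Compute unique word ratio, hapax ratio, and a richness label."""
    words = tokenize(text)
    counts = Counter(words)
    total = len(words)
    unique = len(counts)
    # Hapax legomena: words that occur exactly once.
    hapax = sum(1 for c in counts.values() if c == 1)

    unique_ratio = unique / total if total else 0.0
    hapax_ratio = hapax / unique if unique else 0.0

    # Assumption: the richness thresholds apply to the unique word ratio.
    if unique_ratio >= 0.80:
        richness = "Very High"
    elif unique_ratio >= 0.60:
        richness = "High"
    elif unique_ratio >= 0.40:
        richness = "Medium"
    else:
        richness = "Low"

    return {
        "total_words": total,
        "unique_words": unique,
        "hapax_count": hapax,
        "unique_ratio": unique_ratio,
        "hapax_ratio": hapax_ratio,
        "richness": richness,
    }


if __name__ == "__main__":
    result = analyze("the quick brown fox jumps over the lazy dog")
    print(f"{result['unique_ratio']:.1%}", result["richness"])
```

For the example sentence, this sketch counts 9 total words, 8 unique words, and 7 words that appear once, so the unique word ratio is 88.9% and the hapax ratio is 7 ÷ 8 = 87.5%.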
Understanding Vocabulary Richness
Professional writing typically achieves a unique word ratio of 60 to 80 percent: enough repetition for cohesion, enough variety to maintain interest. Academic papers tend toward 65 to 75 percent because technical terms must be repeated. Creative fiction ranges from 70 to 85 percent. Below 40 percent almost always indicates problematic repetition. Above 90 percent usually means the text is very short, since longer texts naturally repeat articles and prepositions. Because the ratio tends to fall as a text grows, it is most meaningful when comparing texts of similar length.
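To see how length affects the ratio, it can help to recompute it over growing prefixes of the same text. The following is a rough sketch under the same tokenization assumptions as the tool's description (case-insensitive, punctuation stripped); the function name `unique_ratio_by_length` and the `step` parameter are hypothetical.

```python
import re


def unique_ratio_by_length(text: str, step: int = 50) -> list[tuple[int, float]]:
    """Return (word_count, unique_ratio) pairs for growing prefixes of the text."""
    words = re.findall(r"\w+", text.lower())
    seen: set[str] = set()
    points: list[tuple[int, float]] = []
    for i, word in enumerate(words, start=1):
        seen.add(word)
        # Record a point every `step` words, plus one for the full text.
        if i % step == 0 or i == len(words):
            points.append((i, len(seen) / i))
    return points


if __name__ == "__main__":
    sample = "the cat sat on the mat and the dog sat on the rug"
    for n, ratio in unique_ratio_by_length(sample, step=4):
        print(n, f"{ratio:.1%}")
```

On the short sample, the first four words are all distinct (100%), and the ratio drops to 75% by the eighth word as "the" starts repeating. The ratio is not guaranteed to fall at every step, but on longer texts the downward trend is usually clear.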
Hapax Legomena and Zipf's Law
Hapax legomena, words appearing exactly once, typically constitute 40 to 60 percent of unique words in quality text. The concept connects to Zipf's law: in a sufficiently large text, a word's frequency is approximately inversely proportional to its frequency rank. The most common English word ("the") appears about 7 percent of the time, while thousands of words appear just once. A high hapax ratio suggests a varied vocabulary; a ratio below 30 percent suggests the writer relies on a limited word set. This metric is used in computational linguistics for authorship attribution and text classification.
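One informal way to look for a Zipf-like pattern is to rank words by frequency and multiply each rank by its count; if frequency were exactly inversely proportional to rank, that product would be constant. The sketch below is illustrative only, with a hypothetical function name `rank_frequency`, and real texts (especially short ones) will only approximate the pattern.

```python
import re
from collections import Counter


def rank_frequency(text: str, top: int = 10) -> list[tuple[int, str, int, int]]:
    """Return (rank, word, count, rank * count) for the most frequent words."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter(words)
    # Sort by count (descending), then alphabetically so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        (rank, word, count, rank * count)
        for rank, (word, count) in enumerate(ordered[:top], start=1)
    ]


if __name__ == "__main__":
    for row in rank_frequency("a a a a b b c"):
        print(row)
```

In the toy input, "a" occurs 4 times at rank 1 and "b" occurs 2 times at rank 2, so both products equal 4, which is the shape Zipf's law describes.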
Practical Applications
Content creators evaluate draft quality before publishing. SEO professionals check for keyword stuffing, the unnaturally high repetition that search engines may penalize. Students verify adequate vocabulary range in essays. Editors identify overused terms via the repeated words table and suggest synonyms. The analysis runs entirely in your browser, so your text is never sent to any server.
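A repeated words table like the one described can be approximated as follows. This is a hedged sketch rather than the tool's implementation: `repeated_words`, its `top` parameter, and the optional `ignore` set (for skipping function words such as "the") are all hypothetical names introduced for illustration.

```python
import re
from collections import Counter


def repeated_words(
    text: str, top: int = 10, ignore: frozenset[str] = frozenset()
) -> list[tuple[str, int, float]]:
    """Return (word, count, share_of_all_words) for words used more than once."""
    words = re.findall(r"\w+", text.lower())
    total = len(words)
    counts = Counter(w for w in words if w not in ignore)
    # Keep only repeated words; share is measured against all words in the text.
    rows = [(w, c, c / total) for w, c in counts.items() if c > 1]
    # Most repeated first; alphabetical order breaks ties.
    rows.sort(key=lambda r: (-r[1], r[0]))
    return rows[:top]


if __name__ == "__main__":
    text = "Cheap shoes! Buy cheap shoes, the cheap shoes store."
    for word, count, share in repeated_words(text):
        print(f"{word:<10} {count:>3} {share:.1%}")
```

In the sample, "cheap" and "shoes" each account for 3 of the 9 words, about 33% apiece, which is the kind of concentration that would stand out as keyword stuffing in a longer text.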
Tips & Recommendations
Professional writing typically has a 60-80% unique word ratio. Below 40% = too repetitive.
Words used only once show vocabulary depth. 40-60% hapax ratio is typical for quality text.
The table shows your most overused words — replace some with synonyms for variety.
All analysis runs in your browser. Your text is never sent to any server.
Frequently Asked Questions
What is unique word ratio?
The number of distinct words divided by the total number of words, shown as a percentage. Higher ratio = more diverse vocabulary. 100% means every word is used only once.
What are hapax legomena?
Words that appear exactly once in a text. A high hapax count indicates rich, varied vocabulary.
What ratio indicates good writing?
60-80% is typical for well-written content. Below 40% suggests excessive repetition. Above 90% may indicate very short text.
Does this detect plagiarism?
No. It measures vocabulary diversity, not plagiarism. A low unique word ratio can, however, flag low-quality or spun content.
Is the analysis case-sensitive?
No. 'Hello' and 'hello' count as the same word. Punctuation is also stripped.