Corpora of Vietnamese Texts (CVT)
CVT Word lists
The CVT is composed of over one million words. The following
are three word lists that summarize the CVT. The first list includes all of the
words in the entire CVT. The second list comprises all the words in the
children’s literature corpus. The third list includes all the words in the
newspaper corpus. **Please note that certain tones and vowels have been
formatted to be read by the concordance program during the analysis process.
For a complete list of the formatting changes, see the Font Coding System.**
Words are listed in order from most to least frequent. Each entry includes the
word's number of occurrences and its percent of occurrence in the entire CVT.
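For readers who want to build a comparable list from their own text, the following is a minimal sketch of how a frequency list with counts and percentages could be produced. It is not the procedure used to create the CVT: the function name `word_frequency_list`, the whitespace tokenization, the lowercasing, and the sample sentence are all assumptions made for illustration. Splitting on whitespace treats each space-separated unit as a word, which may not match the way words were counted in the CVT.

```python
from collections import Counter


def word_frequency_list(text):
    """Return (word, count, percent) rows sorted from most to least frequent."""
    # Assumption: tokens are separated by whitespace and compared in lowercase.
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = sum(counts.values())

    rows = []
    # most_common() yields words in descending order of count.
    for word, count in counts.most_common():
        percent = 100.0 * count / total
        rows.append((word, count, percent))
    return rows


if __name__ == "__main__":
    # Made-up sample sentence, not taken from the CVT.
    sample = "con mèo và con chó và con gà"
    for word, count, percent in word_frequency_list(sample):
        print(f"{word}\t{count}\t{percent:.2f}%")
```

Each row pairs a word with its number of occurrences and its share of all tokens in the text, which mirrors the columns described above.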
Although every effort has been made to make this information accessible to the
reader, these word lists are rather extensive. It is advisable to print only
the portions or pages that interest you. It is permissible to print and use
the CVT for non-profit research and educational purposes, provided that this
website is cited appropriately.
Citation:
Tang, G. (2006). Corpora of Vietnamese Texts. Retrieved
from www.vnspeechtherapy.com
Contents of this section include:
- CVT: Word frequency list
- Vietnamese children’s literature corpus: Word frequency list
- Vietnamese newspaper corpus: Word frequency list