Welcome to CQPweb at Beijing Foreign Studies University.
This site is maintained by Jiajin Xu, Liangping Wu and Mingchen Sun.

Please select a corpus below to enter.
(Both user ID and password are "test" for freely available corpora.)
GLOBE (Global Languages Out of BFSU Expertise) family corpora (Principal investigator: Jiajin Xu)
 
The arGLOBE Corpus (a balanced corpus of 1M-word contemporary written Arabic, lemmatised and PoS-tagged) created by Junyu Mao at the School of Arabic Studies, BFSU
 
 
The bnGLOBE Corpus (a balanced corpus of 1M-word contemporary written Bengali, PoS-tagged) created by Shangyao Yuan, Yuanyu Li, Jiaoyang Wang, Mengting Yu and Qianhui Xu at the School of Asian Studies, BFSU
 
 
The caGLOBE Corpus (a balanced corpus of 1M-word contemporary written Catalan, lemmatised and PoS-tagged) created by Wei Sun at the School of Hispanic and Portuguese Studies, BFSU
 
 
The daGLOBE Corpus (a balanced corpus of 1M-word contemporary written Danish, lemmatised and PoS-tagged) created by Yuchen Wang and Jiayuan Zhou at the School of European Languages and Cultures, BFSU
 
 
The deGLOBE Corpus (a balanced corpus of 1M-word contemporary written German, lemmatised and PoS-tagged) created by Guying Zhou and Yingming Song at NRCFLE, BFSU, and Zhe Shu, Yu Sun and Liang Xu at the School of German Studies, BFSU
 
 
The faGLOBE Corpus (a balanced corpus of 1M-word contemporary written Farsi/Persian, raw) created by Yanjun Li, Shuainan Chen, Qi Hu & Tinglu Zhou at the School of Asian Studies, BFSU
 
 
The faGLOBE Corpus (a balanced corpus of 1M-word contemporary written Farsi/Persian, lemmatised and PoS-tagged) created by Yanjun Li, Shuainan Chen, Qi Hu & Tinglu Zhou at the School of Asian Studies, BFSU
 
 
The fiGLOBE Corpus V1.0 (a balanced corpus of 1M-word written Finnish, lemmatised and PoS-tagged) created by Ying Li, Xinyu Feng, Yuhang Li, Yi Li and Yixin Zhang at the School of European Languages and Cultures, BFSU
 
 
The frGLOBE Corpus (a balanced corpus of 1M-word contemporary written French, lemmatised and PoS-tagged) created by Likai Yin at NRCFLE, BFSU
 
 
The huGLOBE Corpus (a balanced corpus of 1M-word contemporary written Hungarian, lemmatised and PoS-tagged) created by Qiuping Wang (BFSU), Shuangxi Duan (BISU) and Enyue Wang (BISU & The University of Szeged)
 
 
The itGLOBE Corpus (a balanced corpus of 1M-word contemporary written Italian, PoS-tagged) created by Dan Dong, Ruchen Yu and Yaoyi Guo at the School of European Languages and Cultures, BFSU, and Jiawen Guo at BISU
 
 
The kmGLOBE Corpus (a balanced corpus of 1M-word contemporary written Khmer/Cambodian, PoS-tagged) created by Xuanzhi Li & Kexuan Yang at the School of Asian Studies, BFSU
 
 
The loGLOBE Corpus (a balanced corpus of 1M-word contemporary written Lao, PoS-tagged) created by Huiling Lu, Yufeng Cao, Cheng Ouyang, Zhenhua Yang, Jingru Wu, Xiaoling Wu, and Yaoqing Wang at the School of Asian Studies, BFSU
 
 
The ltGLOBE Corpus (a balanced corpus of 1M-word contemporary written Lithuanian, lemmatised and PoS-tagged) created by Yiran Wang, Yitong Zhang, Shuning Zhao, and Yutong Guan at the School of European Languages and Cultures, BFSU
 
 
The nlGLOBE Corpus (a balanced corpus of 1M-word contemporary written Dutch, lemmatised and PoS-tagged) created by Jiachen Zhang, Xiaoxiao Lin, Xiaoou Lei, Zhiyan Zheng and Yunjie Zhang at the School of European Languages and Cultures, BFSU
 
 
The sqGLOBE Corpus (a balanced corpus of 1M-word contemporary written Albanian, lemmatised and PoS-tagged) created by Jing Ke, Qiao Jin, Shihao You, Tong Han, Yue Feng, Xinyu Wang, Ziqi Hu, Edmond Laçi, Eranda Allmetaj, Tianxing Chen, Meilin Mu, Yiman Zhou, Siyuan Zhou, Jinghan Zhao, Zhuojun Zhang, Weizhen Zhang, Haoyang Zhang, Yanjia Lu, and Weihao Ai at the School of European Languages and Cultures, BFSU
 
 
The thGLOBE Corpus (a balanced corpus of 1M-word contemporary written Thai, lemmatised and PoS-tagged) created by Shang Yingying, Wang Liyuan, Qu Yingtong, Li Yingjie, Chen Zhenyu, Ju Xinshu, Wang Xibeier, Jin Mengzhe, Zhao Xiaopei, Zhou Yibo, Wu Chenyang, Li Zixi, Pan Zilong, Zhang Jialing, and Wang Nenghao at the School of Asian Studies, BFSU
 
 
The urGLOBE Corpus (a balanced corpus of 1M-word contemporary written Urdu, lemmatised and PoS-tagged) created by Yuan Yuhang, Yang Yue, Guo Xinyu and Shang Yule at the School of Asian Studies, BFSU
 
Parallel corpora
 
The Babel English-Chinese Parallel Corpus created by Richard Xiao (en->zh)
 
 
The Babel English-Chinese Parallel Corpus created by Richard Xiao (zh->en)
 
 
CECPC_Core (CN to EN subcorpus. PI: Kefei Wang, BFSU). Please type in Chinese words to retrieve ZH-EN parallel concordances.
 
 
CECPC_Core (CN to EN subcorpus. PI: Kefei Wang, BFSU). Please type in English words to retrieve ZH-EN parallel concordances.
 
 
CECPC_Core (EN to CN subcorpus. PI: Kefei Wang, BFSU). Please type in Chinese words to retrieve parallel concordances.
 
 
CECPC_Core (EN to CN subcorpus. PI: Kefei Wang, BFSU). Please type in English words to retrieve parallel concordances.
 
 
Chinese-Korean Parallel Corpus of Political Documents (ko->zh) created by Xiaofeng Qi, BFSU
 
 
Chinese-Korean Parallel Corpus of Political Documents (zh->ko) created by Xiaofeng Qi, BFSU
 
 
TED English Chinese Parallel Corpus of Speeches created by Jiajin Xu (zh->en)
 
 
TED English Chinese Parallel Corpus of Speeches created by Jiajin Xu (en->zh)
 
 
Yiyan English-Chinese Parallel Corpus created by Xiuling Xu & Jiajin Xu (en->zh)
 
 
Yiyan English-Chinese Parallel Corpus created by Xiuling Xu & Jiajin Xu (zh->en)
 
Learner English corpora
 
Chinese Learner English Corpus (CLEC, 1M words, Co-PIs: Shichun Gui & Huizhong Yang)
 
 
iWriteBaby Chinese Learner English Corpus (8M words, PoS-tagged, created by Jiajin Xu, BFSU)
 
 
iWriteBaby (for alpha test only)
 
 
The TECCL corpus V1.1 (Ten-thousand English Compositions of Chinese Learners, 1.8M words, created by Jiajin Xu, BFSU)
 
 
The aiTECCL corpus (2M words generated by the GPT-3.5 model from the same writing prompts as those used in the TECCL corpus, intended as a reference approximating the linguistic quality of writing by L1 English speakers; made available online on 9 August 2023), created by Jiajin Xu and Mingchen Sun, BFSU
 
 
WECCL 2 (Written part of SWECCL 2, PI: Qiufang Wen)
 
English corpora
 
Brown corpus (AmE 1961)
 
 
AmE Brown Family Corpora (Brown1961, Frown1992, CROWN2009, CROWN2021, 4M words)
 
 
Business English Corpus (2M words, created by Lifei Wang, UIBE)
 
 
Contemporary College English (textbook corpus, for in-house use only)
 
 
China Daily Political News 2011
 
 
China English News Corpus (English news published in China from 2019 to 2021, collected by Chentingyan Zhang, BFSU)
 
 
CLOB2009 corpus (Brown family, 1M words, BrE 2009, created by Jiajin Xu et al., BFSU)
 
 
COLEN (textbook corpus)
 
 
CROWN2009 corpus (Brown family, 1M words, AmE 2009, created by Jiajin Xu et al., BFSU)
 
 
CROWN2021 (Brown family, 1M words, AmE 2021, created by Mingchen Sun, Jiajin Xu et al., BFSU)
 
 
Novels by Charles Dickens
 
 
Durban Climate Talks Corpus (China Daily & New York Times)
 
 
Friends (Sitcom transcripts)
 
 
The Independent Corpus (2009-2015, ca. 231 million words)
 
 
MedAca (Medical English discourse of Academia) Corpus, 1M words, created by Xin Feng et al., FJMU
 
 
NESSIE Corpus 1st release (NESSIEv1, Native English Speakers Similarly or Identically-prompted Essays, created by Jiajin Xu, BFSU)
 
 
NESSIE Corpus 2nd release (NESSIEv2, Native English Speakers Similarly or Identically-prompted Essays, created by Jiajin Xu, BFSU)
 
 
PATTIE corpus (Preschoolers- and Teenagers-oriented Texts in English, created by Jie Ji, CFAU)
 
 
TED Speeches (En)
 
 
TIME Magazine Corpus (1923-2008, ca. 196 million words)
 
 
Chinese corpora
 
The BFSU DiSCUSS Corpus (Diversified Spoken Chinese Uttered in Social Settings), a 1M-word balanced spoken Chinese corpus, created by Jiajin Xu et al.
 
 
ICC-CN (The International Comparable Corpus, Chinese component, 60% spoken and 40% written, 1M words. PI: Jiajin Xu)
 
 
Lancaster Corpus of Mandarin Chinese version 1 (LCMCv1, Brown family, 1991, created by Richard Xiao)
 
 
Lancaster Corpus of Mandarin Chinese version 2 (LCMCv2, containing the same text samples as LCMCv1, but with a few typographical and segmentation corrections), created by Richard Xiao
 
 
Works of Mo Yan (Chinese Nobel Laureate for Literature) (for in-house use only)
 
 
ToRCH2009 pre-release (Texts of Recent Chinese, Brown family, 2009, 2013 summer edition, created by Jiajin Xu, BFSU)
 
 
ToRCH2009 (Texts of Recent Chinese, Brown family, official release, created by Jiajin Xu et al., BFSU)
 
 
ToRCH2014 (Texts of Recent Chinese, Brown family, created by Jiajin Xu et al., BFSU)
 
 
ToRCH2019 (Texts of Recent Chinese, Brown family, created by Jialei Li, Mingchen Sun, and Jiajin Xu, BFSU)
 
 
BFSU ToRCH family Chinese corpora (ToRCH 2009, 2014 and 2019 combined, 3M tokenised words/5M characters)
 
 
The UCLA Corpus of Written Chinese (2nd edition), created by Hongyin Tao, UCLA
 
 
Corpora of Pacific languages
 
biBrown Press (Bislama news corpus, raw) created by Danyang Zhu, School of English and International Studies, BFSU
 
 
fjBrown Press (Fijian news corpus, raw) created by Shuo Luan, School of English and International Studies, BFSU
 
 
niuBrown (Niuean corpus of religion, government reports and academic writing; raw) created by Xuekun Guo, School of English and International Studies, BFSU
 
 
rarBrown Press (Rarotongan/Cook Islands Maori news corpus, raw) created by Lin Fu and Baoxiang Wang, School of English and International Studies, BFSU, and Runheng Zhang, Tsinghua University
 
 
tpiBrown Press (Tok Pisin news corpus, raw. Tok Pisin is one of the official languages of Papua New Guinea.) created by Shuyi Qiu, School of English and International Studies, BFSU
 
 
Corpora of European Languages
 
Chinese Learners Icelandic Corpus (CLIC2012, created by Shuhui Wang, BFSU)
 
 
gaBrown Press (Irish/Gaeilge news corpus, tagged) created by Junhan Zhang, Guiyu Lin, Zhaoyan Chen and Zixin Huang, School of English and International Studies, BFSU
 
 
Griechische Nachrichten Korpus (Greek news corpus)
 
 
Grimm Maerchen (Grimm's Fairy Tales)
 
 
Icelandic Parsed Historical Corpus (IcePaHC, PoS tagged version)
 
 
Icelandic Theses by Native Icelandic Speakers (NativeICE)
 
 
Spanish News Corpus, created by Yuanqi Liu, BFSU
 
 
Spanish Novel Corpus, created by Yuanqi Liu, BFSU
 
 
Spoken Spanish Corpus
 
 
Strafgesetzbuch (The German Penal Code)
 
 
German version of Twilight by Stephenie Meyer (for in-house use only)
 
 
Spanish Novels by Award-winning Writers (CNEPH v1.1, created by Yuanqi Liu, BFSU)
 
Multilingual Brown family corpora: The press genre (Principal investigator: Jiajin Xu)
 
amBrown Press (Amharic news corpus, raw)
 
 
arBrown Press (Arabic news corpus, raw)
 
 
bnBrown Press (Bengali news corpus, raw)
 
 
caBrown Press (Catalan news corpus, tagged)
 
 
daBrown Press (Danish news corpus, tagged)
 
 
deBrown Press (German news corpus, tagged)
 
 
dvBrown Press (Dhivehi/Divehi/Mahl/Maldivian news corpus, raw)
 
 
faBrown Press (Farsi/Persian news corpus, raw)
 
 
fiBrown Press (Finnish news corpus, tagged)
 
 
frBrown Press (French news corpus, tagged)
 
 
haBrown Press (Hausa news corpus, raw)
 
 
hiBrown Press (Hindi news corpus, raw)
 
 
hrBrown Press (Croatian news corpus, raw)
 
 
huBrown Press (Hungarian news corpus, tagged)
 
 
hyBrown Press (Armenian news corpus, raw)
 
 
isBrown Press (Icelandic news corpus, tagged)
 
 
itBrown Press (Italian news corpus, tagged)
 
 
jpBrown Press (Japanese news corpus, tagged)
 
 
kmBrown Press (Khmer news corpus, raw)
 
 
koBrown Press (Korean news corpus, tagged)
 
 
loBrown Press (Lao news corpus, raw)
 
 
ltBrown Press (Lithuanian news corpus, raw)
 
 
mgBrown Press (Malagasy news corpus, raw)
 
 
msBrown Press (Malay news corpus, raw)
 
 
mtBrown Press (Maltese news corpus, tagged)
 
 
neBrown Press (Nepali news corpus, raw)
 
 
nlBrown Press (Dutch news corpus, tagged)
 
 
prsBrown Press (Dari news corpus, raw). Dari, along with Pashto, is one of the two official languages of Afghanistan. Dari is the Afghan dialect of Farsi/Persian.
 
 
siBrown Press (Sinhalese/Singhalese/Cingalese/Sinhala news corpus, raw)
 
 
sqBrown Press (Albanian/Shqip/Shqipe news corpus, raw)
 
 
srBrown Press (Serbian/Srpski news corpus, raw)
 
 
swBrown Press (Swahili/kiSwahili news corpus, tagged)
 
 
thBrown Press (Thai news corpus, raw)
 
 
urBrown Press (Urdu news corpus, raw)
 
 
zuBrown Press (Zulu/isiZulu news corpus, raw)
 
 
DEAP (Database of English for Academic Purposes) family corpora (Principal investigator: Jiajin Xu)
 
AgriDEAP (5M words of agriculture English research articles, created by Jing Lǚ, SCAU)
 
 
ArtDEAP (6M words of art research articles, created by Xichun Han, XARTVU)
 
 
BasicMedDEAP (8M words of basic medical sciences English research articles, created by Xi Luo, Lei Zhang, Xiaoqing Zhan, Xuejiao Tan, Xingmei Gu, Jiecong Li, Jia Wang, Yujun Xian, Xiewan Chen, AMU)
 
 
BioDEAP (5M words of life science English research articles, created by Gong Peng, UCAS)
 
 
ChemDEAP (5M words of chemistry English research articles, created by Lanfeng Zhong, UJS)
 
 
CivDEAP (5M words of civil engineering English research articles, created by Baicheng Zhang, CQJTU)
 
 
DEAP Baby Corpus V1.0 (1.25M words, a 25-discipline balanced English research article corpus, created by Mingchen Sun & Jiajin Xu)
 
 
EconDEAP (6M words of economics English research articles, created by Xia Liu, SWUFE)
 
 
EduDEAP (5M words of education English research articles, created by Li Wang, SHNU)
 
 
EvmtDEAP (6M words of environmental engineering English research articles, created by Zhi Li, BJFU)
 
 
GeoDEAP (6M words of geography English research articles, created by Lei Liu, YSU)
 
 
InfoDEAP (5M words of information science English research articles, created by Yaochen Deng, DLUFL)
 
 
LawDEAP (5M words of law English research articles, created by Yanwei Wang, SDJU)
 
 
LinDEAP (5M words of linguistics English research articles, created by Zhanting Bu, QDU)
 
 
LitDEAP (5M words of literary studies English research articles, created by Tao Yu, JSNU)
 
 
MatDEAP (5M words of materials science English research articles, created by Pengfei Yan, BIT)
 
 
MathDEAP (6M words of mathematics English research articles, created by Xiaoli Zhu, ZUST)
 
 
MedDEAP (5M words of clinical medical sciences English research articles, created by Xin Feng et al., FJMU)
 
 
MgmtDEAP (7M words of management research articles, created by Jingzi Deng)
 
 
MilDEAP (5M words of military science (junshi kexue) English research articles, created by Xiaolei Ma)
 
 
MineDEAP (5M words of mining engineering English research articles, created by Ruiying Zhang, CUMTB)
 
 
NewsDEAP (5M words of media and communication English research articles, created by Guiling Niu, ZZU)
 
 
PhilDEAP (5M words of philosophy English research articles, created by Zhanting Bu, QDU)
 
 
PhysDEAP (5M words of physics English research articles, created by Hong Liao, PZHU)
 
 
PoliDEAP (5M words of political science English research articles, created by Guobing Liu, HNU)
 
 
PsyDEAP (6M words of psychology English research articles, created by Jiehui Hu, UESTC)
 
 
ShipDEAP (5M words of naval architecture and ocean engineering English research articles, created by Miao Tian, HRBEU)
 
 
SociDEAP (5M words of sociology research articles, created by Li Wang, SHNU)
 
 
StatDEAP (5M words of statistics English research articles, created by Le Zhang, USST)
 
 
Babel news corpus family (Multilingual news on the web)
 
AlbanianWaC (Albanian news on the web, 16M words, WaC: Web as Corpus)
 
 
PoS-tagged AlbanianWaC corpus (Albanian news on the web, 10M words, lemmatised and PoS-tagged)
 
 
IndUrWaC corpus (Indian Urdu news on the web, 10M words, lemmatised and PoS-tagged)
 
 
ItalianWaC corpus (Italian news on the web, 10M words, PoS-tagged)
 
  
COPE Family (Corpus of Occupational Purpose English)
 
AutoCOPE (Corpus of English for automotive industry, 300K words, lemmatised and PoS-tagged) created by Lin Xu, Dalian Maritime University
 
 
ChemCOPE (Corpus of English for chemical industry, 300K words, lemmatised and PoS-tagged) created by Juanyin Liu of Hebei Vocational University of Industry and Technology & Mingchen Sun of BFSU
 
 
CivCOPE (Corpus of English for civil engineering, 300K words, lemmatised and PoS-tagged) created by Ying Lin, Guangxi Polytechnic of Construction
 
 
FoodCOPE (Corpus of English for food industry, 300K words, lemmatised and PoS-tagged) created by Zhen Wang, Qingdao Agricultural University
 
 
HotelCOPE (Corpus of English for hotel staff, 300K words, lemmatised and PoS-tagged) created by Jiajin Xu & Mingchen Sun, BFSU
 
 
HydraCOPE (Corpus of English for hydraulic industry, 300K words, lemmatised and PoS-tagged) created by Xiuling Xu, BFSU
 
 
NurseCOPE (Corpus of English for nursing, 300K words, lemmatised and PoS-tagged) created by Xin Feng et al., Fujian Medical University
 
 
SportCOPE (Corpus of English for sports industry, 300K words, lemmatised and PoS-tagged) created by Jiajin Xu, BFSU
 
 
PubseCOPE (Corpus of policing English, 300K words, lemmatised and PoS-tagged) created by Fanjing Zeng, PPSUC
 
Endangered Languages
 
The Dungan-Chinese Corpus (The Dungan-Chinese gloss-aligned Corpus)
 
 
The Dungan Corpus (Dungan is spoken primarily in Kyrgyzstan, with speakers in Kazakhstan, Uzbekistan, and Russia as well. The Dungan ethnic group are the descendants of refugees from China who migrated west into Central Asia in the Qing Dynasty.)
 
 
Chinese-based parallel search interface for the same Dungan-Chinese Parallel Corpus
 
Translated languages
 
The CCTFC corpus (The Contemporary Chinese Translated Fiction Corpus, created by Xianyao Hu, SWU)
 
 
The COTE corpus (Corpus of Translational English, created by Richard Xiao)
 
 
Hong Lou Meng/Dream of the Red Chamber (English translation by Xianyi Yang and Gladys Yang)
 
 
Hong Lou Meng/Dream of the Red Chamber (Russian translation)
 
 
Sunzi Kunst des Krieges (German translation of Sunzi Bingfa/The Art of War by the ancient Chinese military strategist Sunzi, aka Sun-Tzu)
 
 
ZCTC corpus (ZJU Corpus of Translational Chinese, created by Richard Xiao)
 
Corpora of Other Asian Languages
 
United Nations Corpus (Arabic)
 
 
Welcome Speech by the President of Tokyo University (test data)
 
 
System messages
2022-07-26 A short manual for CQPweb (in Chinese)
http://corpus.bfsu.edu.cn/CQPweb_tutorial_here.pdf
Queries can be addressed to xujiajin@bfsu.edu.cn.
2020-10-05 BFSU CQPweb
The BFSU CQPweb was maintained by Prof. Jiajin Xu and Dr. Liangping Wu of the National Research Centre for Foreign Language Education and the National Research Centre for State Language Capacity, Beijing Foreign Studies University, China.
2014-01-26 A new metric (Effect Size or %DIFF) of keyword analysis
We have recently implemented a new keyword metric, Effect Size (%DIFF), proposed by Dr. Costas Gabrielatos and Anna Marchi, to complement the log-likelihood (LL) statistic used in keyword computation. Please refer to http://repository.edgehill.ac.uk/4100/ and http://repository.edgehill.ac.uk/4196/ for explanations of effect size in keyword analysis.
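The difference between the two keyword measures can be illustrated with a minimal Python sketch. It assumes %DIFF is computed as the difference between a word's normalised frequency in the focus corpus and in the reference corpus, expressed as a percentage of the reference frequency, and that LL is the standard two-corpus log-likelihood statistic. The function names and the tiny value substituted for a zero reference frequency are illustrative assumptions, not CQPweb's own code.

```python
import math


def percent_diff(freq_focus: int, size_focus: int,
                 freq_ref: int, size_ref: int,
                 zero_ref: float = 1e-18) -> float:
    """%DIFF effect size: relative difference (in percent) between the
    normalised frequency of a word in the focus and reference corpora."""
    nf_focus = freq_focus / size_focus
    nf_ref = freq_ref / size_ref
    if nf_ref == 0:
        # Assumption: replace a zero reference frequency with a tiny
        # value so that the division stays finite.
        nf_ref = zero_ref
    return (nf_focus - nf_ref) * 100 / nf_ref


def log_likelihood(freq_focus: int, size_focus: int,
                   freq_ref: int, size_ref: int) -> float:
    """Two-corpus log-likelihood (G2) for a single word."""
    total_freq = freq_focus + freq_ref
    total_size = size_focus + size_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_focus = size_focus * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_focus, expected_focus),
                               (freq_ref, expected_ref)):
        if observed > 0:  # the term 0 * ln(0) is taken to be 0
            ll += observed * math.log(observed / expected)
    return 2 * ll


if __name__ == "__main__":
    # A word occurring 150 times in a 1M-word focus corpus
    # and 50 times in a 1M-word reference corpus.
    print(round(percent_diff(150, 1_000_000, 50, 1_000_000), 2))    # 200.0
    print(round(log_likelihood(150, 1_000_000, 50, 1_000_000), 2))  # 52.32
```

Because %DIFF depends only on the ratio of normalised frequencies, the normalisation base (per thousand, per million) cancels out. LL indicates how much evidence there is that a frequency difference exists, whereas %DIFF indicates how large that difference is, which is why the two are reported side by side.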
2012-10-06 Disclaimer
The corpora mounted at our site are for academic purposes only. Please let us know if any of the texts in our corpora may infringe your copyright, and we will remove the text(s) concerned as soon as possible.

CQPweb v3.0.7 © 2008-2012