School of Computing



Dr Bogdan Babych

Associate Professor in Translation Studies

Member of the Natural Language Processing Group

Funded research projects

EU FP7 Marie Curie IAPP Project: HyghTra (2010-2014)
Project: Hybrid high-quality translation system

Value: EUR 833,193 (total), including EUR 571,811 for the University of Leeds

Recruitment: 4 Marie Curie Research Fellows

The collaborative European Marie Curie IAPP (Industry-Academia Partnerships and Pathways) project HyghTra has involved our academic team from the Centre for Translation Studies (CTS) at Leeds and an industrial team at the German company Lingenio GmbH. The project's objective has been to create a technology for the rapid development of high-quality hybrid MT systems and richly annotated linguistic resources for Lingenio’s rule-based MT, using state-of-the-art statistical methods of natural language processing. Four members of CTS staff (Bogdan Babych, Serge Sharoff, Anthony Hartley and Martin Thomas) and two research fellows (Mireia Ginesti Rosell and Reinhard Rapp) have worked on HyghTra.

The Leeds team has coordinated the project.

Project website.

EU FP7 ICT Project: ACCURAT (2010-2012)
Project: Analysis and evaluation of Comparable Corpora for Under Resourced Areas of machine Translation

Value: EUR 341,110 (for Leeds)

Recruitment: 2 Research Fellows

In the collaborative European FP7 ICT project ACCURAT, the University of Leeds / Centre for Translation Studies team has led research and development activities for one of the project's work packages (Comparability Metric) and for several deliverables in work packages led by other partners. The programme of collaborative research aimed to identify comparability features in documents and corpora that can be calculated automatically, to be used for creating MT systems for under-resourced languages from comparable corpora. Three members of staff (Bogdan Babych, Serge Sharoff and Anthony Hartley) and two research fellows (Fangzhong Su and Richard Forsyth) at Leeds have worked on the project. The Leeds team has also coordinated a large-scale MT evaluation experiment involving 40 evaluators across several European countries.

The project has received a positive final review from the Commission and achieved all its objectives, including the deliverables assigned to our CTS/Leeds team. The software developed has been included in a package of open-source tools to be used by industrial and academic teams for developing MT for under-resourced languages.

Project website.

Leverhulme Early Career Research Fellowship (2007-2009)
Project: Translation Strategies in Comparable Corpora

Value: GBP 40,000

A fundamental problem in translation is to choose, from a set of potential equivalents, the one that best fits the target-language context. The proposed research will borrow from language modelling techniques used in Information Extraction to automatically identify and rank translation equivalents according to their contextual relevance. Focusing on English and Russian, it will use parallel, translated data to identify classes of the non-literal translation strategies adopted by translators, which will be modelled in a computer program. The model's ability to identify good translation solutions to novel translation cases will be tested on very large comparable corpora -- texts in the same domain which are not translations of one another. Evaluation will be conducted with professional translators. The work's theoretical significance lies in providing an important link between language, knowledge and the subject domain, which is essential for the process of translation.

The main objective of the proposed research is to develop and evaluate a predictive model of indirect translation procedures which can offer practical methods for applying appropriate translation shifts and finding potentially useful solutions. This will be done by investigating the general mechanisms behind individual shifts and testing deductive hypotheses about these mechanisms on real translation problems. To permit rigorous testing, the model will be implemented as a computer program and derived from very large corpora of English and Russian texts. This will constitute a dynamic resource for translators, which will apply the discovered and clearly understood translation strategies to novel expressions and contexts on the fly and generate potentially useful non-literal suggestions.

More details are on the project webpage.

UK AHRC DEDEFI project: IntelliText (2010-2012) -- Main author of the research proposal
Project: Intelligent Tools for Creating and Analysing Electronic Text Corpora for Humanities Research

Project proposal [pdf]; Technical Annex [pdf]; Impact Plan [pdf]

Value: GBP 159,293

Recruitment: 2 Research Fellows

Project website.

Humanities researchers' lack of awareness of modern computational techniques for corpus-based studies can seriously limit the scope and the impact of any planned research projects. Moreover, computer scientists who design corpus-based tools frequently do not understand the specific needs of humanities research; their tools are often difficult to adapt to a specific project, or lack an intuitive interface and documentation. As a result, important potential synergies for the research of both parties have been neglected, in particular the applicability of several non-trivial computational techniques to preparing and analysing corpus data, with the power to reveal new dependencies and patterns in the material, and thus yield a much greater impact.

IntelliText aims to create software to allow humanities researchers with no specialised background in computer science or corpus linguistics to take advantage of advanced methods of text collection and analysis. It enables them to collect new project corpora from the web, have them enriched automatically with linguistic and other annotations, and then easily uncover interesting patterns of usage, starting either from their own intuitions and hypotheses, or from expressions and patterns identified as potentially noteworthy by the system.

The software is designed and tested in conjunction with target applications in three areas: translation studies, language teaching, and monitoring of opinion and sentiment. These demonstrate its generalisability for addressing the needs of a wide spectrum of humanities researchers, including historians and literary specialists. IntelliText offers an intuitive and well-documented interface and will be made freely available for research purposes.

CTS CorpusLabs project: interactive labs for corpus research

Web service tools for supporting industrial collaboration, impact and students' research projects: