Doctor of Philosophy Jauhiainen Tommi and working group

290400 €

Automatic Classification and Analysis of Texts from Egyptian Antiquity

Scientific research / work based on it | Four-year

The project will develop new state-of-the-art language technology methods for processing textual documents from Egypt dating from the 8th century BCE to the 7th century CE. It investigates the extensive textual evidence from the region as a whole, including texts in both Greek and Egyptian. For many texts, use in humanities research is hampered because their dates are given as ranges spanning several hundred years or are left out altogether due to uncertain provenance. For some texts, however, we know the exact date of writing because the texts themselves include dates. For others, we know that they were written before or after a specific date because their contents refer to something else that can be dated. Texts can also be dated by analysis of the handwriting or by archaeological means. Similar means can be used to determine the texts' place of origin. In this project, we seek to determine date and place of origin using only the textual contents. We will further improve the state-of-the-art text classification methods we have developed for other domains. By combining these with the previously used means of attributing date and place, we aim to pinpoint the texts' time and place of origin with an entirely new level of accuracy. In addition to developing methods for detecting the origin and date of the texts, we will investigate the possibilities of automatically detecting loanwords between Greek and Egyptian. The results will be examined manually, and more traditional philological research will be conducted with the project collaborators to shed light on any interesting phenomena found. A large part of the project is dedicated to collaboration with the various entities that own the copyright to the existing machine-readable texts. With our existing networks and extensive experience in publishing open datasets, we can help make new collections of machine-readable texts available to researchers worldwide.