TimeTank: A Corpus of Sentences Annotated with TimeInfo for Temporal Data (2023)

Data creators : Salah Yahiaoui [1], Iana Atanassova [1]
[1] : Centre de recherche interdisciplinaires et transculturelles (Université de Franche-Comté)
Description :
Annotating temporal information in texts is a challenging and time-consuming task. It requires an understanding of natural language, as well as knowledge about the various ways in which temporal data can be expressed and structured in a text. However, the ability to access temporal semantics through computer tools is crucial for many applications that involve interpreting and understanding texts.

One existing corpus in this field is TimeBank (Pustejovsky et al., 2003), which was annotated using the TIMEX3 annotation scheme from TimeML (Pustejovsky et al., 2003). That scheme does not support complex temporal expressions.

We proposed a new annotation scheme for temporal information in scientific texts, TimeInfo (Yahiaoui & Atanassova, 2022), which allows for more precise and directly usable annotations. The corpus presented here, named TimeTank, consists of 1186 sentences containing a total of 1200 temporal expressions annotated according to the TimeInfo annotation scheme.
Disciplines :

General metadata

Data acquisition date : 16 Apr 2020
Data acquisition methods :
  • Reference data :
    The corpus consists of 1186 sentences drawn from 603 scientific articles from the CORD-19 corpus (Wang et al., 2020). The sentences were identified and annotated automatically, and the quality of the annotations was manually verified.
    We analyzed and processed the CORD-19 Open Research Dataset Challenge (CORD-19) using the Python programming language and syntactic rules that we developed.
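As a rough illustration of what rule-based identification of temporal expressions can look like in Python, the sketch below matches a few surface patterns in a sentence. It is not the authors' rule set and does not implement the TimeInfo scheme: the patterns, the `PATTERNS` list and the `find_temporal_expressions` function are hypothetical names chosen for this example only.

```python
import re

# Hypothetical patterns, for illustration only; these are not the authors' rules.
MONTHS = (
    r"(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)"
)
PATTERNS = [
    # Full dates such as "16 April 2020"
    re.compile(rf"\b\d{{1,2}}\s+{MONTHS}\s+\d{{4}}\b"),
    # Month and year such as "March 2020"
    re.compile(rf"\b{MONTHS}\s+\d{{4}}\b"),
    # Durations such as "14 days" or "2 weeks"
    re.compile(r"\b\d+\s+(?:hours?|days?|weeks?|months?|years?)\b"),
]


def find_temporal_expressions(sentence):
    """Return non-overlapping (start, end, text) matches, ordered by position."""
    spans = []
    for pattern in PATTERNS:
        for m in pattern.finditer(sentence):
            # Skip a match that overlaps a span found by an earlier pattern,
            # so "April 2020" is not reported inside "16 April 2020".
            if any(m.start() < e and s < m.end() for s, e, _ in spans):
                continue
            spans.append((m.start(), m.end(), m.group()))
    return sorted(spans)


example = "Patients were followed for 14 days after admission on 16 April 2020."
for start, end, text in find_temporal_expressions(example):
    print(start, end, text)
```

Patterns are tried from most to least specific, so a longer expression takes priority over a shorter one contained in it. Real scientific text would need a far richer rule set than these three patterns.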
Update periodicity : no update
Language : English (eng)
Formats : text/csv
Audience : Research, Informal Education
Publications :
  • Yahiaoui, Salah, and Iana Atanassova. "TimeInfo: a Semantic Annotation Framework for Temporal Information in Scientific Papers." Terminology & Ontology: Theories and applications (TOTH 2022). 2022. (hal:04092537)
Additional information :
Bibliography
Pustejovsky, James, et al. "The TimeBank corpus." Corpus linguistics. Vol. 2003. 2003.
Pustejovsky, James, et al. "TimeML: Robust specification of event and temporal expressions in text." New directions in question answering 3 (2003): 28-34.
Wang, Lucy Lu, et al. "CORD-19: The COVID-19 Open Research Dataset." arXiv (2020).
Yahiaoui, Salah, and Iana Atanassova. "TimeInfo: a Semantic Annotation Framework for Temporal Information in Scientific Papers." Terminology & Ontology: Theories and applications (TOTH 2022). 2022.

DOI and links

10.25666/DATAUBFC-2024-10-04
https://dx.doi.org/doi:10.25666/DATAUBFC-2024-10-04
https://search-data.ubfc.fr/FR-13002091000019-2024-10-04

Citation

Salah Yahiaoui, Iana Atanassova (2023): TimeTank: A Corpus of Sentences Annotated with TimeInfo for Temporal Data. CRIT. doi:10.25666/DATAUBFC-2024-10-04

Record created 4 Oct 2024 by Hélène Tisserand.
Local identifier: FR-13002091000019-2024-10-04.
