TimeTank: A Corpus of Sentences Annotated with TimeInfo for Temporal Data (2023)
Data creators :
Salah Yahiaoui [1],
Iana Atanassova [1]
[1] : Centre de recherche interdisciplinaires et transculturelles (Université de Franche-Comté)
Description :
Annotating temporal information in texts is a challenging and time-consuming task. It requires an understanding of natural language, as well as knowledge about the various ways in which temporal data can be expressed and structured in a text. However, the ability to access temporal semantics through computer tools is crucial for many applications that involve interpreting and understanding texts.
A corpus available in this field is TimeBank (Pustejovsky et al., 2003), which was annotated using the TIMEX3 annotation scheme (Pustejovsky et al., 2003), a scheme that does not support complex temporal expressions.
We proposed a new annotation scheme for temporal information in scientific texts, TimeInfo (Yahiaoui & Atanassova, 2022), which allows for more precise and directly usable annotations. The corpus presented here, named TimeTank, consists of 1186 sentences containing a total of 1200 temporal expressions annotated according to the TimeInfo annotation scheme.
Disciplines :
computer science, artificial intelligence (engineering science); computer science, information systems (engineering science); computer science, interdisciplinary applications (engineering science); language & linguistics (humanities)
General metadata
Data acquisition date :
16 Apr 2020
Data acquisition methods :
- Reference data : The corpus consists of 1186 sentences drawn from 603 scientific articles from the CORD-19 corpus (Wang et al., 2020). The sentences were identified and annotated automatically, and the quality of the annotations was manually verified.
We analyzed and processed the COVID-19 Open Research Dataset Challenge (CORD-19) corpus using the Python programming language and syntactic rules that we developed.
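As an illustration of what rule-based sentence selection in Python can look like, here is a minimal sketch. It is not the authors' pipeline and does not reproduce the TimeInfo rules: the regular expression, the function names `find_temporal_expressions` and `select_sentences`, and the example sentences are all hypothetical and chosen only to show the general idea of keeping sentences that contain at least one temporal expression.

```python
import re

# Hypothetical patterns for illustration only; these are NOT the TimeInfo rules.
UNIT = r"(?:hours?|days?|weeks?|months?|years?)"
NUMBER = r"\d+(?:\.\d+)?"
TEMPORAL_PATTERN = re.compile(
    r"\b(?:"
    + NUMBER + r"\s*(?:-|to)\s*" + NUMBER + r"\s+" + UNIT   # ranges, e.g. "2 to 14 days"
    + r"|" + NUMBER + r"\s+" + UNIT                          # durations, e.g. "14 days"
    + r"|(?:in|since|until|before|after)\s+(?:19|20)\d{2}"   # anchored years, e.g. "since 2003"
    + r")\b",
    re.IGNORECASE,
)


def find_temporal_expressions(sentence):
    """Return (start, end, text) for each match of the illustrative patterns."""
    return [(m.start(), m.end(), m.group(0))
            for m in TEMPORAL_PATTERN.finditer(sentence)]


def select_sentences(sentences):
    """Keep only sentences with at least one match, paired with their matches."""
    selected = []
    for sentence in sentences:
        spans = find_temporal_expressions(sentence)
        if spans:
            selected.append((sentence, spans))
    return selected


if __name__ == "__main__":
    # Made-up example sentences, not taken from the corpus.
    examples = [
        "The incubation period ranged from 2 to 14 days.",
        "Cases have been reported since 2003.",
        "The virus binds to the ACE2 receptor.",
    ]
    for sentence, spans in select_sentences(examples):
        print(sentence, [text for _, _, text in spans])
```

A real rule set would need many more patterns and, as the record notes, manual verification of the resulting annotations.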
Formats :
Audience :
Research, Informal Education
Publications :
- Yahiaoui, Salah, and Iana Atanassova. "TimeInfo: a Semantic Annotation Framework for Temporal Information in Scientific Papers." Terminology & Ontology: Theories and applications (TOTH 2022). 2022. (hal:04092537)
Additional information :
Pustejovsky, James, et al. "The TimeBank corpus." Corpus linguistics. Vol. 2003. 2003.
Pustejovsky, James, et al. "TimeML: Robust specification of event and temporal expressions in text." New directions in question answering 3 (2003): 28-34.
Wang, Lucy Lu, et al. "CORD-19: The COVID-19 open research dataset." arXiv (2020).
Yahiaoui, Salah, and Iana Atanassova. "TimeInfo: a Semantic Annotation Framework for Temporal Information in Scientific Papers." Terminology & Ontology: Theories and applications (TOTH 2022). 2022.
DOI and links
Salah Yahiaoui, Iana Atanassova (2023): TimeTank: A Corpus of Sentences Annotated with TimeInfo for Temporal Data. CRIT. doi:10.25666/DATAUBFC-2024-10-04
Record created 4 Oct 2024 by Hélène Tisserand.
Local identifier: FR-13002091000019-2024-10-04.