Dataset : Wikipedia network analysis of cancer interactions and world influence


General metadata

Identifiers :
  • local : FR-18008901306731-2019-01-25
  • external : doi:10.25666/DATAOSU-2019-01-25
Description :
We apply Google matrix algorithms to analyse the interactions and influence of 37 cancer types, 203 cancer drugs and 195 world countries, using the network of 5 416 537 English Wikipedia articles with all their directed hyperlinks. The PageRank algorithm yields an importance ranking of cancers whose top 10 overlaps by 60% and 70% with the top 10 cancers extracted from the World Health Organization GLOBOCAN 2018 and the Global Burden of Diseases Study 2017, respectively. The recently developed reduced Google matrix algorithm gives networks of interactions between cancers, drugs and countries, taking into account all direct and indirect links between these 435 selected entities. These reduced networks make it possible to obtain the sensitivity of countries to specific cancers and drugs. The strongest links between cancers and drugs are in good agreement with the approved medical prescriptions of specific drugs for specific cancers. We argue that this analysis of the knowledge accumulated in Wikipedia provides useful complementary global information about the interdependencies between cancers, drugs and world countries.
Disciplines :
Keywords :

Dates :
Data acquisition : from 1 May 2017 to 31 May 2017
Data provision : 23 Jan 2019
Metadata record : Creation : 25 Jan 2019, Update : 20 Sep 2019

Language : English (eng)
Audience : General, Research, Stakeholder, Policy maker, Informal Education
Rights : Attribution, Non Commercial, Share Alike


José Lages, Dima Shepelyansky, Guillaume Rollin (2019): Wikipedia network analysis of cancer interactions and world influence. UTINAM. doi:10.25666/DATAOSU-2019-01-25


Spatial coverage :

  • World: latitude between 85° N and 85° S, longitude between 180° W and 180° E

Time coverage :

Taxonomic coverage :

  • Species
    Homo sapiens MSW (Human)

Administrative metadata

Data creators : José Lages [1] [2], Dima Shepelyansky [3], Guillaume Rollin [1] [2]
[1] : Institut UTINAM (UMR 6213) (Université de Franche-Comté)
[2] : Observatoire des Sciences de l'Univers - Terre, Homme, Environnement, Temps, Astronomie (UAR 3245) (Université de Franche-Comté)
[3] : Laboratoire de Physique Théorique (UMR 5152)
Publisher : Institut UTINAM (UMR 6213)
Label : Initiative pour le SITE Bourgogne Franche-Comté
Science contact : José Lages
Computing contact : José Lages
Projects and funders :
Access : available

Technical metadata

Formats : application/pdf, image/png, image/svg+xml, image/x-eps, text/csv, text/html, text/plain
Data acquisition methods :
  • Derived or compiled data :
    Web crawling of Wikipedia editions (May 2017) to retrieve information.
  • Simulation or computational data :
    The PageRank, CheiRank and 2DRank algorithms were used to rank articles of the English-language Wikipedia edition (May 2017).
    The reduced Google matrix method was used to infer interactions between articles.
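
As a rough illustration of the ranking step described above, the following minimal sketch computes PageRank and CheiRank on a hypothetical four-node toy network. It is not the creators' code: the function names, the toy link list and the damping factor alpha = 0.85 (the conventional value, not stated in this record) are assumptions made for the example. CheiRank is obtained here as the PageRank of the network with all link directions inverted. 2DRank and the reduced Google matrix are not covered.

```python
def google_matrix(n, links, alpha=0.85):
    """Build a column-stochastic Google matrix.

    G[i][j] is the probability of moving from node j to node i.
    alpha is the damping factor (0.85 is an assumed, conventional value).
    """
    out = [[] for _ in range(n)]
    for src, dst in links:
        out[src].append(dst)
    # Teleportation term: uniform jump with probability (1 - alpha).
    G = [[(1.0 - alpha) / n] * n for _ in range(n)]
    for j in range(n):
        if out[j]:
            weight = alpha / len(out[j])
            for i in out[j]:
                G[i][j] += weight
        else:
            # Dangling node (no outgoing links): spread uniformly.
            for i in range(n):
                G[i][j] += alpha / n
    return G


def stationary_vector(G, tol=1e-12, max_iter=1000):
    """Power iteration for the eigenvector of G with eigenvalue 1."""
    n = len(G)
    p = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(G[i][j] * p[j] for j in range(n)) for i in range(n)]
        if sum(abs(a - b) for a, b in zip(new, p)) < tol:
            return new
        p = new
    return p


def pagerank(n, links, alpha=0.85):
    """PageRank: stationary vector of the Google matrix of the network."""
    return stationary_vector(google_matrix(n, links, alpha))


def cheirank(n, links, alpha=0.85):
    """CheiRank: PageRank of the network with every link inverted."""
    inverted = [(dst, src) for src, dst in links]
    return stationary_vector(google_matrix(n, inverted, alpha))


def rank_order(p):
    """Node indices sorted by decreasing probability."""
    return sorted(range(len(p)), key=lambda i: -p[i])


# Hypothetical toy network: (source, target) pairs standing in for hyperlinks.
links = [(0, 1), (2, 1), (3, 1), (1, 0), (3, 2)]
P = pagerank(4, links)
C = cheirank(4, links)
print("PageRank order:", rank_order(P))
print("CheiRank order:", rank_order(C))
```

In this toy network, PageRank favours the node that receives the most links, while CheiRank favours the node that emits the most links. For a network the size of the English Wikipedia edition, a sparse matrix representation would be needed instead of the dense lists used here.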
Datatype : Dataset


