xCite Documentation

What is xCite?

xCite creates dump files containing all instances of certain templates on Wikipedia.

Export

Each template is kept exactly as it appears in the wikitext source. Special characters (newlines, etc.) use standard JSON encoding. Redirects to a template are also included, e.g., {{citebook}} alongside {{cite book}}.
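Because redirects are included, one dump can mix template spellings such as {{cite book}} and {{citebook}}. The sketch below is a hypothetical helper, not part of xCite. It decodes a dump line with Python's standard json module, which turns escapes such as \n back into real newlines. It then extracts the template name exactly as written so that callers can tell the variants apart. It assumes the citation string begins with "{{", as in the example under File format.

```python
import json
import re
from typing import Optional

# Template name: text after "{{" up to the first "|" or "}}", surrounding whitespace trimmed.
_NAME_RE = re.compile(r"\{\{\s*([^|}]+?)\s*(?:\||\}\})")


def template_name(citation: str) -> Optional[str]:
    """Return the template name as written (e.g. "citebook"), or None if absent."""
    m = _NAME_RE.match(citation)
    return m.group(1) if m else None


# A hypothetical dump line whose citation contains a newline, stored as the JSON escape \n.
line = '{"a":"Example","c":"{{citebook\\n|title=Example}}"}'
record = json.loads(line)
print(repr(record["c"]))           # '{{citebook\n|title=Example}}'
print(template_name(record["c"]))  # citebook
```

The name is returned unnormalized. Mapping redirects such as "citebook" to "cite book" requires the wiki's redirect list, which is outside the scope of this sketch.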

xCite runs concurrent workers on the Toolforge grid, which makes parsing very fast. If a worker fails mid-stream because of grid instability, it restarts and no data is lost. Pages are retrieved via the API, which reads the current live database, so there is no lag from other dumps or from the replication server.
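The following rough illustration shows how a tool can read the current wikitext of a page through the MediaWiki Action API (action=query, prop=revisions). It is not xCite's code. The endpoint, the placeholder User-Agent, and the function names are assumptions for the example.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; other Wikipedia languages expose api.php the same way.
API = "https://en.wikipedia.org/w/api.php"


def wikitext_url(title: str, api: str = API) -> str:
    """Build an Action API query for the latest revision's wikitext."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return api + "?" + urllib.parse.urlencode(params)


def fetch_wikitext(title: str, api: str = API) -> str:
    """Fetch the current wikitext of one page (requires network access)."""
    req = urllib.request.Request(
        wikitext_url(title, api),
        # Wikimedia asks clients to send a descriptive User-Agent; this one is a placeholder.
        headers={"User-Agent": "xcite-doc-example/0.1 (hypothetical)"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        data = json.load(resp)
    page = data["query"]["pages"][0]  # formatversion=2 returns pages as a list
    if page.get("missing"):
        raise KeyError(f"page not found: {title}")
    return page["revisions"][0]["slots"]["main"]["content"]
```

For example, fetch_wikitext("100,000 Cobbers") returns that article's wikitext as it currently stands, because the API reads the live database instead of a replica.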

Cycle

Dumps are generated weekly. Shorter cycles are possible.

4 cycles for each language are kept online.

File format

JSON, one object per line; each object holds one template.

Example line from en.wikipedia.org.magazine.json:


{"a":"100,000 Cobbers","c":"{{cite magazine|title= Soldiers Make New Film |url=http://nla.gov.au/nla.obj-722411219 |page=10|date=October 11, 1941|magazine=Pix}}"}

"a" = article name
"c" = citation

The file is semi-sorted: citations from an individual article are grouped on contiguous lines. Article names are unsorted. Duplicate citations in an article result in duplicate lines in the dump.
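Because all citations from one article sit on contiguous lines, a consumer can stream a dump and group it by article without loading the whole file into memory. The Python sketch below does this with itertools.groupby. The function names are illustrative and are not part of xCite. Duplicate citations are kept, as in the dump. The approach relies on the documented contiguity: an article split across two separate runs would be yielded twice.

```python
import json
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def read_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (article, citation) pairs from dump lines, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        yield obj["a"], obj["c"]  # "a" = article name, "c" = citation


def citations_by_article(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (article, [citations]) for each contiguous run of one article."""
    for article, group in groupby(read_records(lines), key=lambda rec: rec[0]):
        yield article, [citation for _, citation in group]
```

To use it, open en.wikipedia.org.magazine.json with encoding="utf-8" and pass the file object to citations_by_article. Because the file is read line by line, memory use stays proportional to the largest article rather than to the whole dump.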

Access

Web interface: https://tools-static.wmflabs.org/botwikiawk/xcite/xcite.html

Direct access: https://tools-static.wmflabs.org/botwikiawk/xcite/

Log file: https://tools-static.wmflabs.org/botwikiawk/xcite/log.txt

log.txt can be read programmatically as an index of what is available. Entries are listed from newest to oldest. Use log.txt to determine which dumps are complete, because incomplete files may be present in the directory while a dump is being created.
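A minimal sketch of using log.txt as an index follows. It assumes one entry per non-blank line. The format of each entry is not described here, so inspect the file before parsing fields out of it. Because entries are listed newest first, the most recent completed dumps are simply the first lines. The function names and User-Agent string are hypothetical.

```python
import urllib.request
from typing import List

LOG_URL = "https://tools-static.wmflabs.org/botwikiawk/xcite/log.txt"


def latest_entries(log_text: str, n: int = 1) -> List[str]:
    """Return the n newest entries, assuming one entry per non-blank line."""
    entries = [line.strip() for line in log_text.splitlines() if line.strip()]
    return entries[:n]  # log.txt is ordered newest to oldest


def fetch_log(url: str = LOG_URL) -> str:
    """Download log.txt (requires network access)."""
    req = urllib.request.Request(
        url, headers={"User-Agent": "xcite-doc-example/0.1 (hypothetical)"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

For example, latest_entries(fetch_log(), 4) lists the four newest entries. This matches the four cycles kept online, assuming each cycle adds one line per language.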

Credits

GitHub: https://github.com/greencardamom/xcite

Contact: User:GreenC