PGN Problems 1

For some time I have been toying with Harold van der Heijden's magnificent study database (see http://hhdbv.nl/ for the latest release). It comes as a PGN database, and using it requires a chess database program or closely related software, such as CQL programs. If I want to use it with software more closely adapted to chess compositions, there seems to be no alternative but to write that software myself.

The first problem in that task is getting to grips with PGN, and for that a copy of the PGN Standard is needed. As with most Internet-distributed material, the copy should preferably be as close as possible to the latest official publication, in order to avoid any later modifications, intentional or not.

Where can I find one? 

The Short Story

The web page http://tim-mann.org/Standard is the best I have found so far.

The Long Story

Starting with the Wikipedia article on PGN, I find a link to https://www.chessclub.com/help/PGN-spec. This document is unfortunately incomplete, and breaks off in section 8.2.3.5, after about 800 lines. (Later findings suggest that the full specification has about 2920 lines.)

Another Wikipedia link goes to http://www.saremba.de/chessgml/standards/pgn/pgn-complete.htm. This is an HTML conversion of the original document, so it can't be regarded as a primary source. Judging the content will require a copy of the standard to compare it against, … but that's what I'm looking for.

The Usenet newsgroup rec.games.chess mentioned in both these documents doesn't seem to be archived by Google Groups. There are some commercial providers that may have it; this may be a later track to follow, as it probably requires signing up for their services.

The Internet Archive provides several archives that cover Usenet news. The most promising is the https://archive.org/details/usenet-rec set, which contains rec.games.chess postings from March 1989, including one by Steven J. Edwards with a call for discussion about an early version of PGN, referring to it as part of the SAN kit. (So I may need to start looking for the SAN kit as well.)

The first mention of a formal specification of PGN that I find is in a posting dated 29 Sep 1993, which notes that it should be available 'early next week'. The specification appears to have been emailed to those who showed an interest. In Dec 1993, there is an announcement that the standard (version 1993.12.19) would be part of the PGN games archives at chess.uoknor.edu. This seems to be the first authoritative source for the standard document itself, apart from the copies that the author sent on request by email. (Quick check of my email archives ... no, I don't seem to have requested one.)

In a posting from 19 Apr 2003, I also find a reference to http://tim-mann.org/Standard, which, if I may trust the Internet Archive Wayback Machine, goes back to 2001. (The site belongs to Tim Mann, who wrote GNU XBoard and WinBoard.) However, this file should probably not be regarded as authoritative.
 
Unfortunately, chess.uoknor.edu is long gone, and it does not appear to be among the FTP sites archived at the Internet Archive.

The newsgroup also provided a FAQ posting, part 2 of which (http://www.faqs.org/faqs/games/chess/part2/) identifies the Internet Chess Server as a general repository where, among other things, the PGN Standard can be found. But just as rec.games.chess seems to have disappeared, so has ics.onenet.net. The FAQ was last updated in 2002.

A similar FAQ for the newsgroup rec.games.chess.computer points to caissa.onenet.net as the repository, but that site has also disappeared into the history of the Internet. (The FAQ does not say when it was last updated, but the current copy appears to have been last posted in November 1995, the same year the newsgroup was created.)

The author of the PGN specification, Steven J. Edwards, died in 2016.

Result: one copy found, but probably not from what would be regarded as an authoritative site.

Next try. 

The Chess Programming Wiki provides an alternative PGN wiki page (see https://www.chessprogramming.org/Portable_Game_Notation). This page links to https://www.thechessdrum.net/PGN_Reference.txt. This is a plain text file, so it is at least a candidate standard document.

As mentioned, PGN was initially part of the SAN kit, a toolkit for chess programming. The original seems to have been distributed as a file 'SAN.tar.Z', but other archiving and compression methods have been observed. Using that file name as a search pattern for Google and FTP search engines eventually produced the hit ftp://ftp.freechess.org/pub/chess/Unix/SAN.tar.gz, dating from 2012.

While freechess.org still exists, its FTP service appears to have gone. There are no obvious traces of this file on the current freechess.org web site.

Additionally, the Internet Archive Wayback Machine (https://web.archive.org/) saved a copy of the www.chessclub.com page (the page with only 813 lines mentioned above) on Dec 2, 2000. This copy has 2918 lines, and so appears to be a better copy than the one they currently present.

During searches for the SAN kit itself, a project SANKit was located on SourceForge. The project description makes it clear that the code base is derived from the original SAN kit but has been modified. However, the downloadable zip archive appears to contain the original SAN kit file (…/original_archive/sankit.tar.gz), in which a file 'Standard', dated 1994-02-22, contains a PGN specification with a revision date of 1994-02-21.

Compared with the original Wikipedia links, which both have 1994-03-12 as the revision date, this is not the latest release, but it is very likely something against which other candidates can be compared and evaluated.

Additional searches in personal backups and various old-but-not-yet-discarded hard drives came up with a PGN_Standard.txt file from 2005, as well as one file from the distribution of chest 3.19 (a chess-problem solving program by Heiner Marxen, which also appears to have been lost to sight).

Leaving out the SANKit file, the search results are now:

  1. chessdrum file, date unknown (???) but probably later than the original
  2. chest file from 1998
  3. personal file from 2005
  4. chessclub file from 2000
  5. Tim Mann file from 2001

The personal file was quickly eliminated, as it appeared to have CR/LF inserted in the middle of some lines, sometimes in the middle of a word. These breaks were very regularly placed throughout the file: they appear to have been inserted by some automatic process, possibly the result of bad compression or, more probably, bad decompression software.

The chest file did not have any such insertions. It matched the personal file reasonably well, and is also the oldest of the located files.

The chessdrum file differed from the chest file in having 187 lines of table of contents inserted at the beginning of the file. It also appears to have an additional character inserted in section 0. Preface, line 199:
>From the Tower of Babel story:
The chest file does not have the initial '>' character.
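
A leading '>' in front of 'From ' at the start of a line is the usual trace of mbox-style mailbox storage, which escapes such lines. Whether that is what happened to the chessdrum file is only an assumption; nothing in the file itself confirms it. A minimal Python sketch for spotting such lines in a candidate copy (the function name is made up for illustration):

```python
def find_from_quoting(lines):
    """Return (line_number, text) pairs for lines starting with '>From '.

    Such lines are *candidates* for mbox-style escaping; this is a
    heuristic, not proof of how the extra character got there.
    """
    return [
        (number, text)
        for number, text in enumerate(lines, start=1)
        if text.startswith(">From ")
    ]


sample = [
    "0: Preface",
    ">From the Tower of Babel story:",
    "From here on nothing is escaped.",
]
hits = find_from_quoting(sample)
```

Only the second sample line is reported; an unescaped 'From ' line is left alone.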

The archived chessclub file was almost identical to the chest file. The only differences found were that the chest file had two additional empty lines at the beginning, and that the chessclub file lacked the line 'Standard: EOF' at the very end.

The Tim Mann file was the same as the chest file, except that it lacked the two empty lines that the chest file had at the beginning of the file. It also differed regarding line separators: where the chest file has CR/LF, the Tim Mann file has LF only.

The earlier SANKit file was also examined, mainly to get an idea of what the original may have looked like.  As to the technical content, it shows some differences but many of the changes appear to be editorial.

The SANKit file uses LF for line separators. It lacks the two empty initial lines found in the chest file. It ends with 'Standard: EOF'.

This seems to point to the Tim Mann file as probably the best copy found with a day's work. It may not be the original, as it is dated 2001, about seven years after the original was published, so there might still be an even better copy somewhere out there to be found.

(File comparisons were made with Notepad++ 7.8.5 and the Compare plugin 2.0.0 with default settings.)
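
The properties that separated the candidates above (line separators, empty lines at the start, the closing 'Standard: EOF' line) could also be checked with a small script. The following Python sketch is not what was used for this post; the function names are made up, and the byte strings at the end are tiny stand-ins for the real files.

```python
import difflib

EOF_MARKER = b"Standard: EOF"  # closing line reported for the SANKit copy


def describe(data: bytes) -> dict:
    """Summarise the superficial properties of one candidate file."""
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf
    if crlf and lf_only:
        eol = "mixed"
    elif crlf:
        eol = "CRLF"
    elif lf_only:
        eol = "LF"
    else:
        eol = "none"
    lines = data.splitlines()  # bytes.splitlines splits on CR, LF and CR/LF only
    leading_blank = 0
    for line in lines:
        if line.strip():
            break
        leading_blank += 1
    non_blank = [line.strip() for line in lines if line.strip()]
    return {
        "eol": eol,
        "lines": len(lines),
        "leading_blank": leading_blank,
        "ends_with_marker": bool(non_blank) and non_blank[-1] == EOF_MARKER,
    }


def normalize(data: bytes) -> list:
    """Drop line-separator style, trailing spaces and blank lines at either end."""
    lines = [line.rstrip() for line in data.splitlines()]
    while lines and not lines[0]:
        lines.pop(0)
    while lines and not lines[-1]:
        lines.pop()
    return lines


def differences(a: bytes, b: bytes) -> list:
    """List the lines that differ once both files are normalized."""
    la, lb = normalize(a), normalize(b)
    out = []
    matcher = difflib.SequenceMatcher(None, la, lb, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            out.extend(b"-" + line for line in la[i1:i2])
            out.extend(b"+" + line for line in lb[j1:j2])
    return out


# Stand-ins mimicking the differences described above, not the real files.
chest_like = b"\r\n\r\nline one\r\nline two\r\nStandard: EOF\r\n"
mann_like = b"line one\nline two\nStandard: EOF\n"
chessclub_like = b"line one\nline two\n"
```

With real files, `describe(open(path, "rb").read())` gives the summary for one copy, and `differences` on two copies shows what remains once the cosmetic differences are ignored.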
