
Corrections about Chess Explanations


As soon as I wrote my previous essay, I realized there are several points that require correction or further explication. 
First, I checked whether Stockfish would use against me the same strategy that I described for KRK and KQK, and saw that it does not, which proves I was wrong when I supposed that there are no correct strategies substantially different from those I described. It turns out there are, and they deliver checkmate faster, especially when the starting position has BK in the center and WK in the corner. The basic difference is that I assumed BK must be squeezed from the center towards some edge by pushing it away from WK, using the major piece to immediately take control of the line that separates the kings, while the engine is not afraid to push BK towards WK, saving that way a few moves otherwise needed for WK to approach BK.
Second, the precise and concise language in which rigorous proofs are written is, of course, mathematics. Mathematically speaking, describing the KRK endgame concept means proving the theorem that the KRK endgame is a win for white regardless of the starting position, with the trivial drawing exceptions of positions where black moves first and can immediately capture the rook, or is stalemated in the corner.
That requires describing at least one winning strategy for white and proving its correctness, either by a computer-aided method or by pen and paper.
That leads to a decomposition of the main theorem into a set of lemmas, one for each stage of the strategic plan, each of which has to be proved in order to prove the main theorem.
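For illustration, the decomposition used below could be stated like this (an informal LaTeX sketch of my own; the lemma numbering matches the proofs later in this essay):

```latex
\newtheorem{theorem}{Theorem}
\newtheorem{lemma}{Lemma}

\begin{theorem}[KRK]
Every KRK position is a win for white, except the trivial draws in which
black, to move, can immediately capture the rook or is stalemated.
\end{theorem}

\begin{lemma}
Checkmate is possible if and only if BK is at the edge of the board.
\end{lemma}

\begin{lemma}
White can always force BK to some edge.
\end{lemma}

\begin{lemma}
With BK confined to an edge by WR on the penultimate line, white can
force the mating position of Lemma~1.
\end{lemma}
```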
However, this form of describing a concept, as an alternating sequence of theorems and proofs, is applicable only to endgames (some of them, at least). A chess concept is a broader term that includes not only endgames but also categories that cannot be described that way; otherwise chess would be mathematically solved.
Hence the question of the utility of the fictional CCDL still stands.
Also, there is a real language, the Chess Query Language or CQL ( http://www.gadycosteff.com/cql/index.html ), coauthored by Gady Costeff and Lewis Stiller, which covers a great deal of the needs of chess explanation. It searches for user-defined patterns in chess games by means of filters, a feature not uncommon in query languages: applied to an input .pgn file, a filter outputs a new .pgn file that contains only those games from the input file in which the pattern is recognized.
A filter is itself defined in a file, with the .cql extension, and it contains a boolean function that returns true if a certain condition the user finds interesting is met within a game, i.e. if the pattern is matched. That function can take a single position as an argument, in the case of position patterns, or multiple positions, in the case of movement patterns. That is the most basic categorization of CQL patterns one can think of, since each game from the input .pgn is replayed, i.e. transcribed into a list of FENs, or whatever other position representation is used, and the positions are examined independently or dependently, according to the pattern's requirement. So any kind of pin is an example of a position pattern, while two or more consecutive checks by the same side during which the score deteriorates for that side is an example of a movement pattern, one that could be called “wasted opportunity by futile checking”. Actually, that would be a movement pattern even with just one check, because it would involve score examination of two positions, before and after the check. Anyway, the point is that a pattern can be whatever a human user finds meaningful and can code for in the filter function, which leads to the third point I wanted to explicate.

Namely, if a certain pattern can be envisaged in advance as important, and its filter function can be implemented by manual coding easily and precisely, it obviously makes no sense whatsoever to deploy machine learning and NN training to obtain the function that recognizes that pattern, contrary to what I supposed was done in the case of DecodeChess. However, one can draw parallels between this and MuZero's legal_move_generator and recognize_terminal_game_state functions: how they are obtained, and why. If one wants a general algorithm that gets as little specific programmer's input as possible, because it can autonomously decide on what is important, then it would make sense in the DecodeChess case to deploy machine learning heavily, letting it identify all patterns by itself, unsupervised by anything but oracles such as engines and EGTBs, i.e. letting it determine what has actually been played when a certain side gets the upper hand. But it might be that DecodeChess has manually hardcoded the recognition of all its patterns, adding just a bit of natural language generation on top of that, and there you have it, a chess explainer.
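To make the filter idea concrete before moving on: here is a rough Python equivalent of a simple position pattern (some black piece is pinned to its king), written with the python-chess library rather than in actual CQL syntax; the file names are placeholders of my choosing.

```python
import chess
import chess.pgn

def has_pin(board: chess.Board) -> bool:
    # Position pattern: true if some black piece is absolutely pinned.
    for square, piece in board.piece_map().items():
        if (piece.color == chess.BLACK and piece.piece_type != chess.KING
                and board.is_pinned(chess.BLACK, square)):
            return True
    return False

def game_matches(game: chess.pgn.Game) -> bool:
    # Replay the game and test the pattern position by position;
    # a movement pattern would instead examine consecutive positions.
    board = game.board()
    for move in game.mainline_moves():
        board.push(move)
        if has_pin(board):
            return True
    return False

# Filter an input .pgn into an output .pgn of matching games only.
with open("input.pgn") as src, open("matched.pgn", "w") as dst:
    while (game := chess.pgn.read_game(src)) is not None:
        if game_matches(game):
            print(game, file=dst, end="\n\n")
```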
Let us get back to the KRK endgame. I got the idea to investigate how exactly Stockfish is going to play it only after I learned about alternatives, such as those described in a series of articles authored by Maliković, Janičić, Marić and Čubrilo, in which they demonstrate computer-aided proofs of the KRK theorem:

That is a fascinating subject that made me aware, among other things, of the existence of Bratko's strategy for KRK.

However, I wasn't satisfied with my presentation of the KRK subject in the previous essay, so I decided to rewrite it in the form of a pen-and-paper proof:

Auxiliary Definition 1: two kings stand in opposition if they are on the same line (rank or file) two squares apart, i.e. with exactly one square between them.
Corollary 1: two kings in opposition prevent each other from moving to the three squares adjacent to both of them, i.e. from moving to the line that separates them, which is perpendicular to the line they both stand on. Proof: trivially follows from Definition 1 and from the game rules.
Corollary 2: when a king is not placed at the edge of the board, and is not constrained by other pieces, it has access to three lines parallel to any given edge, i.e. to three ranks and to three files. Proof: trivially follows from the game rules.
Obviously, these two are not corollaries of the main theorem but of the game rules; they also serve as lemmas to the main theorem, so I could have named them that way as well.
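For illustration, Corollary 1 can be checked mechanically with the python-chess library (my own toy example, not part of the proof; Corollary 2 can be checked the same way):

```python
import chess

# Kings on e4 and e6 stand in opposition; the three squares d5, e5, f5
# adjacent to both must be inaccessible to the white king.
board = chess.Board(None)  # empty board
board.set_piece_at(chess.E4, chess.Piece(chess.KING, chess.WHITE))
board.set_piece_at(chess.E6, chess.Piece(chess.KING, chess.BLACK))
board.turn = chess.WHITE

reachable = {m.to_square for m in board.legal_moves if m.from_square == chess.E4}
assert reachable.isdisjoint({chess.D5, chess.E5, chess.F5})
```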
Lemma 1: in the KRK endgame, checkmate is possible if and only if BK is at the edge of the board. Proof:
BK is checkmated if WK stands in opposition on the line perpendicular to the edge on which both BK and WR stand, with d(BK,WR) > 1. If BK is in the corner, WK can also stand one square away from the previously described opposition square, two lines away from the edge on which both BK and WR stand, with Chebyshev d(WK,BK) = 2.
In both cases WK denies BK access to the penultimate line parallel to the edge BK stands on: in the case of opposition according to Corollary 1, and in the corner case because the third square not guarded by WK simply does not exist, while WR covers the edge itself. No other checkmate position is possible, because there are no other pieces that could prevent BK from accessing the third line it would have access to, according to Corollary 2, had it not been placed at the edge, and this proves Lemma 1.
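Both mating patterns from this proof can be verified mechanically; here is a small illustration using the python-chess library, with FENs of my own construction:

```python
import chess

# Opposition mate: BK e8, WK e6 in opposition, WR a8 with d(BK, WR) > 1.
opposition_mate = chess.Board("R3k3/8/4K3/8/8/8/8/8 b - - 0 1")
assert opposition_mate.is_checkmate()

# Corner mate: BK a8, WK b6 (Chebyshev d(WK, BK) = 2, one square off
# the opposition square a6), WR h8 covering the edge.
corner_mate = chess.Board("k6R/8/1K6/8/8/8/8/8 b - - 0 1")
assert corner_mate.is_checkmate()
```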
Now, in order to prove the main theorem, one has to prove two more lemmas:
Lemma 2: in the KRK endgame, it is always possible to force BK to some edge.
Lemma 3: in the KRK endgame, if BK is constrained to an edge by WR standing on the penultimate line, which is parallel to that edge and separates the two kings, it is always possible to force a position in which white, to move, delivers checkmate, by achieving the position described in Lemma 1. Proof:
In order to prepare that mate-in-1 position, white will make three kinds of moves (a code sketch of these follows the proof):
1.) approaching with the king, decreasing Manhattan d(WK,BK), without ever stepping onto the penultimate line or into opposition
2.) shifting the rook from one edge to the other along the same penultimate line, staying safe, when BK threatens its capture
3.) a waiting move with the rook when 1.) is not possible and 2.) is not necessary; this is any move on the penultimate line which does not change the ordering of the pieces and keeps WR safe from capture
It is obvious that by applying 2.) white can always gain the tempo needed to approach with the king, from any more distant line, to the line adjacent to the one the rook holds, two lines away from the edge where BK stands, and to transpose to a position in which WK stands between BK and WR in terms of lines perpendicular to the edge where BK stands (while WR stands all the time between WK and BK in terms of lines parallel to that edge). From that position, black has two choices of moving the king along the edge: either to increase or to decrease Manhattan d(WK,BK). If it tries to increase it, white will follow with 1.), but black is limited in this strategy by the corner; if it tries to decrease it, white will follow with 1.) if possible, or 3.) otherwise, and the limit of this strategy is black stepping into opposition, whereupon white delivers checkmate, which proves Lemma 3.
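Here is that sketch: a minimal, illustrative move chooser, assuming BK is confined to the 8th rank with WR on the 7th, written with the python-chess library. The function and its selection order are my own reconstruction of the proof's three move kinds, not code from any of the cited works.

```python
import chess

PENULT_RANK = 6  # 0-indexed: the 7th rank, with BK confined to the 8th

def manhattan(a: chess.Square, b: chess.Square) -> int:
    return (abs(chess.square_file(a) - chess.square_file(b))
            + abs(chess.square_rank(a) - chess.square_rank(b)))

def choose_white_move(board: chess.Board) -> chess.Move:
    """Pick a move of kind 2, 1 or 3, in that order of urgency."""
    wk = board.king(chess.WHITE)
    bk = board.king(chess.BLACK)
    rook = next(iter(board.pieces(chess.ROOK, chess.WHITE)))
    legal = list(board.legal_moves)

    # Kind 2: BK attacks the rook -> shift it to the far edge, same rank.
    if chess.square_distance(bk, rook) == 1:
        far_file = 0 if chess.square_file(bk) >= 4 else 7
        shift = chess.Move(rook, chess.square(far_file, PENULT_RANK))
        if shift in legal:
            return shift

    # Kind 1: approach with WK, decreasing Manhattan d(WK, BK), never
    # stepping onto the penultimate rank or into opposition.
    for m in legal:
        if m.from_square != wk:
            continue
        to = m.to_square
        opposition = (chess.square_file(to) == chess.square_file(bk)
                      and abs(chess.square_rank(to) - chess.square_rank(bk)) == 2)
        if (chess.square_rank(to) != PENULT_RANK and not opposition
                and manhattan(to, bk) < manhattan(wk, bk)):
            return m

    # Kind 3: waiting move with the rook along the penultimate rank,
    # out of BK's reach (the piece-ordering condition is not re-checked
    # here, which is one reason this is only a sketch).
    for m in legal:
        if (m.from_square == rook
                and chess.square_rank(m.to_square) == PENULT_RANK
                and chess.square_distance(m.to_square, bk) > 1):
            return m

    raise ValueError("no move of the three kinds available")
```

A driver loop alternating this chooser with black's replies, stopping once board.is_checkmate() holds, would exercise Lemma 3 on concrete positions.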
In the same vein, one can prove Lemma 2 and complete the theorem:
It is always possible to place WR, safe from capture, on the line that separates the two kings; that conclusion is rather trivial. From there white will apply exactly the same strategy as in Lemma 3, the only difference being that instead of checkmate it will result in squeezing BK one line nearer to the edge, and the limit of this strategy is BK ending at the edge.

I double-checked the quality of this proof by posting it here:


The fourth point I wanted to correct is the comparison between the legal_move_generator and recognize_terminal_game_state functions on one side and evaluate_position on the other.
After studying:

and

I realized that the law of diminishing returns regarding the quality of approximation applies both with respect to growing the NN training dataset (which must be done to avoid overfitting if one wants to increase the number of NN training iterations) and with respect to growing the NN topology, regardless of the Kolmogorov complexity of the functions one is trying to approximate. The only difference is that with evaluate_position we constantly move our goal towards greater complexity to achieve higher accuracy (and that is why we need that growth), which is not the case with the first two functions, where perfect accuracy can be achieved with a simple function. That higher accuracy then allows less search of the game tree for the same playing strength of the engine which deploys such a function. If we were satisfied with the simplest Shannon-like evaluation of position, such a function could be approximated sufficiently well by training a small NN on a small training dataset (just like the first two functions), but using it would require much more extensive search to accomplish a level of play probably still inferior to that of engines that deploy more complex evaluation, the result of selfplay in the AlphaZero training manner. Not to mention that such a function can easily be handcrafted (manually implemented) instead of being the result of NN training, while vice versa is not possible: we cannot manually implement an evaluation of position at the level achieved by selfplay in the AlphaZero training manner.
So, these are two different things: how closely a NN can approximate a manually implemented function, and how accurately that function fits the actual needs. This distinction becomes blurred if we do not manually implement the function in practice, but it still exists in theory.
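To make the contrast concrete: the first two functions have short, exact, hand-written implementations, while any hand-written evaluate_position is only a crude stand-in. A minimal sketch with the python-chess library (the names follow this essay's terminology, not actual MuZero or Stockfish code):

```python
import chess

def recognize_terminal_game_state(board: chess.Board) -> bool:
    # Exactly implementable by a short hand-written function: perfect
    # accuracy, modest Kolmogorov complexity.
    # (Draw-by-rule conditions are omitted for brevity.)
    return (board.is_checkmate() or board.is_stalemate()
            or board.is_insufficient_material())

def legal_move_generator(board: chess.Board) -> list[chess.Move]:
    # Likewise exact and simple: the rules of the game fully define it.
    return list(board.legal_moves)

def evaluate_position(board: chess.Board) -> int:
    # A Shannon-like material count: trivial to hand-craft, but only a
    # crude approximation of the evaluation reached by selfplay, which
    # is why engines keep growing this function's complexity.
    values = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
              chess.ROOK: 5, chess.QUEEN: 9}
    score = 0
    for piece_type, value in values.items():
        score += value * len(board.pieces(piece_type, chess.WHITE))
        score -= value * len(board.pieces(piece_type, chess.BLACK))
    return score
```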