In generating (sub-)optimal strategies for perfect-information games, the dominant paradigm is reinforcement learning with neural networks that estimate action values (Q-values). The approach applied initially, tree search with Monte Carlo evaluations, was abandoned because it does not generalize. We show that combining a probabilistic-combinatorial formal learning method with the Monte Carlo method generates generalized rules that together form the desired strategy: each rule captures the similarities among states in which the same action led to high reward.
Д.В. Виноградов (D. V. Vinogradov) studied this question.