OpenAI Gym environment for multi-agent games

Question:

Is it possible to use OpenAI's Gym environments for multi-agent games? Specifically, I would like to model a card game with four players (agents). The player who scores a turn starts the next turn. How would I model the necessary coordination between the players (e.g. whose turn it is next)? Ultimately, I would like to apply reinforcement learning to four agents that play against each other.

Answer:

Yes, it is possible to use OpenAI Gym environments for multi-agent games. Although the OpenAI Gym community has no standardized interface for multi-agent environments, it is easy enough to build a Gym environment that supports this. For instance, in OpenAI's recent work on multi-agent particle environments, they built a multi-agent environment that inherits from gym.Env and takes the following form:

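```python
import gym

class MultiAgentEnv(gym.Env):
    def step(self, action_n):
        # one observation, reward and done flag per agent
        obs_n    = list()
        reward_n = list()
        done_n   = list()
        info_n   = {'n': []}
        # ... advance the world using every agent's action ...
        return obs_n, reward_n, done_n, info_n
```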

We can see that the step function takes a list of actions (one for each agent) and returns a list of observations, a list of rewards and a list of done flags while stepping the environment forward. This interface is representative of a Markov game, in which all agents act at the same time and each receives its own subsequent observation and reward.
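
To make the simultaneous-move interface concrete, here is a minimal, self-contained sketch. `MatchingGameEnv` and its reward rule are hypothetical, invented only for illustration; the class mimics the list-in/list-out shape of `step` above but deliberately does not inherit from `gym.Env`, so it runs without gym installed.

```python
from typing import Any, Dict, List, Tuple


class MatchingGameEnv:
    """Hypothetical toy game in which every agent acts at once (a Markov game).

    Mirrors the list-in / list-out step() shape of MultiAgentEnv, but does not
    inherit from gym.Env so the sketch runs without gym installed.
    """

    def __init__(self, n_agents: int = 4, max_steps: int = 3):
        self.n_agents = n_agents
        self.max_steps = max_steps
        self.t = 0

    def reset(self) -> List[Tuple[int, ...]]:
        self.t = 0
        # Before any move, each agent observes an empty joint action.
        return [tuple() for _ in range(self.n_agents)]

    def step(self, action_n: List[int]):
        assert len(action_n) == self.n_agents, "need exactly one action per agent"
        self.t += 1
        joint = tuple(action_n)
        # Every agent observes the joint action that was just taken.
        obs_n = [joint for _ in range(self.n_agents)]
        # Hypothetical reward: one point per *other* agent that chose the same action.
        reward_n = [float(joint.count(a) - 1) for a in action_n]
        done = self.t >= self.max_steps
        done_n = [done] * self.n_agents
        info_n: Dict[str, Any] = {'n': [{} for _ in range(self.n_agents)]}
        return obs_n, reward_n, done_n, info_n


# Usage: all four agents submit their actions in a single call.
env = MatchingGameEnv(n_agents=4, max_steps=3)
obs_n = env.reset()
obs_n, reward_n, done_n, info_n = env.step([0, 0, 1, 0])
print(reward_n)  # [2.0, 2.0, 0.0, 2.0]
```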

However, this kind of Markov game interface may not be suitable for all multi-agent environments. In particular, turn-based games (such as card games) might be better cast as an alternating Markov game, in which agents take turns acting one at a time. For this kind of environment, you may need to include whose turn it is in the state representation; your step function would then take a single action and return a single observation, reward and done flag.
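
Applied to the card game in the question, a turn-based environment might look like the following minimal sketch. All of it is hypothetical: the class name `TrickTakingEnv`, the suitless "highest card wins" rule, the dealing scheme and the dictionary observation are assumptions chosen for brevity, and the class does not inherit from `gym.Env` so that it runs without gym installed. It illustrates the two points above: the observation records whose turn it is (`player`), and `step` takes one action and returns one observation, reward and done flag. As in the question, the winner of each trick leads the next one.

```python
import random
from typing import Any, Dict, List, Tuple


class TrickTakingEnv:
    """Hypothetical 4-player trick-taking game modelled as an alternating Markov game.

    Cards are the integers 0..4*hand_size-1 (higher beats lower, no suits).
    One agent acts per step; whoever wins a trick leads the next one.
    """

    N_PLAYERS = 4

    def __init__(self, hand_size: int = 3, seed: int = 0):
        self.hand_size = hand_size
        self.rng = random.Random(seed)
        self.reset()

    def reset(self) -> Dict[str, Any]:
        deck = list(range(self.N_PLAYERS * self.hand_size))
        self.rng.shuffle(deck)
        # Deal round-robin so every player gets hand_size cards.
        self.hands = [sorted(deck[i::self.N_PLAYERS]) for i in range(self.N_PLAYERS)]
        self.scores = [0] * self.N_PLAYERS
        self.trick: List[Tuple[int, int]] = []  # (player, card) pairs in the open trick
        self.current_player = 0                 # player 0 leads the first trick
        return self._observe()

    def _observe(self) -> Dict[str, Any]:
        # Whose turn it is is part of the state handed to the acting agent.
        p = self.current_player
        return {"player": p, "hand": list(self.hands[p]),
                "trick": list(self.trick), "scores": list(self.scores)}

    def step(self, action: int):
        """`action` is an index into the current player's hand."""
        p = self.current_player
        card = self.hands[p].pop(action)
        self.trick.append((p, card))
        reward, info = 0.0, {}
        if len(self.trick) < self.N_PLAYERS:
            # Trick still open: play passes to the next seat.
            self.current_player = (p + 1) % self.N_PLAYERS
        else:
            winner, _ = max(self.trick, key=lambda pc: pc[1])
            self.scores[winner] += 1
            info["trick_winner"] = winner
            # Reward of the agent that just acted (a simplifying convention).
            reward = 1.0 if winner == p else 0.0
            self.trick = []
            self.current_player = winner  # the trick winner leads the next trick
        done = all(len(h) == 0 for h in self.hands)
        return self._observe(), reward, done, info


# Usage: random play stands in for four learned policies, chosen by obs["player"].
game = TrickTakingEnv(hand_size=3, seed=42)
policy_rng = random.Random(1)
obs, done = game.reset(), False
while not done:
    action = policy_rng.randrange(len(obs["hand"]))
    obs, reward, done, info = game.step(action)
print(game.scores)  # one point per trick, so the scores sum to 3
```

Because `step` returns a single reward, this sketch credits the acting agent only when its own card completes and wins a trick, and reports `trick_winner` in `info`. With four learning agents you would typically route each trick's reward to the agent that actually won it; that convention is a design choice of this sketch, not part of any Gym API.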