no code implementations • 23 May 2024 • Francesco Emanuele Stradi, Anna Lunghi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti
In constrained Markov decision processes (CMDPs) with adversarial rewards and constraints, a well-known impossibility result prevents any algorithm from attaining both sublinear regret and sublinear constraint violation, when competing against a best-in-hindsight policy that satisfies constraints on average.
no code implementations • 6 Mar 2024 • Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti
To the best of our knowledge, our work is the first to study CMDPs involving both adversarial losses and hard constraints.
no code implementations • 5 Feb 2024 • Francesco Bacchiocchi, Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti
Recently, Markov persuasion processes (MPPs) have been introduced to capture sequential scenarios where a sender faces a stream of myopic receivers in a Markovian environment.
no code implementations • 27 Apr 2023 • Jacopo Germano, Francesco Emanuele Stradi, Gianmarco Genalti, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti
We study online learning in episodic constrained Markov decision processes (CMDPs), where the goal of the learner is to collect as much reward as possible over the episodes, while guaranteeing that some long-term constraints are satisfied during the learning process.