no code implementations • 17 Apr 2024 • Akifumi Wachi, Thien Q. Tran, Rei Sato, Takumi Tanabe, Youhei Akimoto
This paper formulates human value alignment as a constrained optimization problem in which the language model policy maximizes reward subject to a safety constraint, and then proposes an algorithm, Stepwise Alignment for Constrained Policy Optimization (SACPO).
1 code implementation • 31 Jan 2023 • Rei Sato, Kazuto Fukuchi, Jun Sakuma, Youhei Akimoto
We investigate policy transfer using image-to-semantics translation to mitigate learning difficulties in vision-based robotics control agents.
1 code implementation • 7 Nov 2022 • Takumi Tanabe, Rei Sato, Kazuto Fukuchi, Jun Sakuma, Youhei Akimoto
In this study, we focus on scenarios involving a simulation environment with uncertainty parameters and the set of their possible values, called the uncertainty parameter set.
1 code implementation • 11 Dec 2020 • Rei Sato, Jun Sakuma, Youhei Akimoto
In this paper, we propose AdvantageNAS, a novel search strategy for one-shot, sparse-propagation NAS that further reduces the time complexity of NAS by reducing the number of search iterations.
1 code implementation • 30 Aug 2019 • Rei Sato, Tetsuro Nikuni, Shohei Watabe
We investigate a quantum spatial search problem on fractal lattices, such as Sierpinski carpets and Menger sponges.
Quantum Physics