CertRL: Formalizing Convergence Proofs for Value and Policy Iteration in Coq
Reinforcement learning algorithms solve sequential decision-making problems in probabilistic environments by optimizing for long-term reward. The desire to use reinforcement learning in safety-critical settings inspires a recent line of work on formally constrained reinforcement learning; however, these methods place the implementation of the learning algorithm in their Trusted Computing Base. The crucial correctness property of these implementations is a guarantee that the learning algorithm converges to an optimal policy.
This paper begins the work of closing this gap by developing a Coq formalization of two canonical reinforcement learning algorithms: value and policy iteration for finite state Markov decision processes. The central results are a formalization of Bellman’s optimality principle and its proof, which uses a contraction property of the Bellman optimality operator to establish that a sequence converges in the infinite horizon limit. The CertRL development exemplifies how the Giry monad and mechanized metric coinduction streamline optimality proofs for reinforcement learning algorithms. The CertRL library provides a general framework for proving properties about Markov decision processes and reinforcement learning algorithms, paving the way for further work on the formalization of reinforcement learning algorithms.
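To illustrate the convergence argument the paper formalizes, here is a minimal numeric sketch of value iteration on a tiny Markov decision process. The MDP (two states, two actions, the particular transition probabilities and rewards) is invented for illustration and does not come from the paper; the point is that the Bellman optimality operator is a γ-contraction in the sup norm, so iterating it converges to the unique optimal value function.

```python
import numpy as np

# Illustrative 2-state, 2-action MDP (hypothetical data, not from CertRL).
# P[a][s][s'] = probability of moving to s' when taking action a in state s.
# R[s][a]    = expected immediate reward for taking action a in state s.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
    [[0.5, 0.5], [0.0, 1.0]],   # transitions under action 1
])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9  # discount factor; makes the Bellman operator a gamma-contraction

def bellman_opt(V):
    """(T V)(s) = max_a [ R(s, a) + gamma * sum_{s'} P(s' | s, a) * V(s') ]."""
    Q = R + gamma * np.einsum('ast,t->sa', P, V)  # Q-values, shape (states, actions)
    return Q.max(axis=1)

# Value iteration: repeatedly apply T; the contraction property guarantees
# convergence to the unique fixed point V* (the optimal value function).
V = np.zeros(2)
for _ in range(1000):
    V_next = bellman_opt(V)
    if np.max(np.abs(V_next - V)) < 1e-10:
        break
    V = V_next

# A greedy policy with respect to V* is optimal.
policy = np.argmax(R + gamma * np.einsum('ast,t->sa', P, V), axis=1)
```

The contraction property itself can be checked directly: for any two value functions `V1`, `V2`, the sup-norm distance between `T V1` and `T V2` is at most `gamma` times the distance between `V1` and `V2`, which is exactly the fact the Coq proof exploits via metric coinduction.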
Tue 19 Jan (times shown in Central European Time)
18:00 - 18:30
A Formal Proof of PAC Learnability for Decision Stumps
Joseph Tassarotti (Boston College), Koundinya Vajjha (University of Pittsburgh), Anindya Banerjee (IMDEA Software Institute), Jean-Baptiste Tristan (Boston College)
CertRL: Formalizing Convergence Proofs for Value and Policy Iteration in Coq
Koundinya Vajjha (University of Pittsburgh), Avraham Shinnar (IBM Research), Barry Trager (IBM Research), Vasily Pestun (IBM Research; IHES), Nathan Fulton (IBM Research)