Live tonight at 8 pm: Causal Reinforcement Learning | Causal Science and Causal AI Reading Group

2020-11-29

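Tonight at 8 pm, the "Causal Science and Causal AI" reading group will hold its tenth online paper-sharing session, on the topic of causal reinforcement learning. The speakers are 陆超超 (Chaochao Lu), a PhD student at the University of Cambridge, and 张卓婧 (Zhuojing Zhang), a PhD student at Tsinghua University. The session will be streamed live on the 集智俱乐部 (Swarma Club) channel on Bilibili.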

Background

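In recent years, reinforcement learning has made enormous progress in games and can now beat the very best human players. In real-world settings, however, successful applications of reinforcement learning remain rare. With that puzzle in mind, this session explores a new area of machine learning: causal reinforcement learning. Causal reinforcement learning not only offers a new approach to problems that are hard for traditional reinforcement learning; more importantly, it also suggests a path toward artificial general intelligence. The philosophy behind it is appealing: looking back at the history of science, humans have followed a similar path. Specifically, humans summarize experience and regularities through continual interaction with and exploration of nature, then use them to better guide the next round of interaction and exploration, and progress in this way. Causal reinforcement learning imitates this behavior: an agent learns and discovers causal relationships while interacting with its environment, then uses the learned causal relationships to optimize its policy and guide its next interaction. For this reason, causal reinforcement learning can be viewed as a general-purpose learning algorithm with broad real-world applications, for example in computer vision, robotics, biomedicine, healthcare, recommender systems, autonomous driving, finance, and sociology.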

Outline

Introduction to Causal RL

Brief Intro to RL and Causality
Motivation
Key Concepts
Challenges

Confounding
Counterfactuals
Causal Representation Learning
Artificial General Intelligence

Applications

Computer Vision
Healthcare/Medicine
Self-driving
Recommendation Systems

Paper Reading

Causality for RL [1-3]
RL for Causality [4-5]

Discussion

About the speakers

陆超超 (Chaochao Lu): PhD student at the University of Cambridge; his research focuses on causal reinforcement learning.

张卓婧 (Zhuojing Zhang): PhD student at Tsinghua University; research interests in causal reinforcement learning.

References

[1] Dudik, M., Langford, J., Li, L. Doubly robust policy evaluation and learning. In Proceedings of the 28th International Conference on Machine Learning. 2011.

[2] Bareinboim, E., Forney, A., Pearl, J. Bandits with Unobserved Confounders: A Causal Approach. In Proceedings of the 28th Annual Conference on Neural Information Processing Systems, 2015.

[3] Zhang, J., Bareinboim, E. Designing Optimal Dynamic Treatment Regimes: A Causal Reinforcement Learning Approach. In Proceedings of the 37th International Conference on Machine Learning. 2020.

[4] Khalil, E., et al. Learning combinatorial optimization algorithms over graphs. In Advances in Neural Information Processing Systems. 2017.

[5] Zhu, S., Ng, I., Chen, Z. Causal discovery with reinforcement learning. arXiv preprint arXiv:1906.04477. 2019.

[6] Lu et al. Deconfounding reinforcement learning in observational settings. arXiv preprint arXiv:1812.10576. 2018.

[7] Schoelkopf, B. Causality for machine learning. arXiv preprint arXiv:1911.10500. 2019.

[8] Buesing et al. Woulda, coulda, shoulda: Counterfactually-guided policy search. arXiv preprint arXiv:1811.06272. 2018.

[9] de Haan et al. Causal confusion in imitation learning. arXiv preprint arXiv:1905.11979. 2019.

Livestream information and how to join

Time: today (November 29), 20:00-23:00

Where: the 集智俱乐部 live room on Bilibili

👀 Follow the Bilibili streamer "集智俱乐部"

so you never miss a major 集智 livestream