	Podcast:		TalkRL: The Reinforcement Learning Podcast
	Episode:		Arash Ahmadian on Rethinking RLHF
	Category:		Technology
	Duration:		00:33:30
	Publish Date:		2024-03-25 06:46:00
	Description:		Arash Ahmadian is a Researcher at Cohere and Cohere For AI focussed on Preference Training of large language models. He’s also a researcher at the Vector Institute of AI. Featured Reference: Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs, by Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker. Additional References: Self-Rewarding Language Models, Yuan et al 2024; Reinforcement Learning: An Introduction, Sutton and Barto 1992; Learning from Delayed Rewards, Chris Watkins 1989; Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams 1992.