Direct Preference Optimization (DPO)

2024

LLM & RLHF - Paper Reading Notes
758 words · 4 mins
Machine Learning (ML) · Large Language Models (LLMs) · Direct Preference Optimization (DPO) · Preference Learning