Assistant Professor
@ UNIST AIGS & CSE

Youngsoo Jang

I am an Assistant Professor in the Artificial Intelligence Graduate School (AIGS) and the Department of Computer Science and Engineering (CSE) at the Ulsan National Institute of Science and Technology (UNIST), where I lead the AI Cognition Optimization with Reinforcement Learning (AI-CORE) Lab. Before joining UNIST, I was a research scientist at LG AI Research, working with Moontae Lee and Honglak Lee. I completed my Ph.D. and M.S. at KAIST, advised by Kee-Eung Kim.

Research Interests: Reinforcement Learning (RL), Vision-Language-Action (VLA) Models, Large Language Models (LLMs), Reasoning and Planning, Reinforcement Learning from Human Feedback (RLHF)

Contact: youngsoo.jang [at] unist.ac.kr

Notice

I am actively recruiting motivated research interns and graduate students (M.S./Ph.D.) who are passionate about RL, VLA, LLM, and Robotics. If you are interested, please feel free to email me (youngsoo.jang@unist.ac.kr) with your CV and academic transcript.
Our lab has funding for an Innocore postdoctoral position (salary: 90 million KRW per year for up to 5 years). If you are interested, please email me (youngsoo.jang@unist.ac.kr) with your CV and research statement.

Education

2018.03 - 2022.08: Ph.D. in Computer Science, KAIST, Republic of Korea (Advisor: Prof. Kee-Eung Kim)
2016.03 - 2018.02: M.S. in Computer Science, KAIST, Republic of Korea (Advisor: Prof. Kee-Eung Kim)
2011.02 - 2016.02: B.S. in Mathematical Science and Computer Science (Double Major), KAIST, Republic of Korea

Work Experience

2025.09 - Present: Assistant Professor at UNIST (AIGS & CSE)
2022.08 - 2025.08: Research Scientist (Code LLM & RL Squad Leader) at LG AI Research, Superintelligence Lab (Prev: Advanced ML Lab)

Publications

A Regret Minimization Framework on Preference Learning in Large Language Models

Suhwan Kim*, Taehyun Cho*, Geon-Hyeong Kim, Yu Jin Kim, Youngsoo Jang†, Moontae Lee†, and Jungwoo Lee† (*: equal contribution, †: co-corresponding)
Proceedings of International Conference on Machine Learning (ICML). 2026 (Spotlight Paper, Top 2.2%)

Efficiently Learning To Reason or Not to Reason: Root-token Policy Optimization for Adaptive Thinking

Taehyeon Kim, Hyunsoo Lee, Youngsoo Jang†, and Moontae Lee† (†: co-corresponding)
Proceedings of Association for Computational Linguistics (ACL). 2026 (Oral Presentation)

SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety

Geon-Hyeong Kim, Yu Jin Kim, Byoungjip Kim, Honglak Lee, Kyunghoon Bae, Youngsoo Jang†, and Moontae Lee† (†: co-corresponding)
International Conference on Learning Representations (ICLR). 2026 (Oral Presentation, Top 1.1%)

IRPO: Implicit Policy Regularized Preference Optimization

Youngsoo Jang, Yu Jin Kim, Geon-Hyeong Kim, Honglak Lee, and Moontae Lee
Conference of the European Chapter of the Association for Computational Linguistics (EACL) Findings. 2026 (Oral Presentation, Top 8.5%)

Online Pre-Training for Offline-to-Online Reinforcement Learning

Yongjae Shin, Jeonghye Kim, Whiyoung Jung, Sunghoon Hong, Deunsol Yoon, Youngsoo Jang, Geon-Hyeong Kim, Jongseong Chae, Youngchul Sung, Kanghoon Lee, and Woohyung Lim
Proceedings of International Conference on Machine Learning (ICML). 2025

Reinforcement Learning from Reflective Feedback (RLRF): Aligning and Improving LLMs via Fine-Grained Self-Reflection

Kyungjae Lee*, Dasol Hwang*, Sunghyun Park*, Youngsoo Jang, and Moontae Lee (*: equal contribution)
arXiv preprint, 2025

Prospector: Improving LLM Agents with Self-Asking and Trajectory Ranking

Byoungjip Kim, Youngsoo Jang, Lajanugen Logeswaran, Geon-Hyeong Kim, Yu Jin Kim, Honglak Lee, and Moontae Lee
Conference on Empirical Methods in Natural Language Processing (EMNLP) Findings. 2024
Proceedings of Neural Information Processing Systems (NeurIPS) Foundation Models for Decision Making Workshop. 2023

Semantic Skill Grounding for Embodied Instruction-Following in Cross-Domain Environments

Sangwoo Shin*, SeungHyun Kim*, Youngsoo Jang, Moontae Lee, and Honguk Woo (*: equal contribution)
Association for Computational Linguistics (ACL) Findings. 2024

Degeneration-free Policy Optimization: RL Fine-Tuning for Language Models without Degeneration

Youngsoo Jang, Geon-Hyeong Kim, Byoungjip Kim, Yu Jin Kim, Honglak Lee, and Moontae Lee
Proceedings of International Conference on Machine Learning (ICML). 2024
International Conference on Learning Representations (ICLR) Workshop on Generative Models for Decision Making. 2024

Show, Think, and Tell: Thought-Augmented Fine-Tuning of Large Language Models for Video Captioning

Byoungjip Kim, Dasol Hwang, Sungjun Cho, Youngsoo Jang, Honglak Lee, and Moontae Lee
The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshop on Multi-Modal Foundation Models. 2024

SafeDICE: Offline Safe Imitation Learning with Non-Preferred Demonstrations

Youngsoo Jang, Geon-Hyeong Kim, Jongmin Lee, Sungryull Sohn, Byoungjip Kim, Honglak Lee, and Moontae Lee
Proceedings of Neural Information Processing Systems (NeurIPS). 2023

Information-Theoretic State Space Model for Multi-View Reinforcement Learning

HyeongJoo Hwang, Seokin Seo, Youngsoo Jang, Sungyoon Kim, Geon-Hyeong Kim, Seunghoon Hong, and Kee-Eung Kim
Proceedings of International Conference on Machine Learning (ICML). 2023 (Oral Presentation, Top 2.3%)

LobsDICE: Offline Imitation Learning from Observation via Stationary Distribution Correction Estimation

Geon-Hyeong Kim*, Jongmin Lee*, Youngsoo Jang, Hongseok Yang, and Kee-Eung Kim (*: equal contribution)
Proceedings of Neural Information Processing Systems (NeurIPS). 2022

GPT-Critic: Offline Reinforcement Learning for End-to-End Task-Oriented Dialogue Systems

Youngsoo Jang, Jongmin Lee, and Kee-Eung Kim
International Conference on Learning Representations (ICLR). 2022

Monte-Carlo Planning and Learning with Language Action Value Estimates

Youngsoo Jang, Seokin Seo, Jongmin Lee, and Kee-Eung Kim
International Conference on Learning Representations (ICLR). 2021

Variational Inference for Sequential Data with Future Likelihood Estimates

Geon-Hyeong Kim, Youngsoo Jang, Hongseok Yang, and Kee-Eung Kim
Proceedings of International Conference on Machine Learning (ICML). 2020

End-to-End Neural Pipeline for Goal-Oriented Dialogue System using GPT-2

Donghoon Ham*, Jeong-Gwan Lee*, Youngsoo Jang, and Kee-Eung Kim (*: equal contribution)
Proceedings of Association for Computational Linguistics (ACL). 2020
Proceedings of AAAI Conference on Artificial Intelligence (AAAI) DSTC8 Workshop. 2020 (Oral Presentation)
1st place on 8th Dialog System Technology Challenge (DSTC8) Multi-domain Task Completion Track, 2019

Bayes-Adaptive Monte-Carlo Planning and Learning for Goal-Oriented Dialogues

Youngsoo Jang, Jongmin Lee, and Kee-Eung Kim
Proceedings of AAAI Conference on Artificial Intelligence (AAAI). 2020
Proceedings of Neural Information Processing Systems (NeurIPS) Conversational AI Workshop. 2019 (Oral Presentation)

Trust Region Sequential Variational Inference

Geon-Hyeong Kim, Youngsoo Jang, Jongmin Lee, Wonseok Jeon, Hongseok Yang, and Kee-Eung Kim
Proceedings of Asian Conference on Machine Learning (ACML). 2019

PyOpenDial: A Python-based Domain-Independent Toolkit for Developing Spoken Systems with Probabilistic Rules

Youngsoo Jang*, Jongmin Lee*, Jaeyoung Park*, Kyeng-Hun Lee, Pierre Lison, and Kee-Eung Kim (*: equal contribution)
Proceedings of Empirical Methods in Natural Language Processing (EMNLP), System Demonstrations. 2019

Cross-language Neural Dialog State Tracker for Large Ontologies using Hierarchical Attention

Youngsoo Jang, Jiyeon Ham, Byung-Jun Lee, and Kee-Eung Kim
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP). 2018

Constrained Bayesian Reinforcement Learning via Approximate Linear Programming

Jongmin Lee, Youngsoo Jang, Pascal Poupart, and Kee-Eung Kim
Proceedings of International Joint Conference on Artificial Intelligence (IJCAI). 2017
ECML-PKDD Workshop on Scaling-Up Reinforcement Learning (SURL). 2017

Neural Dialog State Tracker for Large Ontologies by Attention Mechanism

Youngsoo Jang*, Jiyeon Ham*, Byung-Jun Lee, Youngjae Chang, and Kee-Eung Kim (*: equal contribution)
IEEE Workshop on Spoken Language Technology (SLT). 2016 / 3rd place on 5th Dialog State Tracking Challenge (DSTC5)

Awards and Honors

NeurIPS Outstanding Reviewer Award, 2021
Qualcomm-KAIST Innovation Awards, Qualcomm, 2019 (awarded $5,000)
1st place on 8th Dialog System Technology Challenge (DSTC8) Multi-domain Task Completion Track, 2019
Naver Ph.D. Fellowship, NAVER, 2018 (awarded $5,000)
3rd place on 5th Dialog State Tracking Challenge (DSTC5), 2016
Best TA Award, Introduction to Programming (CS101, 2016 Spring), School of Computing, KAIST, Sep. 2016

Academic Talks

Degeneration-free Policy Optimization: RL Fine-Tuning for Language Models without Degeneration

2024. 07. 23. ICML 2024, Vienna, Austria

SafeDICE: Offline Safe Imitation Learning with Non-Preferred Demonstrations

2023. 12. 15. NeurIPS 2023, New Orleans, USA

GPT-Critic: Offline Reinforcement Learning for End-to-End Task-Oriented Dialogue Systems

2022. 04. 26. ICLR 2022, Virtual
2022. 05. 11. NAVER, Virtual

Monte-Carlo Planning and Learning with Language Action Value Estimates

2021. 05. 04. ICLR 2021 ML in Korea, Virtual
2021. 05. 04. ICLR 2021, Virtual

Bayes-Adaptive Monte-Carlo Planning and Learning for Goal-Oriented Dialogues

2020. 07. 13. KAKAO Brain, Pangyo, Korea
2020. 02. 11. AAAI Conference on Artificial Intelligence, New York, USA
2019. 12. 14. NeurIPS Conversational AI Workshop 2019, Vancouver, Canada
2019. 11. 29. Qualcomm-KAIST Innovation Award 2019, KAIST, Korea
2019. 09. 27. KAKAO, Jeju, Korea

Teaching Experiences

Machine Learning (CS376), TA, KAIST, 2019
Counselor Assistant (CA), KAIST, 2019
Introduction to Programming (CS101), Head TA, KAIST, 2018
Data Structure (CS206), TA, KAIST, 2016
Introduction to Programming (CS101), TA, KAIST, 2016

Academic Services

NeurIPS Reviewer
ICML Reviewer
ICLR Reviewer
AAAI Reviewer
ACL Reviewer
EMNLP Reviewer
NAACL Reviewer
