Binfeng (Bill) Xu

I'm an AI research engineer. Currently, I work at NVIDIA on autonomous agents.

Formerly, I was a researcher at Samsung Research America (SRA), where I built LLM post-training infrastructure and efficient agentic models. During my M.S. at NYU, I briefly worked with Alfredo Canziani and Yann LeCun on autonomous driving. Before that, at WFU, I was advised by Grey Ballard on efficient tensor decomposition and by Paúl Pauca on object detection for drone and satellite imagery. In my spare time, I enjoy ML competitions on Kaggle and have ranked in the top 1% globally.

Over the past decade, I have been hands-on with a broad range of AI/ML research and applications. I enjoy training large models on massive data and compute, and building real-world products out of ideas.

GitHub  |  LinkedIn  |  Scholar

Recent Work & Interests

LLM Post-training. Lately I've been working on and researching the full spectrum of post-training algorithms and infrastructure: from data synthesis to knowledge distillation (e.g., SFT with a KL term), preference-based RLHF (e.g., DPO), RLVR (e.g., GRPO on hard math), memory-efficient fine-tuning (e.g., LoRA), inference acceleration (e.g., SSMs, FlashAttention, speculative decoding), quantization and compilation (e.g., AWQ, llama.cpp), model merging (e.g., SLERP), etc. I've also spent plenty of time on product and architecture work, and have delivered several core projects for Samsung: on-device RAG, knowledge-graph QA, and agentic chatbots.
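As a flavor of what preference-based post-training looks like, here is a minimal sketch of the DPO objective for a single (chosen, rejected) pair. The function and argument names are illustrative, not from any specific library; inputs are the summed log-probabilities of each response under the policy and the frozen reference model.

```python
import math

def dpo_loss(pol_logp_w, pol_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin)."""
    # Margin: how much more the policy prefers the chosen response over
    # the rejected one, relative to the reference model.
    margin = beta * ((pol_logp_w - ref_logp_w) - (pol_logp_l - ref_logp_l))
    # -log(sigmoid(margin)); small when the policy's preference for the
    # chosen response exceeds the reference's.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When policy and reference agree exactly, the margin is 0 and the
# loss is log(2) ≈ 0.6931.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))
```

In practice this is computed in a batched, token-summed form (e.g., as in RLHF training libraries), with `beta` controlling how far the policy may drift from the reference.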

LLM Reasoning and Machine Intelligence. This generally splits into two threads:

  • Conceptual learning: Modeling multimodal world patterns with (very) long-context autoregressive transformers, e.g., LWM. Recently, DeepSeek blazed a successful trail (GRPO) for self-supervising LLMs on verifiable rewards, at the cost of increased test-time compute.
  • Generative reasoning: Symbolic learning systems (e.g., AlphaGo) show their strength in (relatively) small action spaces, which has inspired recent LLM research on annotating data with tree-search sampling (e.g., MCTS) and process supervision. I'm currently more interested in GFlowNets for stochastic prediction over open-ended, real-world action spaces.
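The tree-search sampling mentioned above hinges on a selection rule that trades off exploiting high-value branches against exploring under-visited ones. A minimal sketch of the UCB1 rule used in MCTS selection (the data layout here is illustrative, not from any particular framework):

```python
import math

def ucb1_select(parent_visits, children, c=1.41):
    """Pick a child index by the UCB1 rule used in MCTS selection.

    `children` is a list of (total_value, visit_count) pairs.
    """
    def score(value, visits):
        if visits == 0:
            return float("inf")  # always try unvisited actions first
        exploit = value / visits                                  # mean value
        explore = c * math.sqrt(math.log(parent_visits) / visits) # uncertainty bonus
        return exploit + explore

    scores = [score(v, n) for v, n in children]
    return scores.index(max(scores))

# The unvisited third child is selected immediately.
children = [(3.0, 10), (2.5, 3), (0.0, 0)]
print(ucb1_select(13, children))
```

In LLM reasoning work, the "children" are candidate next steps in a solution trace, and the values come from a verifier or process reward model rather than game outcomes.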

Language Agents: Training large models for autonomous actions with optimized reasoning and planning chains.

  • ReWOO: Eliminates prompt-stacking redundancy in augmented language model (ALM) systems by decoupling LLM reasoning from observations.
  • Gentopia: A collaborative agentic framework for building hierarchical agents through configuration, integrating model specialization, evaluation, sharing, and inheritance. [demo].

Papers

Gentopia: A Collaborative Platform for Tool-Augmented LLMs
Binfeng Xu, Xukun Liu, Hua Shen, Zeyu H, Yuhan L, Murong Y, Zhiyuan Peng, Yuchen Liu, Ziyu Y, Dongkuan Xu

ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models
Binfeng Xu, Zhiyuan Peng, Bowen Lei, Subhabrata Mukherjee, Yuchen Liu, Dongkuan Xu

Dynamic Noise Preference Optimization for LLM Self-Improvement via Synthetic Data
Haoyan Yang, Ting Hua, Shangqian Gao, Binfeng Xu, Zheng Tang, Jie Xu, Hongxia Jin, Vijay Srinivasan

Efficient Computation of Tucker Decomposition of Correlation-Based Tensors
Binfeng Xu, Grey Ballard, Robert Lyday, Paul Laurienti

Iterative Constringency Optimization: Preclustering Approach to Agent Interactive Data
Binfeng Xu, Nicole Dalzell

Misc

Petting Kobu, my Norwegian Forest cat 🐱; Cyberpunk; Digital nomad (someday); Fan of all games by Hidetaka Miyazaki, who once motivated me to try indie game dev; Good at Dota 2 (once); Photography @500px; Minimalist.