
CoT and LLM/SLM Collaboration Paper List

Updated at October 7, 2024

Published at September 29, 2024

Recent Interesting News

Llama 3.2 Family is released.

Chain of Thought (CoT)

Survey

Different approaches of Chain-of-X (CoX).

Paper

CHAIN-OF-EXPERTS: WHEN LLMS MEET COMPLEX OPERATIONS RESEARCH PROBLEMS by Z. Xiao et al.

Type: Chain-of-Models

Involves multiple LLMs, or so-called agents, each implementing a different step of the thought chain.
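A minimal sketch of the chain-of-models idea: several specialist "experts" handle successive steps, each consuming the previous expert's output. The expert functions below are hypothetical stand-ins for real LLM calls.

```python
def modeling_expert(problem: str) -> str:
    # Stand-in for an LLM that writes a mathematical model of the problem.
    return f"model({problem})"

def code_expert(model: str) -> str:
    # Stand-in for an LLM that turns the model into solver code.
    return f"code({model})"

def chain_of_experts(problem: str, experts) -> str:
    """Pass the problem through each expert in turn."""
    state = problem
    for expert in experts:
        state = expert(state)
    return state

result = chain_of_experts("maximize profit", [modeling_expert, code_expert])
print(result)  # code(model(maximize profit))
```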

Teaching Small Language Models to Reason for Knowledge-Intensive Multi-Hop Question Answering by X. Li et al.

Distillation by generating sub-questions from LLMs to train SLMs with CoT, called Decompose and Response Distillation.
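A rough sketch of how such a distillation dataset could be assembled. The `teacher_decompose` and `teacher_answer` functions are hypothetical stand-ins for teacher-LLM calls, hard-coded here for one example question.

```python
def teacher_decompose(question: str) -> list[str]:
    # A real teacher LLM would generate sub-questions; hard-coded here.
    return [
        "Who directed the film Inception?",
        "In what year was that director born?",
    ]

def teacher_answer(sub_question: str) -> str:
    # A real teacher LLM would answer each sub-question; hard-coded here.
    answers = {
        "Who directed the film Inception?": "Christopher Nolan",
        "In what year was that director born?": "1970",
    }
    return answers[sub_question]

def build_distillation_record(question: str, final_answer: str) -> dict:
    """Assemble one training example for the student SLM: the multi-hop
    question, the teacher's sub-question/answer chain, and the final answer."""
    subs = teacher_decompose(question)
    chain = [(sq, teacher_answer(sq)) for sq in subs]
    return {"question": question, "chain": chain, "answer": final_answer}

record = build_distillation_record(
    "In what year was the director of Inception born?", "1970"
)
```

The student SLM is then fine-tuned to reproduce the sub-question chain before the final answer.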

Chain-of-Probe: Examining the Necessity and Accuracy of CoT Step-by-Step by Z. Wang et al.

Early Answering: LLMs have already predicted an answer before generating the CoT.
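A toy illustration of an early-answering probe: query the model's answer after each partial CoT prefix and find the first step at which it already matches the final answer. `probe_answer` is a hypothetical stand-in for decoding an answer conditioned on a truncated chain of thought; here it is stubbed so the answer is fixed from the start.

```python
COT_STEPS = ["Step 1: ...", "Step 2: ...", "Step 3: ..."]

def probe_answer(question: str, partial_cot: list[str]) -> str:
    # Stand-in: a real probe would force the model to answer given only
    # this CoT prefix. Stubbed so the answer never changes ("early answering").
    return "42"

def earliest_stable_step(question: str, cot_steps: list[str]) -> int:
    """Return the first prefix length at which the probed answer already
    matches the answer probed after the full chain."""
    final = probe_answer(question, cot_steps)
    for i in range(len(cot_steps) + 1):
        if probe_answer(question, cot_steps[:i]) == final:
            return i
    return len(cot_steps)

print(earliest_stable_step("some question", COT_STEPS))  # 0: answer fixed before any CoT
```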

SKIntern: Internalizing Symbolic Knowledge for Distilling Better CoT Capabilities into Small Language Models by H. Liao et al.

Teaching Small Language Models to Reason by L. C. Magister, J. Mallinson, J. Adamek, E. Malmi, and A. Severyn

Large Language Models Are Reasoning Teachers by N. Ho, L. Schmid, and S.-Y. Yun

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning by Z. Sprague et al.

On-Device and Edge-Cloud Collaboration of LLMs and SLMs

Survey

Paper

PerLLM: Personalized Inference Scheduling with Edge-Cloud Collaboration for Diverse LLM Services by Z. Yang, Y. Yang, C. Zhao, Q. Guo, W. He, and W. Ji

CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following by K. Zhang, J. Wang, E. Hua, B. Qi, N. Ding, and B. Zhou

Apple Intelligence Foundation Language Models by T. Gunter et al.

Decentralized LLM Inference over Edge Networks with Energy Harvesting by A. Khoshsirat, G. Perin, and M. Rossi

WDMoE: Wireless Distributed Large Language Models with Mixture of Experts by N. Xue et al.

EdgeShard: Efficient LLM Inference via Collaborative Edge Computing by M. Zhang, J. Cao, X. Shen, and Z. Cui

Crayon: Customized On-Device LLM via Instant Adapter Blending and Edge-Server Hybrid Inference by J. Bang, J. Lee, K. Shim, S. Yang, and S. Chang

LLMCad: Fast and Scalable On-device Large Language Model Inference by D. Xu et al.

SPA: Towards A Computational Friendly Cloud-Base and On-Devices Collaboration Seq2seq Personalized Generation by Y. Liu et al.

Hybrid SLM and LLM for Edge-Cloud Collaborative Inference by Z. Hao, H. Jiang, S. Jiang, J. Ren, and T. Cao

GKT: A Novel Guidance-Based Knowledge Transfer Framework For Efficient Cloud-edge Collaboration LLM Deployment by Y. Yao, Z. Li, and H. Zhao

Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing by D. Ding et al.
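A minimal sketch of the cost/quality-aware routing idea: send easy queries to a small on-device model and hard ones to the cloud LLM. The difficulty heuristic (query length) and the threshold are hypothetical placeholders; real routers train a classifier on observed quality gaps between the two models.

```python
def difficulty(query: str) -> float:
    # Hypothetical proxy: longer queries are treated as harder.
    return len(query.split()) / 20.0

def route(query: str, threshold: float = 0.5) -> str:
    """Route easy queries to the edge SLM, hard ones to the cloud LLM."""
    return "cloud_llm" if difficulty(query) > threshold else "edge_slm"

print(route("What is 2 + 2?"))  # edge_slm
```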

Enhancing On-Device LLM Inference with Historical Cloud-Based LLM Interactions by Y. Ding, C. Niu, F. Wu, S. Tang, C. Lyu, and G. Chen