Generative AI is revolutionizing how we create, share, work with, and consume content. At Microsoft, we run the world's largest platform for collaboration and productivity, with hundreds of millions of consumer and enterprise users. Tackling AI efficiency challenges is crucial for delivering these experiences at scale. Here at M365 Research, we are working to advance efficiency across AI systems, pursuing novel designs and optimizations across the AI stack: models, AI frameworks, cloud infrastructure, and hardware. We believe that enabling cross-layer optimization is the key to achieving a step-function improvement in AI efficiency and to bringing generative AI experiences to more users globally.

We collaborate closely with multiple research teams and product groups across the globe that bring deep technical expertise in cloud systems, machine learning, and software engineering. We communicate our research both internally and externally through academic publications, open-source releases, blog posts, patents, and industry conferences. We also partner with academic and industry collaborators to advance the state of the art and target material product impact that will reach hundreds of millions of customers.

The role

We are looking for research interns to help advance the state of the art in Efficient AI. The ideal candidate will have a background in machine learning and natural language processing, with experience improving the efficiency of large language models. This includes areas such as architecture optimization, hybrid computing, adaptive computation, approximate inference, model compression and quantization, and/or efficiency for LLM-based agents.

Candidates should possess the motivation and ambition to apply their expertise in real-world settings, contributing to impactful innovations that will shape the future of AI.

Qualifications

In addition to the qualifications below, you’ll need to submit a minimum of two reference letters for this position. After you submit your application, a request for letters may be sent to your list of references on your behalf. Note that reference letters cannot be requested until after you have submitted your application, and they may not be automatically requested for all candidates. You may wish to alert your letter writers in advance so they will be ready to submit your letter.

  • Currently enrolled in a PhD program in Computer Science, Mathematics, or a related STEM field.
  • Must have at least 1 year of experience in conducting research, writing peer-reviewed publications, and developing software.

Preferred qualifications

  • Experience with efficiency and scalability in large language models, e.g., architecture optimization, hybrid computing, adaptive computation, approximate inference, model compression and quantization, and/or efficiency for LLM-based agents.
  • Experience conducting research and publishing in top conferences/journals in at least one of the following areas: natural language processing, statistics, machine learning, and optimization.
  • Strong programming skills, preferably in Python.
  • Excellent written and verbal communication skills.

Responsibilities

  • Participate in a 12-week internship program designed to provide hands-on experience and professional development.
  • Collaborate with mentors, other interns, and researchers to conduct cutting-edge research and develop efficient AI algorithms and models.
  • Present your findings and actively contribute to the vibrant life of the research community.
  • Engage in various research areas to broaden your expertise.
  • Intern opportunities are available year-round, though they typically begin in the summer.