What’s New in Stanford NLP and Stanza
In this talk, we present recent updates to our natural language analysis toolkits, CoreNLP and Stanza, and discuss recent work at Stanford NLP on building and using large language models.
CoreNLP and Stanza are annotation tools written in Java and Python, respectively. Both incorporate new research to expand their capabilities and improve existing models, including an improved constituency parsing module and a worldwide English NER dataset.
The Center for Research on Foundation Models (CRFM) at Stanford has introduced two new open-source tools: Levanter, for building LLMs, and HELM, for measuring their performance.
We also present a biomedical LLM; ColBERT, a neural search engine; and DSPy, an LLM query compiler.
I will discuss Stanza’s neural architecture, its simple user interface, and its improved performance over existing toolkits across a range of datasets covering 70 languages. Our latest updates include NER support for English from around the world, an interface for editing dependencies, a state-of-the-art constituency parser, and more. I will close with our future plans for the Stanza library.
Research Engineer at Stanford
John Bauer is a software engineer working with the NLP research group at Stanford. He has co-authored several NLP papers, including the one describing the widely used CoreNLP software package, and currently maintains Stanford’s Stanza software package.