<h1>Publication Overview</h1>
<h2 id="papers">Papers</h2>
<h3 id="2023">2023</h3>
<ul>
<li><a href="https://akshatsh.github.io/papers/SLU_ICASSP_Challenge_Summary_upload.pdf">ICASSP 2023 SPOKEN LANGUAGE UNDERSTANDING GRAND CHALLENGE</a> | <a href="https://facebookresearch.github.io/spoken_task_oriented_parsing/">Challenge Website</a><br />
<strong>Akshat Shrivastava</strong>, Suyoun Kim, Paden Tomasello, Ali Elkahky, Daniel Lazar,Trang Le, Shan Jiang, Duc Le, Aleksandr Livshits, Ahmed Aly <br />
<strong>ICASSP 2023</strong></li>
<li><a href="https://arxiv.org/abs/2309.09390">Augmenting text for spoken language understanding with Large Language Models</a><br />
Roshan Sharma, Suyoun Kim, Daniel Lazar, Trang Le, <strong>Akshat Shrivastava</strong>, Kwanghoon Ahn, Piyush Kansal, Leda Sari, Ozlem Kalinli, Michael Seltzer<br />
Preprint</li>
<li><a href="https://arxiv.org/abs/2307.12134">Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding</a><br />
Suyoun Kim, <strong>Akshat Shrivastava</strong>, Duc Le, Ju Lin, Ozlem Kalinli, Michael L. Seltzer<br />
<strong>Interspeech 2023</strong></li>
<li><a href="https://arxiv.org/abs/2303.1716">TreePiece: Faster Semantic Parsing via Tree Tokenization</a><br />
Sid Wang, <strong>Akshat Shrivastava</strong>, Sasha Livshits<br />
<strong>EMNLP Findings 2023</strong></li>
<li><a href="https://arxiv.org/abs/2302.09042">Privately Customizing Prefinetuning to Better Match User Data in Federated Learning</a><br />
Charlie Hou, Hongyuan Zhan, <strong>Akshat Shrivastava</strong>, Sida I. Wang, Sasha Livshits, Giulia C. Fanti, Daniel Lazar<br />
<strong>Workshop on the Pitfalls of Limited Data and Computation for Trustworthy ML, ICLR 2023</strong></li>
</ul>
<h3 id="2022">2022</h3>
<ul>
<li><a href="https://arxiv.org/abs/2211.08402">Introducing Semantics into Speech Encoders</a> <br />
Derek Xu, Shuyan Dong, Changhan Wang, Suyoun Kim, Zhaojiang Lin, <strong>Akshat Shrivastava</strong>, Shang-Wen Li, Liang-Hsuan Tseng, Alexei Baevski, Guan-Ting Lin, Hung-yi Lee, Yizhou Sun, Wei Wang.<br />
<strong>ACL 2023</strong></li>
<li><a href="https://arxiv.org/abs/2210.03871">Data-Efficiency with a Single GPU: An Exploration of Transfer Methods for Small Language Models</a>.<br />
Alon Albalak, <strong>Akshat Shrivastava</strong>, Chinnadhurai Sankar, Adithya Sagar, Mike Ross.<br />
Preprint</li>
<li><a href="https://arxiv.org/abs/2207.10643">STOP: A dataset for Spoken Task Oriented Semantic Parsing</a> <br />
Paden Tomasello, <strong>Akshat Shrivastava</strong>, Daniel Lazar, Po-Chun Hsu, Duc Le, Adithya Sagar, Ali Elkahky, Jade Copet, Wei-Ning Hsu, Yossi Adi, Robin Algayres, Tu Anh Nguyen, Emmanuel Dupoux, Luke Zettlemoyer, Abdelrahman Mohamed.<br />
<strong>SLT 2023</strong></li>
<li><a href="https://arxiv.org/abs/2204.01893">Deliberation Model for On-Device Spoken Language Understanding</a> <br />
Duc Le*, <strong>Akshat Shrivastava* (co-first author)</strong>, Paden Tomasello, Suyoun Kim, Aleksandr Livshits, Ozlem Kalinli, Michael L. Seltzer.<br />
<strong>Interspeech 2022</strong></li>
<li><a href="https://arxiv.org/abs/2202.00901">Retrieve-and-Fill for Scenario-based Task-Oriented Semantic Parsing</a> <br />
<strong>Akshat Shrivastava</strong>, Shrey Desai, Anchit Gupta, Ali Elkahky, Aleksandr Livshits, Alexander Zotov, Ahmed Aly <br />
<strong>EACL 2023</strong></li>
</ul>
<h3 id="2021">2021</h3>
<ul>
<li><a href="https://arxiv.org/abs/2109.10410">RETRONLU: Retrieval Augmented Task-Oriented Semantic Parsing</a> <br />
Vivek Gupta, <strong>Akshat Shrivastava</strong>, Adithya Sagar, Armen Aghajanyan, and Denis Savenkov <br />
<strong>🏆 Outstanding Paper - ConvAI@ACL 2022</strong></li>
<li><a href="https://arxiv.org/abs/2107.04736">Assessing Data Efficiency in Task-Oriented Semantic Parsing</a> <br />
Shrey Desai, <strong>Akshat Shrivastava</strong>, Justin Rill, Brian Moran, Safiyyah Saleem, Alexander Zotov, and Ahmed Aly <br />
Preprint</li>
<li><a href="https://arxiv.org/abs/2106.11890">Latency-Aware Neural Architecture Search with Multi-Objective Bayesian Optimization</a><br />
David Eriksson, Pierce I-Jen Chuang, Sam Daulton, Peng Xia, <strong>Akshat Shrivastava</strong>, Arun Babu, Shicong Zhao, Ahmed Aly, Ganesh Venkatesh, and Maximilian Balandat<br />
<strong>ICML Workshop on Automated Machine Learning 2021</strong></li>
<li><a href="https://arxiv.org/abs/2104.07275">Span Pointer Networks for Non-Autoregressive Task-Oriented Semantic Parsing</a><br />
<strong>Akshat Shrivastava</strong>, Pierce Chuang, Arun Babu, Shrey Desai, Abhinav Arora, Alexander Zotov, and Ahmed Aly<br />
<strong>EMNLP 2021 Findings</strong></li>
<li><a href="https://arxiv.org/abs/2104.07224">Low-Resource Task-Oriented Semantic Parsing via Intrinsic Modeling</a><br />
Shrey Desai, <strong>Akshat Shrivastava</strong>, Alexander Zotov, and Ahmed Aly<br />
Preprint</li>
<li><a href="https://arxiv.org/abs/2104.04923">Non-Autoregressive Semantic Parsing for Compositional Task-Oriented Dialog</a><br />
Arun Babu, <strong>Akshat Shrivastava</strong>, Armen Aghajanyan, Ahmed Aly, Angela Fan, and Marjan Ghazvininejad<br />
<strong>NAACL 2021</strong></li>
<li><a href="https://arxiv.org/abs/2101.11038">MUPPET: Massive Multi-task Representations with Pre-Finetuning</a><br />
Armen Aghajanyan, Anchit Gupta, <strong>Akshat Shrivastava</strong>, Xilun Chen, Luke Zettlemoyer, and Sonal Gupta<br />
<strong>EMNLP 2021</strong></li>
</ul>
<h3 id="2020">2020</h3>
<ul>
<li><a href="https://arxiv.org/abs/2009.13655">Conversational Semantic Parsing</a><br />
Armen Aghajanyan, Jean Maillard, <strong>Akshat Shrivastava</strong>, Keith Diedrick, Mike Haeger, Haoran Li, Yashar Mehdad, Ves Stoyanov, Anuj Kumar, Mike Lewis, and Sonal Gupta<br />
<strong>EMNLP 2020</strong></li>
<li><a href="https://openreview.net/pdf?id=Ku-nv600bNM">Cross-lingual Transfer Learning for Intent Detection of Covid-19 Utterances</a><br />
Abhinav Arora, <strong>Akshat Shrivastava</strong>, Mrinal Mohit, Lorena Sainz-Maza Lecanda, and Ahmed Aly<br />
Preprint</li>
<li><a href="https://arxiv.org/abs/2008.03156">Better Fine-Tuning by Reducing Representational Collapse</a><br />
Armen Aghajanyan, <strong>Akshat Shrivastava</strong>, Anchit Gupta, Naman Goyal, Luke Zettlemoyer, and Sonal Gupta<br />
<strong>ICLR 2021</strong></li>
<li><a href="https://github.com/AkshatSh/iSeqL">iSeqL Interactive Sequence Learning</a><br />
<strong>Akshat Shrivastava</strong> and Jeffrey Heer<br />
<strong>IUI 2020</strong></li>
</ul>
<h2 id="workshops-and-challenges">Workshops and Challenges</h2>
<h3 id="organizer">Organizer</h3>
<ul>
<li><a href="https://facebookresearch.github.io/spoken_task_oriented_parsing/">Spoken Language Understanding Grand Challenge @ ICASSP 2023</a>.<br />
<strong>ICASSP 2023</strong></li>
</ul>
<h3 id="program-comittee">Program Comittee</h3>
<ul>
<li><a href="https://tl4nlp.github.io/">Transfer Learning 4 NLP @ NeurIPS 2022</a></li>
<li>EACL 2022</li>
<li><a href="https://sites.google.com/view/4thnlp4convai/home?authuser=0">NLP For Conversational AI @ ACL 2022</a></li>
<li>ACL 2022</li>
</ul>
<h2 id="retrieve-and-fill">Retrieve-and-Fill for Scenario-based Task-Oriented Semantic Parsing</h2>
<p><strong>Authors:</strong> <strong>Akshat Shrivastava</strong>, Shrey Desai, Anchit Gupta, Ali Elkahky, Aleksandr Livshits, Alexander Zotov, Ahmed Aly</p>
<p>Task-oriented semantic parsing models have achieved strong results in recent years, but unfortunately do not strike an appealing balance between model size, runtime latency, and cross-domain generalizability. We tackle this problem by introducing scenario-based semantic parsing: a variant of the original task which first requires disambiguating an utterance’s “scenario” (an intent-slot template with variable leaf spans) before generating its frame, complete with ontology and utterance tokens. This formulation enables us to isolate coarse-grained and fine-grained aspects of the task, each of which we solve with off-the-shelf neural modules, also optimizing for the axes outlined above. Concretely, we create a Retrieve-and-Fill (RAF) architecture composed of (1) a retrieval module which ranks the best scenario given an utterance and (2) a filling module which imputes spans into the scenario to create the frame. Our model is modular, differentiable, interpretable, and allows us to garner extra supervision from scenarios. RAF achieves strong results in high-resource, low-resource, and multilingual settings, outperforming recent approaches by wide margins despite using base pre-trained encoders, small sequence lengths, and parallel decoding.</p>
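<p>To make the two-stage decomposition concrete, here is a minimal, illustrative sketch of the retrieve-then-fill pipeline. Everything in it (the word-overlap retriever, the <code>SPAN</code> placeholder, the last-token filler) is a toy stand-in for the learned modules in the paper.</p>
<pre><code class="language-python">
import re
from dataclasses import dataclass

@dataclass
class Scenario:
    template: str  # intent-slot template with a variable leaf span

def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(utterance, scenario):
    # Toy retrieval score: word overlap between utterance and template.
    # The paper ranks scenarios with a learned retrieval module instead.
    return len(words(utterance).intersection(words(scenario.template)))

def retrieve(utterance, inventory):
    # Stage 1: rank every scenario against the utterance, keep the best.
    return max(inventory, key=lambda s: score(utterance, s))

def fill(utterance, scenario):
    # Stage 2: impute utterance spans into the scenario's leaf slots.
    # A real filling module predicts span endpoints; this toy version
    # pastes the last utterance token into the single placeholder.
    return scenario.template.replace("SPAN", utterance.split()[-1])

inventory = [
    Scenario("[IN:CREATE_ALARM [SL:DATE_TIME SPAN ] ]"),
    Scenario("[IN:GET_WEATHER [SL:LOCATION SPAN ] ]"),
]
utt = "set an alarm for tonight"
print(fill(utt, retrieve(utt, inventory)))
# [IN:CREATE_ALARM [SL:DATE_TIME tonight ] ]
</code></pre>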
<p><strong>EACL 2023</strong></p>
<p><a href="https://arxiv.org/abs/2202.00901">Arxiv Link</a></p>Authors: Akshat Shrivastava, Shrey Desai, Anchit Gupta, Ali Elkahky, Aleksandr Livshits, Alexander Zotov, Ahmed AlyRETRONLU Retrieval Augmented Task-Oriented Semantic Parsing2021-09-21T00:00:00+00:002021-09-21T00:00:00+00:00akshatsh.github.io/publication<p><strong>Authors:</strong> Vivek Gupta, <strong>Akshat Shrivastava</strong>, Adithya Sagar, Armen Aghajanyan, and Denis Savenkov</p>
<p>While large pre-trained language models accumulate a great deal of knowledge in their parameters, it has been demonstrated that augmenting them with non-parametric, retrieval-based memory has a number of benefits, from accuracy improvements to data efficiency, for knowledge-focused tasks such as question answering. In this paper, we apply retrieval-based modeling ideas to the problem of multi-domain task-oriented semantic parsing for conversational assistants. Our approach, RetroNLU, extends a sequence-to-sequence model architecture with a retrieval component, used to fetch existing similar examples and provide them as an additional input to the model. In particular, we analyze two settings, where we augment an input with (a) retrieved nearest-neighbor utterances (utterance-nn), and (b) ground-truth semantic parses of nearest-neighbor utterances (semparse-nn). Our technique outperforms the baseline method by 1.5% absolute macro-F1, especially in the low-resource setting, matching the baseline model’s accuracy with only 40% of the data. Furthermore, we analyze the quality of the nearest-neighbor retrieval component, the model’s sensitivity to it, and break down performance for semantic parses of different utterance complexity.</p>
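<p>A minimal sketch of the two augmentation settings follows; the <code>[SEP]</code> separator and function names are hypothetical, and the retrieval step itself (finding the nearest neighbor) is assumed to have already happened.</p>
<pre><code class="language-python">
# Sketch of RetroNLU-style input augmentation: a retrieved nearest-neighbor
# example is appended to the utterance before seq2seq parsing.

def augment(utterance, neighbor_utt, neighbor_parse, mode):
    if mode == "utterance-nn":    # (a) retrieved neighbor utterance
        context = neighbor_utt
    elif mode == "semparse-nn":   # (b) neighbor's ground-truth semantic parse
        context = neighbor_parse
    else:
        raise ValueError(mode)
    return f"{utterance} [SEP] {context}"  # fed to a standard seq2seq parser

print(augment(
    "wake me up at 7",
    "set an alarm for 6pm",
    "[IN:CREATE_ALARM [SL:DATE_TIME for 6pm ] ]",
    mode="semparse-nn",
))
</code></pre>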
<p><a href="https://arxiv.org/abs/2109.10410">Arxiv Link</a></p>Authors: Vivek Gupta, Akshat Shrivastava, Adithya Sagar, Armen Aghajanyan, and Denis SavenkovAssessing Data Efficiency in Task-Oriented Semantic Parsing2021-07-10T00:00:00+00:002021-07-10T00:00:00+00:00akshatsh.github.io/publication<p><strong>Authors:</strong> Shrey Desai, <strong>Akshat Shrivastava</strong>, Justin Rill, Brian Moran, Safiyyah Saleem, Alexander Zotov, and Ahmed Aly</p>
<p>Data efficiency, despite being an attractive characteristic, is often challenging to measure and optimize for in task-oriented semantic parsing; unlike exact match, it can require both model- and domain-specific setups, which have, historically, varied widely across experiments. In our work, as a step towards providing a unified solution to data-efficiency-related questions, we introduce a four-stage protocol which gives an approximate measure of how much in-domain, “target” data a parser requires to achieve a certain quality bar. Specifically, our protocol consists of (1) sampling target subsets of different cardinalities, (2) fine-tuning parsers on each subset, (3) obtaining a smooth curve relating target subset (%) vs. exact match (%), and (4) referencing the curve to mine ad-hoc (target subset, exact match) points. We apply our protocol in two real-world case studies, model generalizability and intent complexity, illustrating its flexibility and applicability to practitioners in task-oriented semantic parsing.</p>
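<p>A sketch of the four stages as a loop, assuming hypothetical <code>fine_tune</code> and <code>exact_match</code> hooks into a real training pipeline:</p>
<pre><code class="language-python">
import random

def data_efficiency_curve(target_data, fine_tune, exact_match,
                          fractions=(0.01, 0.05, 0.1, 0.25, 0.5, 1.0)):
    curve = []
    for frac in fractions:                          # (1) sample target subsets
        subset = random.sample(target_data, int(frac * len(target_data)))
        parser = fine_tune(subset)                  # (2) fine-tune per subset
        curve.append((frac, exact_match(parser)))   # (3) subset (%) vs. EM (%)
    return curve

def data_needed(curve, quality_bar):
    # (4) mine the curve: smallest sampled fraction clearing the quality bar.
    return min((frac for frac, em in curve if em >= quality_bar), default=None)
</code></pre>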
<a href="https://arxiv.org/abs/2107.04736">Arxiv Link</a></p>Authors: Shrey Desai, Akshat Shrivastava, Justin Rill, Brian Moran, Safiyyah Saleem, Alexander Zotov, and Ahmed AlyLatency-Aware Neural Architecture Search with Multi-Objective Bayesian Optimization2021-06-22T00:00:00+00:002021-06-22T00:00:00+00:00akshatsh.github.io/publication<p><strong>Authors:</strong> David Eriksson, Pierce I-Jen Chuang, Sam Daulton, Peng Xia, <strong>Akshat Shrivastava</strong>, Arun Babu, Shicong Zhao, Ahmed Aly, Ganesh Venkatesh, and Maximilian Balandat</p>
<p>When tuning the architecture and hyperparameters of large machine learning models for on-device deployment, it is desirable to understand the optimal trade-offs between on-device latency and model accuracy. In this work, we leverage recent methodological advances in Bayesian optimization over high-dimensional search spaces and multi-objective Bayesian optimization to efficiently explore these trade-offs for a production-scale on-device natural language understanding model at Facebook.</p>
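<p>The sketch below shows the shape of such a search loop, with random proposals standing in for the Bayesian optimization machinery (surrogate model plus acquisition function) used in the paper; <code>measure_latency</code> and <code>measure_accuracy</code> are hypothetical evaluators.</p>
<pre><code class="language-python">
import random

def dominates(a, b):
    # a dominates b when it is at least as fast and at least as accurate,
    # and not identical (objectives: lower latency, higher accuracy).
    return b[0] >= a[0] and a[1] >= b[1] and a != b

def pareto_front(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]

def search(measure_latency, measure_accuracy, n_trials=50):
    evaluated = []
    for _ in range(n_trials):
        # Propose a candidate architecture config; a BO loop would pick
        # this point via a surrogate model instead of uniform sampling.
        config = {"layers": random.randint(2, 12),
                  "width": random.choice([128, 256, 512])}
        evaluated.append((measure_latency(config), measure_accuracy(config)))
    return pareto_front(evaluated)  # latency/accuracy trade-off frontier
</code></pre>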
<p><strong>ICML Workshop on Automated Machine Learning 2021</strong></p>
<p><a href="https://arxiv.org/abs/2106.11890">Arxiv Link</a></p>Authors: David Eriksson, Pierce I-Jen Chuang, Sam Daulton, Peng Xia, Akshat Shrivastava, Arun Babu, Shicong Zhao, Ahmed Aly, Ganesh Venkatesh, and Maximilian BalandatSpan Pointer Networks for Non-Autoregressive Task-Oriented Semantic Parsing2021-04-15T00:00:00+00:002021-04-15T00:00:00+00:00akshatsh.github.io/publication<p><strong>Authors:</strong> <strong>Akshat Shrivastava</strong>, Pierce Chuang, Arun Babu, Shrey Desai, Abhinav Arora, Alexander Zotov, Ahmed Aly</p>
<p>An effective recipe for building seq2seq, non-autoregressive, task-oriented parsers to map utterances to semantic frames proceeds in three steps: encoding an utterance x, predicting a frame’s length |y|, and decoding a |y|-sized frame with utterance and ontology tokens. Though empirically strong, these models are typically bottlenecked by length prediction, as even small inaccuracies change the syntactic and semantic characteristics of resulting frames. In our work, we propose span pointer networks, non-autoregressive parsers which shift the decoding task from text generation to span prediction; that is, when imputing utterance spans into frame slots, our model produces endpoints (e.g., [i, j]) as opposed to text (e.g., “6pm”). This natural quantization of the output space reduces the variability of gold frames, therefore improving length prediction and, ultimately, exact match. Furthermore, length prediction is now responsible for frame syntax and the decoder is responsible for frame semantics, resulting in a coarse-to-fine model. We evaluate our approach on several task-oriented semantic parsing datasets. Notably, we bridge the quality gap between non-autoregressive and autoregressive parsers, achieving 87 EM on TOPv2 (Chen et al. 2020). Furthermore, due to our more consistent gold frames, we show strong improvements in model generalization in both cross-domain and cross-lingual transfer in low-resource settings. Finally, due to our diminished output vocabulary, we observe a 70% reduction in latency and an 83% reduction in memory at beam size 5 compared to prior non-autoregressive parsers.</p>
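<p>The core target transformation can be sketched in a few lines: leaf text in a gold frame is replaced with utterance token endpoints. The slot dictionary format below is illustrative, not the paper's data format.</p>
<pre><code class="language-python">
def to_span_frame(utterance, frame_leaves):
    # Replace each leaf's text with its [i, j] token endpoints in the utterance.
    tokens = utterance.split()
    span_frame = {}
    for slot, text in frame_leaves.items():
        leaf = text.split()
        for i in range(len(tokens) - len(leaf) + 1):
            if tokens[i:i + len(leaf)] == leaf:
                span_frame[slot] = [i, i + len(leaf) - 1]  # endpoints, not text
                break
    return span_frame

print(to_span_frame("set an alarm for 6pm", {"SL:DATE_TIME": "6pm"}))
# {'SL:DATE_TIME': [4, 4]} -- the decoder points at positions rather than generating "6pm"
</code></pre>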
<p><strong>EMNLP 2021 Findings</strong></p>
<p><a href="https://arxiv.org/abs/2104.07275">Arxiv Link</a></p>Authors: Akshat Shrivastava, Pierce Chuang, Arun Babu, Shrey Desai, Abhinav Arora, Alexander Zotov, Ahmed AlyLow-Resource Task-Oriented Semantic Parsing via Intrinsic Modeling2021-04-14T00:00:00+00:002021-04-14T00:00:00+00:00akshatsh.github.io/publication<p><strong>Authors:</strong> Shrey Desai, <strong>Akshat Shrivastava</strong>, Alexander Zotov, Ahmed Aly</p>
<p><em>Task-oriented semantic parsing models typically have high resource requirements: to support new ontologies (i.e., intents and slots), practitioners crowdsource thousands of samples for supervised fine-tuning. Partly, this is due to the structure of de facto copy-generate parsers; these models treat ontology labels as discrete entities, relying on parallel data to extrinsically derive their meaning. In our work, we instead exploit what we intrinsically know about ontology labels; for example, the fact that SL:TIME_ZONE has the categorical type “slot” and language-based span “time zone”. Using this motivation, we build our approach with offline and online stages. During preprocessing, for each ontology label, we extract its intrinsic properties into a component, and insert each component into an inventory as a cache of sorts. During training, we fine-tune a seq2seq, pre-trained transformer to map utterances and inventories to frames, parse trees comprised of utterance and ontology tokens. Our formulation encourages the model to consider ontology labels as a union of its intrinsic properties, therefore substantially bootstrapping learning in low-resource settings. Experiments show our model is highly sample efficient: using a low-resource benchmark derived from TOPv2, our inventory parser outperforms a copy-generate parser by +15 EM absolute (44% relative) when fine-tuning on 10 samples from an unseen domain.</em></p>
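<p>A sketch of the offline stage, extracting each label's intrinsic properties (categorical type and language-based span) into an inventory; the component string format here is invented for illustration.</p>
<pre><code class="language-python">
def to_component(label):
    kind, name = label.split(":", 1)
    category = {"IN": "intent", "SL": "slot"}[kind]   # categorical type
    span = name.replace("_", " ").lower()             # language-based span
    return f"[{category}] {span}"

inventory = [to_component(l) for l in ["IN:GET_TIME", "SL:TIME_ZONE", "SL:LOCATION"]]
print(inventory)
# ['[intent] get time', '[slot] time zone', '[slot] location']
# Online, the utterance plus this inventory is what the seq2seq parser consumes.
</code></pre>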
<p><strong>Pre-print</strong></p>
<p><a href="https://arxiv.org/abs/2104.07224">Arxiv Link</a></p>Authors: Shrey Desai, Akshat Shrivastava, Alexander Zotov, Ahmed AlyNon-Autoregressive Semantic Parsing for Compositional Task-Oriented Dialog2021-04-11T00:00:00+00:002021-04-11T00:00:00+00:00akshatsh.github.io/publication<p><strong>Authors:</strong> Arun Babu, <strong>Akshat Shrivastava</strong>, Armen Aghajanyan, Ahmed Aly, Angela Fan, Marjan Ghazvininejad</p>
<p><em>Semantic parsing using sequence-to-sequence models allows parsing of deeper representations compared to traditional word tagging based models. In spite of these advantages, widespread adoption of these models for real-time conversational use cases has been stymied by higher compute requirements and thus higher latency. In this work, we propose a non-autoregressive approach to predict semantic parse trees with an efficient seq2seq model architecture. By combining non-autoregressive prediction with convolutional neural networks, we achieve significant latency gains and parameter size reduction compared to traditional RNN models. Our novel architecture achieves up to an 81% reduction in latency on the TOP dataset and retains competitive performance to non-pretrained models on three different semantic parsing datasets. Our code is available in <a href="https://github.com/facebookresearch/pytext">PyText</a>.</em></p>
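<p>Schematically, the non-autoregressive decode is "predict the length, then fill every position at once"; the sketch below assumes hypothetical <code>encoder</code>, <code>length_head</code>, and <code>decoder</code> modules rather than the paper's convolutional architecture.</p>
<pre><code class="language-python">
import torch

MASK_ID = 0  # placeholder token id (illustrative)

def nar_parse(encoder, length_head, decoder, utterance_ids):
    enc = encoder(utterance_ids)               # encode the utterance once
    length = int(length_head(enc).argmax(-1))  # predict the frame length
    frame = torch.full((1, length), MASK_ID)   # that many placeholder tokens
    logits = decoder(frame, enc)               # a single parallel decoding pass
    return logits.argmax(-1)                   # all frame tokens emitted at once
</code></pre>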
<p><strong>Presented at NAACL 2021</strong></p>
<p><a href="https://arxiv.org/abs/2104.04923">Arxiv Link</a>
| <a href="https://github.com/facebookresearch/pytext">Code</a></p>Authors: Arun Babu, Akshat Shrivastava, Armen Aghajanyan, Ahmed Aly, Angela Fan, Marjan GhazvininejadMuppet Massive Multi-task Representations with Pre-Finetuning2021-01-06T00:00:00+00:002021-01-06T00:00:00+00:00akshatsh.github.io/publication<p><strong>Authors:</strong> Armen Aghajanyan, Anchit Gupta, <strong>Akshat Shrivastava</strong>, Xilun Chen, Luke Zettlemoyer, Sonal Gupta</p>
<p><em>We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples), and is designed to encourage learning of representations that generalize better to many different tasks. We show that pre-finetuning consistently improves performance for pretrained discriminators (e.g., RoBERTa) and generation models (e.g., BART) on a wide range of tasks (sentence prediction, commonsense reasoning, MRC, etc.), while also significantly improving sample efficiency during fine-tuning. We also show that large-scale multi-tasking is crucial; pre-finetuning can hurt performance when few tasks are used, up until a critical point (usually above 15) after which performance improves linearly in the number of tasks.</em></p>
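<p>The gist of the stage is a single model taking gradient steps across many tasks; the sketch below uses size-proportional task sampling and per-task loss functions as illustrative simplifications, not the exact Muppet recipe (which batches heterogeneous tasks together and rescales losses).</p>
<pre><code class="language-python">
import random

def prefinetune(model, optimizer, tasks, steps):
    # tasks maps a task name to {"data": [...batches...], "loss_fn": callable}.
    names = list(tasks)
    sizes = [len(tasks[n]["data"]) for n in names]
    for _ in range(steps):
        # Sample a task proportionally to its dataset size, then take one
        # gradient step on one of its batches with its task-specific loss.
        name = random.choices(names, weights=sizes, k=1)[0]
        batch = random.choice(tasks[name]["data"])
        loss = tasks[name]["loss_fn"](model, batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
</code></pre>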
<p><strong>EMNLP 2021</strong></p>
<p><a href="https://arxiv.org/abs/2101.11038">Arxiv Link</a></p>Authors: Armen Aghajanyan, Anchit Gupta, Akshat Shrivastava, Xilun Chen, Luke Zettlemoyer, Sonal GuptaConversational Semantic Parsing2020-09-28T00:00:00+00:002020-09-28T00:00:00+00:00akshatsh.github.io/publication<p><strong>Authors:</strong> Armen Aghajanyan, Jean Maillard, <strong>Akshat Shrivastava</strong>, Keith Diedrick, Mike Haeger, Haoran Li, Yashar Mehdad, Ves Stoyanov, Anuj Kumar, Mike Lewis, Sonal Gupta</p>
<p><em>The structured representation for semantic parsing in task-oriented assistant systems is geared towards simple understanding of one-turn queries. Due to the limitations of the representation, the session-based properties such as co-reference resolution and context carryover are processed downstream in a pipelined system. In this paper, we propose a semantic representation for such task-oriented conversational systems that can represent concepts such as co-reference and context carryover, enabling comprehensive understanding of queries in a session. We release a new session-based, compositional task-oriented parsing dataset of 20k sessions consisting of 60k utterances. Unlike Dialog State Tracking Challenges, the queries in the dataset have compositional forms. We propose a new family of Seq2Seq models for the session-based parsing above, which achieve better or comparable performance to the current state-of-the-art on ATIS, SNIPS, TOP and DSTC2. Notably, we improve the best known results on DSTC2 by up to 5 points for slot-carryover.</em></p>
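<p>As a hypothetical illustration of what a session-based representation buys (the <code>REFER</code> notation below is invented for this sketch, not the dataset's actual syntax), a second turn can point back into the first turn's frame instead of restating the slot:</p>
<pre><code class="language-python">
session = [
    {"utterance": "call John",
     "parse": "[IN:CREATE_CALL [SL:CONTACT John ] ]"},
    # "him" is resolved by reference to turn 1 rather than re-tagged, so
    # co-reference lives in the parse instead of a downstream pipeline stage.
    {"utterance": "actually text him instead",
     "parse": "[IN:SEND_MESSAGE [SL:CONTACT [REFER turn=1 slot=SL:CONTACT ] ] ]"},
]
</code></pre>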
<p><strong>Presented at EMNLP 2020</strong></p>
<p><a href="https://arxiv.org/abs/2009.13655">Arxiv Link</a></p>Authors: Armen Aghajanyan, Jean Maillard, Akshat Shrivastava, Keith Diedrick, Mike Haeger, Haoran Li, Yashar Mehdad, Ves Stoyanov, Anuj Kumar, Mike Lewis, Sonal Gupta