Nest AI Research - Security & Language Models

Posted by Riino on

Overview

Compared with other deep learning methods, language models (LMs) are considered the first to acquire genuine interfacing capabilities. An LM selects the most suitable next words based on the previously given information, which is called ‘completing’; ‘prompting’ is the technique of aligning all downstream tasks into sentence completion, as sketched below.
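As a minimal sketch of ‘completing’, assuming the Hugging Face transformers library and the public gpt2 checkpoint (neither is named in this post), a downstream task can be phrased as a prompt and the LM simply continues the text:

```python
# Minimal sketch: a causal LM "completes" a prompt by repeatedly
# picking the most probable next token. Assumes the Hugging Face
# `transformers` library and the public `gpt2` checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# A downstream task (sentiment classification) aligned into
# sentence completion via a prompt.
prompt = "Review: 'The movie was great.' Sentiment:"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding: take the single most likely token at each step.
output_ids = model.generate(**inputs, max_new_tokens=5, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```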
Therefore, two major types of training data are used:
  1. Example conversations, in which some words or sentences are masked so that supervised learning can be conducted.
  2. Ranked answers or conversations, which help the LM improve its outputs and are used in the RLHF procedure (a loss sketch follows this list).
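The second data type is what a reward model is trained on in the RLHF pipeline. A minimal sketch of the standard pairwise ranking loss, in plain PyTorch with a hypothetical reward_model that scores an encoded (prompt, response) pair:

```python
import torch.nn.functional as F

def pairwise_ranking_loss(reward_model, chosen, rejected):
    """Bradley-Terry pairwise loss used to train RLHF reward models.

    `reward_model` is a hypothetical module mapping an encoded
    (prompt, response) pair to a scalar score; `chosen` is the
    human-preferred response and `rejected` the lower-ranked one.
    """
    r_chosen = reward_model(chosen)      # score of preferred answer
    r_rejected = reward_model(rejected)  # score of dispreferred answer
    # Push r_chosen above r_rejected: minimize -log(sigmoid(margin)).
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```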
Overall, LMs are among the most complex systems ever engineered, and many modified methods build on this basic structure.

Key Risks

My personal selection of risks for LLM base models is:
  • Hallucination
  • Jailbreaking
  • Membership Inference Attack
You may also know the ‘OWASP LLM Top 10’, which considers the LLM model, LLM-based tools (agents), and ML/AI platforms as an integrated system.
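To make the last item in the list concrete: the simplest membership inference attack thresholds the model’s loss on a candidate sample, since models tend to fit training members more tightly than unseen text. A minimal sketch, assuming a Hugging Face-style causal LM whose forward pass returns a loss, and a hypothetical calibration threshold:

```python
import torch

def is_training_member(model, input_ids, threshold=2.0):
    """Loss-threshold membership inference.

    Assumes a Hugging Face-style causal LM: passing `labels=input_ids`
    yields the mean next-token cross-entropy. `threshold` is a
    hypothetical value that would need calibration on known
    member/non-member samples.
    """
    with torch.no_grad():
        loss = model(input_ids, labels=input_ids).loss
    # Suspiciously low loss suggests the sample was seen in training.
    return loss.item() < threshold
```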

Current Research

Selected Works

Model Assessment
Digital Signature
Red-teaming
Hallucination vs. Factuality
RAG
ACL2023:
MPC