16 results

Journal articles 2026 KCODA

Direct Access for Conjunctive Queries with Negations

Florent Capelli, Nofar Carmeli, Oliver Irwin, Sylvain Salvati

Abstract

Given a conjunctive query \(Q\) and a database \(\mathbf{D}\), a direct access to the answers of \(Q\) over \(\mathbf{D}\) is the operation of returning, given an index \(j\), the \(j^{\mathsf{th}}\) answer for some order on its answers. While this problem is \(\#\mathsf{P}\)-hard in general with respect to combined complexity, many conjunctive queries have an underlying structure that allows for a direct access to their answers for some lexicographical ordering that takes polylogarithmic time in the size of the database after a polynomial time precomputation. Previous work has precisely characterised the tractable classes and given fine-grained lower bounds on the precomputation time needed depending on the structure of the query. In this paper, we generalise these tractability results to the case of signed conjunctive queries, that is, conjunctive queries that may contain negative atoms. Our technique is based on a class of circuits that can represent relational data. We first show that this class supports tractable direct access after a polynomial time preprocessing. We then give bounds on the size of the circuit needed to represent the answer set of signed conjunctive queries depending on their structure. Both results combined together allow us to prove the tractability of direct access for a large class of conjunctive queries. On the one hand, we recover the known tractable classes from the literature in the case of positive conjunctive queries. On the other hand, we generalise and unify known tractability results about negative conjunctive queries – that is, queries having only negated atoms. In particular, we show that the class of \(\beta\)-acyclic negative conjunctive queries and the class of bounded nest set width negative conjunctive queries admit tractable direct access.

Proceedings 2026 EXPAND

On the Complexity of Language Membership for Probabilistic Words

Antoine Amarilli, Mikaël Monet, Paul Raphaël, Sylvain Salvati

Abstract

35 pages including 1 title page, 15 pages of main text, 4 pages of reference, and appendix

Journal articles 2026

Corrections to “On the data complexity of consistent query answering over graph databases [Journal of Computer and System Sciences 88 (2017) 164–194]”

Pablo Barceló, Gaëlle Fontaine, Sylvain Salvati, Sophie Tison

Abstract

Applications of graph databases are prone to inconsistency due to interoperability issues. This raises the need for studying query answering over inconsistent graph databases in a simple but general framework. We follow the approach of consistent query answering (CQA), and study its data complexity over graph databases for conjunctive regular-path queries (CRPQs) and conjunctive regular-path constraints (CRPCs). We deal with subset, superset and symmetric-difference repairs. Without restrictions, CQA is undecidable for the semantics of superset- and symmetric-difference repairs, and _2^P-complete for subset-repairs. However, we identify restrictions on CRPCs and databases that lead to decidability, and even tractability of CQA.

Proceedings 2026

Gray Codes With Constant Delay and Constant Auxiliary Space

Antoine Amarilli, Claire David, Nadime Francis, Victor Marsault, Mikaël Monet, Yann Strozecki

Journal articles 2025 EQUUS

Approximating Queries on Probabilistic Graphs

Antoine Amarilli, Timothy van Bremen, Octave Gaspard, Kuldeep S. Meel

Abstract

Query evaluation over probabilistic databases is notoriously intractable – not only in combined complexity, but often in data complexity as well. This motivates the study of approximation algorithms, and particularly of combined FPRASes, with runtime polynomial in both the query and instance size. In this paper, we focus on tuple-independent probabilistic databases over binary signatures, i.e., probabilistic graphs, and study when we can devise combined FPRASes for probabilistic query evaluation. We settle the complexity of this problem for a variety of query and instance classes, by proving both approximability results and (conditional) inapproximability results doubled with (unconditional) DNNF provenance circuit size lower bounds. This allows us to deduce many corollaries of possible independent interest. For example, we show how the results of Arenas et al. on counting fixed-length strings accepted by an NFA imply the existence of an FPRAS for the two-terminal network reliability problem on directed acyclic graphs: this was an open problem until now. We also show that one cannot extend a recent result of van Bremen and Meel that gives a combined FPRAS for self-join-free conjunctive queries of bounded hypertree width on probabilistic databases: neither the bounded-hypertree-width condition nor the self-join-freeness hypothesis can be relaxed. We last show how our methods can give insights on the evaluation and approximability of regular path queries (RPQs) on probabilistic graphs in the data complexity perspective, showing in particular that some of them are (conditionally) inapproximable.

Preprints, Working Papers, … 2025

Cutwidth Bounds via Vertex Partitions

Antoine Amarilli, Benoît Groz

Abstract

We study the cutwidth measure on graphs and ways to bound the cutwidth of a graph by partitioning its vertices. We consider bounds expressed as a function of two quantities: on the one hand, the maximal cutwidth y of the subgraphs induced by the classes of the partition, and on the other hand, the cutwidth x of the quotient multigraph obtained by merging each class to a single vertex. We consider in particular the decomposition of directed graphs into strongly connected components (SCCs): in this case, y is the maximal cutwidth of an SCC, and x is the cutwidth of the directed acyclic condensation multigraph. We show that the cutwidth of a graph is always in O(x + y), specifically it can be upper bounded by 1.5x + y. We also show a lower bound justifying that the constant 1.5 cannot be improved in general.

Preprints, Working Papers, … 2025

Locality Testing for NFAs is PSPACE-complete

Antoine Amarilli, Mikaël Monet, Rémi de Pretto

Preprints, Working Papers, … 2025

Confluence of the Node-Domination and Edge-Domination Hypergraph Rewrite Rules

Antoine Amarilli, Mikaël Monet, Rémi de Pretto

Conference papers 2025

Shape Expressions with Inheritance

Iovka Boneva, Jose Emilio Labra Gayo, Eric Prud’hommeaux, Katherine Thornton, Andra Waagmeester

Abstract

We formally introduce an inheritance mechanism for the Shape Expressions language (ShEx). It is inspired by inheritance in object-oriented programming languages, and provides similar advantages such as reuse, modularity, and more flexible data modelling. Using an example, we explain the main features of the inheritance mechanism. We present its syntax and formal semantics. The semantics is an extension of the semantics of ShEx 2.1. It also directly yields a validation algorithm as an extension of the previous ShEx validation algorithms, while maintaining the same algorithmic complexity.

Conference papers 2025

Common Foundations for SHACL, ShEx, and PG-Schema

Shqiponja Ahmetaj, Iovka Boneva, Jan Hidders, Katja Hose, Maxime Jakubowski, Jose Emilio Labra Gayo, Wim Martens, Fabio Mogavero, Filip Murlak, Cem Okulmus, Axel Polleres, Ognjen Savković, Mantas Šimkus, Dominik Tomaszuk

Abstract

Graphs have emerged as a foundation for a variety of applications, including capturing factual knowledge, semantic data integration, social networks, and informing machine learning algorithms. Formalising properties of the data and ensuring data quality requires describing schemas of such graphs. Driven by diverse applications, the Semantic Web and database communities developed not only different graph data models-RDF and property graphs-but also different graph schema languages-SHACL, ShEx, and PG-Schema. Each language has its unique approach to defining constraints and validating graph data, leaving potential users in the dark about their commonalities and differences. In this paper, we provide concise formal definitions of the core components of these languages, employ a uniform framework to facilitate a comprehensive comparison between them, and identify a common set of functionalities, shedding light on both overlapping and distinctive features.

Preprints, Working Papers, … 2025

Vizitig: a pangenome and pantranscriptome explorer

Bastien Degardins, Charles Paperman, Camille Marchet

Abstract

Vizitig is the first platform for real-time exploration and querying of DNA and RNA sequence graphs across many samples, unifying visualization, metadata, and flexible search. It integrates raw and reference-based data, handles complex variation, and provides a human-readable query language with scalable graph loading. Vizitig enables fast, reproducible analysis in both pangenomics and pantranscriptomics.

Conference papers 2025 KCODA

A Simple Algorithm for Worst Case Optimal Join and Sampling

Florent Capelli, Oliver Irwin, Sylvain Salvati

Abstract

We present an elementary branch and bound algorithm with a simple analysis of why it achieves worstcase optimality for join queries on classes of databases defined respectively by cardinality or acyclic degree constraints. We then show that if one is given a reasonable way for recursively estimating upper bounds on the number of answers of the join queries, our algorithm can be turned into algorithm for uniformly sampling answers with expected running time Õ(UP/OUT) where UP is the upper bound, OUT is the actual number of answers and Õ(•) ignores polylogarithmic factors. Our approach recovers recent results on worstcase optimal join algorithm and sampling in a modular, clean and elementary way.

Conference papers 2025 EQUUS

Edge-Minimum Walk of Modular Length in Polynomial Time

Antoine Amarilli, Benoit Groz, Nicole Wein

Abstract

We study the problem of finding, in a directed graph, an st-walk of length r mod q which is edge-minimum, i.e., uses the smallest number of distinct edges. Despite the vast literature on paths and cycles with modularity constraints, to the best of our knowledge we are the first to study this problem. Our main result is a polynomial-time algorithm that solves this task when r and q are constants. We also show how our proof technique gives an algorithm to solve a generalization of the well-known Directed Steiner Network problem, in which connections between endpoint pairs are required to satisfy modularity constraints on their length. Our algorithm is polynomial when the number of endpoint pairs and the modularity constraints on the pairs are constants.

Conference papers 2025

Dynamic Membership for Regular Tree Languages

Antoine Amarilli, Corentin Barloy, Louis Jachiet, Charles Paperman

Abstract

We study the dynamic membership problem for regular tree languages under relabeling updates: we fix an alphabet Σ and a regular tree language L over Σ (expressed, e.g., as a tree automaton), we are given a tree T with labels in Σ, and we must maintain the information of whether the tree T belongs to L while handling relabeling updates that change the labels of individual nodes in T. Our first contribution is to show that this problem admits an O(log n / log log n) algorithm for any fixed regular tree language, improving over known O(log n) algorithms. This generalizes the known O(log n / log log n) upper bound over words, and it matches the lower bound of Ω(log n / log log n) from dynamic membership to some word languages and from the existential marked ancestor problem. Our second contribution is to introduce a class of regular languages, dubbed almost-commutative tree languages, and show that dynamic membership to such languages under relabeling updates can be decided in constant time per update. Almost-commutative languages generalize both commutative languages and finite languages: they are the analogue for trees of the ZG languages enjoying constant-time dynamic membership over words. Our main technical contribution is to show that this class is conditionally optimal when we assume that the alphabet features a neutral letter, i.e., a letter that has no effect on membership to the language. More precisely, we show that any regular tree language with a neutral letter which is not almost-commutative cannot be maintained in constant time under the assumption that the prefix-U1 problem from [Antoine Amarilli et al., 2021] also does not admit a constant-time algorithm.

Journal articles 2025 EQUUS

Resilience for Regular Path Queries: Towards a Complexity Classification

Antoine Amarilli, Wolfgang Gatterbauer, Neha Makhija, Mikaël Monet

Abstract

The resilience problem for a query and an input set or bag database is to compute the minimum number of facts to remove from the database to make the query false. In this paper, we study how to compute the resilience of Regular Path Queries (RPQs) over graph databases. Our goal is to characterize the regular languages \(L\) for which it is tractable to compute the resilience of the existentially-quantified RPQ built from \(L\). We show that computing the resilience in this sense is tractable (even in combined complexity) for all RPQs defined from so-called local languages. By contrast, we show hardness in data complexity for RPQs defined from the following language classes (after reducing the languages to eliminate redundant words): all finite languages featuring a word containing a repeated letter, and all languages featuring a specific kind of counterexample to being local (which we call four-legged languages). The latter include in particular all languages that are not star-free. Our results also imply hardness for all non-local languages with a so-called neutral letter. We also highlight some remaining obstacles towards a full dichotomy. In particular, for the RPQ \(abc|be\), resilience is tractable but the only PTIME algorithm that we know uses submodular function optimization.

Conference papers 2025

Linear Time Subsequence and Supersequence Regex Matching

Antoine Amarilli, Florin Manea, Tina Ringleb, Markus Schmid

Abstract

It is well-known that checking whether a given string w matches a given regular expression r can be done in quadratic time O(|w|⋅ |r|) and that this cannot be improved to a truly subquadratic running time of O((|w|⋅ |r|)^{1-ε}) assuming the strong exponential time hypothesis (SETH). We study a different matching paradigm where we ask instead whether w has a subsequence that matches r, and show that regex matching in this sense can be solved in linear time O(|w| + |r|). Further, the same holds if we ask for a supersequence. We show that the quantitative variants where we want to compute a longest or shortest subsequence or supersequence of w that matches r can be solved in O(|w|⋅ |r|), i. e., asymptotically no worse than classical regex matching; and we show that O(|w| + |r|) is conditionally not possible for these problems. We also investigate these questions with respect to other natural string relations like the infix, prefix, left-extension or extension relation instead of the subsequence and supersequence relation. We further study the complexity of the universal problem where we ask if all subsequences (or supersequences, infixes, prefixes, left-extensions or extensions) of an input string satisfy a given regular expression.

No publications match the selected filters.