
Computer Science

Your first stop for research: a guide to resources for Computer Science

What are journals and why are they important?

A journal is a periodical publication in which scholarship relating to a particular academic discipline is published. Content typically takes the form of articles presenting original research, review articles, or book reviews.

Journals are important in academic research for a number of reasons:

  • they are the main forum by which research is published
  • they can explore a narrow topic in depth
  • they contain current or recent research findings
  • they are often peer-reviewed so that research findings are checked by other subject experts.

How to find journal articles


1. Search the Library Website for your keywords. The Library Website searches many of the databases the University Library subscribes to and is the best place to start your research.  

2. If you don't find the results you need from the Library Website you could search the databases individually. This page shows some of the databases likely to be most useful for the School of Computing, but you can find the full list on the databases page.

3. Do you already know the name of a journal?  Type the title of the journal in Electronic Journals A-Z: it will show you if we have full-text access to the e-journal.

Library database collections

The University Library subscribes to a selection of database collections - the majority are research databases containing journal articles, but we also have ebook collections, film, TV & newspaper archives, and other specialist information.

Research databases contain:

  • Abstracts (short descriptions of what an article or paper is about)
  • Full journal articles (if they are part of the subscription package)
  • Conference papers on specific subjects

BrowZine


Search e-journals

BrowZine allows you to create your own online bookshelf from our collection of journals.

It's available through the 'Find' menu on the Library website - select 'Browse electronic journals' from the drop-down menu and start building your bookshelf.

New articles

A good way of keeping up-to-date with current research is to subscribe to alerts from your favourite journals.

JournalTOCs is a free service which allows you to set up alerts to any new articles published in Computer Science related journals.

Useful Databases for Computing & Games

The research databases listed below contain journal articles, abstracts and conference papers that will be useful for Computing & Games.  

Use the search box on the library website to search across a selection of databases or access them individually by clicking on the title links and, if asked, entering your University email and password.  

Communications of the ACM

New articles published in the Communications of the ACM.

  • GenDP: A Framework of Dynamic Programming Acceleration for Genome Sequencing Analysis (Apr 29, 2025)

    Abstract

    Genomics is playing an important role in transforming healthcare. Genetic data, however, is being produced at a rate that far outpaces Moore’s Law. Many efforts have been made to accelerate genomics kernels on modern commodity hardware, such as CPUs and GPUs, as well as custom accelerators (ASICs) for specific genomics kernels. While ASICs provide higher performance and energy efficiency than general-purpose hardware, they incur a high hardware-design cost. Moreover, to extract the best performance, ASICs tend to have significantly different architectures for different kernels. The divergence of ASIC designs makes it difficult to run commonly used modern sequencing analysis pipelines due to software integration and programming challenges.

    With the observation that many genomics kernels are dominated by dynamic programming (DP) algorithms, this paper presents GenDP, a framework of dynamic programming acceleration including DPAx, a DP accelerator, and DPMap, a graph-partitioning algorithm that maps DP objective functions to the accelerator. DPAx supports DP kernels with various dependency patterns, such as 1D and 2D DP tables and long-range dependencies in the graph structure. DPAx also supports different DP objective functions and precisions required for genomics applications. GenDP is evaluated on genomics kernels in both short-read and long-read analysis pipelines, achieving 157.8× throughput/mm2 over GPU baselines and 132.0× throughput/mm2 over CPU baselines.

    1. Introduction

    Genome sequencing, a key component of precision health, is necessary for early detection of cancer, autism, infectious diseases (such as COVID-19), and genetic diseases. With the innovation in genome sequencing technologies over the past decade, sequencing data is being generated more rapidly and at a lower cost, increasing at a rate that far outpaces Moore’s Law. The total amount of sequencing data has been doubling approximately every seven months.20 The cost to sequence a human genome has dropped from $100 million at the beginning of this century to less than $1,000 nowadays.3

    This large volume of sequencing data poses significant computational challenges and requires novel computing solutions that can keep pace. Recent works explore architecture-aware optimizations on commodity hardware, such as leveraging SIMD hardware on CPUs10,24 and thread-level parallelism on GPUs.1,17 Custom accelerators, however, achieve much better performance and are more area and power efficient than CPUs and GPUs.2,6,23,25 These accelerators gain orders of magnitude speed-ups over general-purpose hardware, but at a high cost of hardware design. Specific genomics “kernels” (algorithms) do not have a market large enough to justify a custom chip. This makes it difficult to design an accelerator for the particular implementation of a single kernel, since the state-of-the-art implementation may change significantly over the next few years. For example, Smith-Waterman (SW), the basic approximate string-matching algorithm, was optimized from the original SW to a banded SW, and further to an adaptive banded SW and a wavefront version. Therefore, the high cost for designing custom accelerators and frequent kernel developments motivate a generic domain-specific accelerator for genome sequencing analysis.

    In the commonly used genome sequencing pipelines, dynamic programming (DP) algorithms are widely used, including read alignment and variant calling in reference-guided alignment, layout and polishing in de novo assembly, as well as abundance estimation in metagenomics classification.22 Matrix multiplications are at the heart of machine learning (ML) applications, which motivated the design of tensor processing units (TPUs). Similarly, DP algorithms are adopted by many genomics kernels and account for large amounts of time in mainstream sequencing pipelines, which provides the opportunity for a dynamic programming accelerator that supports existing and future DP kernels.

    Dynamic programming simplifies a complicated problem by breaking it down into sub-problems which can be solved recursively. However, accelerating a general-purpose DP algorithm comes with several challenges. First, DP kernels in common sequencing pipelines have different dependency patterns, including both 1-Dimension and 2-Dimension DP tables. Some kernels have long-range dependencies in the graph structure, where cells in the DP table not only depend on the neighboring cells, but also depend on cells far away. Second, DP kernels have different objective functions which include multiple operators. For instance, approximate string matching, an algorithm applied in DNA, RNA, and protein sequence alignment, has three modes: local (Smith-Waterman), global (Needleman-Wunsch), and semi-global alignment (overlap), as well as three methods for scoring insertions and deletions: linear, affine, and convex. Each mode or method above requires a unique objective function. Third, DP kernels have different precision requirements. It is challenging to support multiple precision arithmetic while neither losing efficiency for low-precision computation nor compromising accuracy for high-precision computation.

    To address these challenges, we propose GenDP, a framework of dynamic programming acceleration for genome sequencing analysis, which supports multiple DP kernels. First, we present DPAx, a DP accelerator capable of solving multiple dependency patterns by providing flexible interconnections between processing elements (PEs) in the systolic array. The systolic array helps exploit the wavefront parallelism in the DP table and provides better spatial locality for DP dataflow. DPAx decouples the control and compute instructions in the systolic array. Second, we present DPMap, a graph-partitioning algorithm which maps the data-flow graph of the objective function to compute units in the DPAx accelerator. DPAx supports different objective functions and multiple precision arithmetic by programmable compute units.

    We evaluate the GenDP framework on four DP kernels: Banded Smith-Waterman (BSW), Chain, Pairwise Hidden Markov Model (PairHMM), and Partial Order Alignment (POA).22 We also demonstrate generality of the proposed framework by extending to other dynamic programming algorithms, such as Dynamic Time Warping (DTW), which is commonly used for speech detection, as well as the Bellman-Ford (BF) algorithm for shortest-path search in robotic motion-planning tasks. In summary, this paper makes the following contributions:

    • We propose GenDP, a general-purpose acceleration framework for dynamic programming algorithms.

    • We design DPAx, an accelerator with programmable compute units, specialized dataflow, and flexible interconnections, supporting multiple dependency patterns, objective functions, and multi-precision arithmetic.

    • We describe DPMap, a graph-partitioning algorithm, to map the data-flow graph of DP objective functions to the compute units in DPAx.

    • GenDP achieves 157.8× throughput per unit area and 15.1× throughput/Watt compared to GPU, and 132.0× throughput per unit area over CPU baselines.

    2. Background

    Genome sequencing starts with raw data from the sequencer. The raw signals are interpreted to derive reads (short sequences of base pairs). This process is called basecalling. Next-generation sequencing (NGS) technologies produce short reads with 100-150 base pairs (bp), while third-generation technologies produce much longer reads (> 10,000 bp). After obtaining reads from raw data, there are two important analysis pipelines: reference-guided assembly and “de novo” assembly (without using a reference genome).

    2.1 Common genomics pipelines.

    In reference-guided assembly, the sample genome is reconstructed by aligning reads to an existing reference genome. Read alignment can be abstracted to an approximate string-matching problem, where dynamic programming is used to estimate the pairwise similarity between the read and reference sequence.24 After alignments, small variants (mutations) still exist in aligned reads. A Hidden Markov Model (HMM) or ML model is then applied to detect such mutations, in a step known as variant calling.15

    If there is no reference genome available for alignment, the genome sequence needs to be constructed with reads from scratch, which is referred to as “de novo” assembly. Reads with overlapping regions can be chained to build an overlap graph and are then further extended into larger contiguous regions. Finally, assembly errors are corrected in a graph-based dynamic programming polishing step.

    In addition to the two analysis pipelines above, metagenomics classification is used for real-time pathogen detection and microbial abundance estimation.12 Metagenomics classification aligns input microbial reads to a reference pan-genome (consisting of different species) and then estimates the proportion of different microbes in the sample.

    2.2 Dynamic programming.

    Dynamic programming simplifies a problem by breaking it down into subproblems. Following the Bellman equation which describes the objective function, the sub-problems can be solved recursively from the initial conditions. Longest common subsequence (LCS) is a classic DP algorithm that involves looking for the LCS of two known sequences X_m = {x_0, x_1, ..., x_{m-1}} and Y_n = {y_0, y_1, ..., y_{n-1}}. First, looking for the LCS between X_m and Y_n can be simplified by looking for LCSs between X_m and Y_{n-1}, as well as X_{m-1} and Y_n, which can be further converted to computing the LCS between X_{m-1} and Y_{n-1}. If we define c[i,j] to be the length of an LCS between the sequences X_i and Y_j, the objective function can be represented by Equation 1:

    $$
    c[i,j] =
    \begin{cases}
    0 & \text{if } i = 0 \text{ or } j = 0 \\
    c[i-1,\,j-1] + 1 & \text{if } i,j > 0 \text{ and } x_{i-1} = y_{j-1} \\
    \max\big(c[i,\,j-1],\, c[i-1,\,j]\big) & \text{if } i,j > 0 \text{ and } x_{i-1} \neq y_{j-1}
    \end{cases}
    \tag{1}
    $$

    Second, a DP table can be constructed based on the sequences X_m and Y_n to memorize the sub-problem results, as shown in Figure 2. c[i,j] is calculated based on its upper, left, and diagonal neighbors c[i-1,j], c[i,j-1], and c[i-1,j-1]. The first row and first column of the DP table are filled with 0. Based on this initial condition, the cells in the whole DP table can be filled out. Finally, the largest value in the table is the length of the longest common subsequence, and the corresponding subsequence can be found in the trace-back stage, shown as orange blocks in Figure 2.

    Figure 1.  Black cells in BSW show the bands that limit the computing regions. Grey cells in all figures show previously computed elements and green ones on the wavefront are cells being computed in parallel. White cells show the untouched cells. Long arrows pointing to green cells show the direction that cells are being computed in different processing units in parallel. Short arrows pointing to blue circles show the dependency patterns. Cells in POA may also include long dependencies from rows other than the previous row shown by orange arrows.
    Figure 2.  DP table for longest common subsequence.
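
    To make the recurrence concrete, here is a minimal Python sketch of the table fill and trace-back just described; function and variable names are ours, purely for illustration:

    ```python
    # Minimal sketch of the LCS recurrence (Equation 1), with trace-back.

    def lcs(x: str, y: str) -> str:
        m, n = len(x), len(y)
        # DP table; first row and column are initialized to 0.
        c = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                if x[i - 1] == y[j - 1]:
                    c[i][j] = c[i - 1][j - 1] + 1            # diagonal + 1 on a match
                else:
                    c[i][j] = max(c[i][j - 1], c[i - 1][j])  # best of left / upper
        # Trace back from c[m][n] to recover one LCS.
        out, i, j = [], m, n
        while i > 0 and j > 0:
            if x[i - 1] == y[j - 1]:
                out.append(x[i - 1]); i -= 1; j -= 1
            elif c[i - 1][j] >= c[i][j - 1]:
                i -= 1
            else:
                j -= 1
        return "".join(reversed(out))

    print(lcs("GATTACA", "GCATGCU"))  # prints "GATC", one valid length-4 LCS
    ```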

    2.3 DP kernels in genomics pipelines.

    We introduce four important and time-consuming DP kernels from commonly used genomics pipelines, as shown in Figure 1. BSW is applied in read alignment, and variants of BSW are also used for RNA and protein alignment. PairHMM is used in post-alignment variant calling. POA is applied in the polishing step of assembly. Chain is used in both alignment and assembly of long-read sequencing, as well as in metagenomics classification. These four kernels spend 31%, 70%, 47%, and 75% of the time in their corresponding sequencing pipeline stages, respectively.22 The details of these algorithms are explained as follows:

    Banded Smith-Waterman is the banded version of the Smith-Waterman19 algorithm, which estimates the pairwise similarity between the query and reference sequences. The similarity score for a given DNA sequence is typically computed with affine-gap penalties, identifying short insertions and deletions in pairwise alignments. The objective function is shown in Figure 1a, which computes three matrices H, E, and F, corresponding to three edit types: match, insertion, and deletion. S is a similarity score between the base X(i) and Y(j). H(i,j) refers to the similarity score for the substring X(0,i) and Y(0,j). The banded version of Smith-Waterman is applied with a maximum of w insertions or deletions, illustrated as the region between black cells in Figure 1a. BSW can be computed using 8-bit or 16-bit integer arithmetic depending on the sequence length.
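
    As a rough illustration of the H/E/F recurrence with a band, consider the Python sketch below; the scoring parameters, band width, and band-edge handling are simplified placeholder choices, not those of a production aligner:

    ```python
    # Hedged sketch of banded Smith-Waterman with affine gaps (H, E, F as in
    # Figure 1a). Parameters are illustrative, not tuned values.

    def banded_sw(x, y, w=3, match=2, mismatch=-1, gap_open=3, gap_ext=1):
        NEG = float("-inf")
        m, n = len(x), len(y)
        H = [[0] * (n + 1) for _ in range(m + 1)]     # best local score
        E = [[NEG] * (n + 1) for _ in range(m + 1)]   # gap in x (insertion)
        F = [[NEG] * (n + 1) for _ in range(m + 1)]   # gap in y (deletion)
        best = 0
        for i in range(1, m + 1):
            # Only cells within w insertions/deletions of the diagonal are filled.
            for j in range(max(1, i - w), min(n, i + w) + 1):
                E[i][j] = max(E[i][j - 1] - gap_ext, H[i][j - 1] - gap_open)
                F[i][j] = max(F[i - 1][j] - gap_ext, H[i - 1][j] - gap_open)
                s = match if x[i - 1] == y[j - 1] else mismatch
                H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
                best = max(best, H[i][j])
        return best

    print(banded_sw("ACGTACGT", "ACGTTCGT"))  # local alignment score within the band
    ```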

    Pairwise Hidden Markov Model aligns reads to candidate haplotypes identified by the De-Bruijn graph traversal. The most likely haplotype supported by the reads is identified from the pairwise alignment, which is performed by an HMM. The likelihood score is computed by the formula shown in Figure 1b, where fM, fI, and fD represent match, insertion, and deletion probabilities for aligning the read substring X(0,i) to the haplotype substring Y(0,j). The weights α are the different transition and emission parameters of the HMM. ρ is the prior probability of emitting bases X(i) and Y(j). The computation in PairHMM uses floating-point arithmetic.25
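
    The following Python sketch shows a PairHMM-style forward recurrence in the shape described above; the transition weights and emission prior are illustrative placeholders (and the pruning used by GenDP is omitted), not the parameters of any real variant caller:

    ```python
    # Hedged sketch of a PairHMM forward pass: fM, fI, fD as in Figure 1b.
    # All numeric parameters below are placeholders for illustration.

    def pairhmm_forward(read, hap, tMM=0.9, tGapOpen=0.05, tGapExt=0.5, eps=0.01):
        m, n = len(read), len(hap)
        fM = [[0.0] * (n + 1) for _ in range(m + 1)]
        fI = [[0.0] * (n + 1) for _ in range(m + 1)]
        fD = [[0.0] * (n + 1) for _ in range(m + 1)]
        for j in range(n + 1):
            fD[0][j] = 1.0 / n    # uniform prior over haplotype start positions
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                # Emission prior rho for bases read[i-1] and hap[j-1].
                rho = (1 - eps) if read[i - 1] == hap[j - 1] else eps
                fM[i][j] = rho * (tMM * fM[i - 1][j - 1]
                                  + (1 - tGapExt) * (fI[i - 1][j - 1] + fD[i - 1][j - 1]))
                fI[i][j] = tGapOpen * fM[i - 1][j] + tGapExt * fI[i - 1][j]
                fD[i][j] = tGapOpen * fM[i][j - 1] + tGapExt * fD[i][j - 1]
        # Alignment likelihood: sum over the last row.
        return sum(fM[m][j] + fI[m][j] for j in range(1, n + 1))

    print(pairhmm_forward("ACGT", "ACGT"))
    ```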

    Partial Order Alignment: In the assembly polishing step, multiple read sequences are used to construct a partial-order graph and the consensus sequence is then generated from the graph. Each unaligned sequence is aligned to the existing graph, as shown in Figure 1c. The nodes in the partial-order graph represent bases in the read sequence, and the weighted edges denote the times that the edges appear in different reads. Each cell depends not only on the upper and diagonal cells in the previous row but also on earlier rows if there is an edge connecting that row with the current row in the graph. The objective function is similar to that used in BSW.

    Chain: Given a set of seed pairs (anchors) shared between a pair of reads, Chain aims to group a set of collinear seed pairs into a single overlap region, as shown in Figure 1d(i). In the 1-Dimension DP table (Figure 1d(ii)), each anchor is compared with N previous anchors (default setting N=25) to determine the best parent. However, the dependency between neighboring anchors poses difficulties for parallelism. The reordered Chain algorithm8 compares each anchor with N subsequent anchors and updates the scores each time for them (Figure 1d(iii)).
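
    As an illustration, the toy Python sketch below implements the look-back form of the Chain recurrence; the anchor representation and the +1 chaining score are simplified stand-ins for the real weighting:

    ```python
    # Toy 1-D Chain DP: each anchor looks back at up to N predecessors and
    # keeps the best-scoring parent. Anchors are (x, y) seed positions,
    # assumed sorted; the scoring is a simplified stand-in.

    def chain(anchors, N=25):
        score = [1] * len(anchors)           # each anchor alone scores 1
        for i in range(len(anchors)):
            for j in range(max(0, i - N), i):
                dx = anchors[i][0] - anchors[j][0]
                dy = anchors[i][1] - anchors[j][1]
                if dx > 0 and dy > 0:        # j must precede i on both sequences
                    score[i] = max(score[i], score[j] + 1)
        return max(score)

    print(chain([(1, 2), (3, 4), (5, 3), (6, 7)]))  # longest collinear chain: 3
    ```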

    Table 1 summarizes the characteristics of four DP kernels above, including the dimension and size of DP tables, dependency patterns and arithmetic precision. BSW and PairHMM are used for short read pipelines, while POA and Chain are used in long read pipelines with larger DP tables. The first three kernels have 2-Dimension DP tables, whereas Chain has a 1-Dimension DP table. These kernels also have different precision requirements.

    Table 1.  Characteristics of DP kernels.

    Kernel  | Dimension          | Dependency                             | Precision
    --------|--------------------|----------------------------------------|-------------------------------
    BSW     | 2D table, 100×60   | Last 2 wavefronts                      | 8-bit/16-bit integer
    PairHMM | 2D table, 100×60   | Last 2 wavefronts                      | Floating-point
    POA     | 2D table, 1000×500 | Graph structure, long-range dependency | 32-bit integer
    Chain   | 1D table, 20000    | Last N (25) anchors                    | 32-bit integer, floating-point

    3. GenDP Framework

    Figure 3 demonstrates the structure of the GenDP framework. For a new DP kernel, we analyze its inter-cell dependency pattern and the intra-cell objective function, as shown in the top and bottom blocks, respectively. First, the inter-cell dependency patterns are determined by the recursion rule in the DP kernel. Based on the dependency pattern, we configure the PE interconnection and generate the control instructions. Second, the intra-cell data-flow graph for the objective function is mapped to compute units based on the DPMap algorithm. The compute instructions are generated based on the mapping results.

    Figure 3.  GenDP framework.

    3.1 Inter-cell dependency pattern supports.

    An overview of the DPAx architecture is described in Figure 4, including 16 integer PE arrays and one floating-point PE array. Each PE array contains four PEs connected as a 1-Dimension systolic array, in which each PE can receive data from the previous PE and pass data to the next one. The interconnections of the 16 integer PE arrays are configured based on the dependency pattern of the DP kernel. The last PE in the PE array can either be connected to the first PE in the next PE array or to the data buffer directly. The 16 integer PE arrays can be concatenated and make up a large systolic array consisting of 64 PEs. Figures 5b and 5d show PE arrays of size 4 and 8 respectively.

    Figure 4.  Overview of the DPAx architecture.
    Figure 5.  Example for DP tables and PE interconnections.

    For kernels with a 2D DP table, cells within a row are executed in the same PE. For example, rows with different colors in Figure 5 (a) are executed in PEs with the corresponding colors in Figure 5 (b). Each element in the target sequence is stored statically in a PE, while elements in the query sequence are streamed through the PEs in the same array. Each cell depends on results from three neighbors: the result from the left cell in the same row is stored inside the PE, while results from the upper and diagonal cells in the previous row are transferred from the previous PE. First-in-first-out (FIFO) buffers connect the last and first PEs in the array, through which the first PE (the first row in the next four-row group) acquires its dependent data from the last PE (the last row in the current four-row group). The PE array size determines how many rows can be executed in parallel.

    For kernels with a 1D DP Table, cells are mapped to a large PE array as shown in Figure 5 (d). When multiple small PE arrays are connected together to form a large PE array, only the FIFO in the first PE array is used. For example, in the first time step, cells #1-8 in Figure 5 (c) are mapped to PEs as shown in Figure 5 (d). Cells #1-8 all depend on cell #0. Therefore, the value of cell #0 is loaded from the FIFO to each PE sequentially. During the next time step, cells #2-8 move forward to their neighboring PEs. Cell #9 is sent from the input buffer to the first PE, while cell #1 is moved out from the last PE. Meanwhile, cell #1 is loaded from the FIFO to each PE because cells #2-9 all depend on cell #1.
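
    The row-to-PE mapping and wavefront order can be illustrated with a toy Python schedule; this is a sketch of the dataflow only (our simplification, not a cycle-accurate model of DPAx):

    ```python
    # Toy illustration of the wavefront schedule on a 4-PE systolic array
    # (Figure 5): row r of the DP table is pinned to PE r % 4, and within a
    # four-row group every cell on anti-diagonal r + c executes in the same step.

    ROWS, COLS, PES = 4, 5, 4

    for step in range(ROWS + COLS - 1):
        wave = [(r, step - r, r % PES)          # (row, col, executing PE)
                for r in range(ROWS) if 0 <= step - r < COLS]
        print(f"step {step}: {wave}")
    ```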

    Long-range dependencies in the graph structure are supported by scratchpad memories (SPM) inside each PE. In a 2D DP table, usually only the result of the left cell is stored in the PE. However, if a cell depends on other cells in the same row but far away, all the candidate dependent cells need to be stored in the PE. In kernels with long-range dependencies, the result for each cell is not only stored in registers for reuse by the next cell but also stored in SPM for potential reuse by later cells.

    3.2 Intra-cell objective function mapping.

    To support DP objective functions efficiently in the PE, we design a multi-level ALU array in the compute unit. This multi-level ALU array is consistent with common compute patterns in genomics DP kernels and reduces the register access pressure as well. The computations in the objective function are represented by data-flow graphs. DPMap partitions the data-flow graph into multiple subgraphs which can be further mapped to the ALU arrays in compute units. The partition rules will be illustrated in Section 5.

    4. DPAx Architecture

    An overview of the DPAx architecture is shown in Figure 4 and discussed in the previous section. Figure 6 shows the architecture of the PE array and PE. The systolic array architecture simplifies control by allowing data flow between neighboring PEs. However, the systolic data path alone cannot satisfy the requirements of various DP kernels. For example, POA has long-range dependencies and its dependency pattern is determined by the graph structure. The movement for such dependency requires branch instructions. Therefore, DPAx decouples the computation and control architecture to provide flexible data movements similar to decoupled access execute architectures.18 Meanwhile, the parallelism and the massive reduction tree pattern observed in genomics DP kernels (Section 4.2) motivate the VLIW architecture.

    Figure 6.  PE Array architecture.

    The PE array consists of the input and output data buffers, control instruction buffer, decoder, FIFO buffer, and four PEs. PEs are connected as a systolic array and data can be passed from one PE to the next. The FIFO connects the last and first PEs. The first PE is connected to the input data buffer to receive the input sequences. The last PE stores the results to the output data buffer. The last PE in the PE array is also equipped with a dedicated port connected to the first PE in the next PE array to build a larger PE group. In Figure 6, blue solid and orange dotted lines show the data and control flow respectively.

    4.1 Processing element.

    Each PE is capable of running a control and a compute thread in parallel. Control and compute instructions are stored in two instruction buffers and decoded separately. Each PE contains a register file (RF) and a SPM that store the short- and long-range dependencies respectively. Load and store ports are connected to the previous and next PEs. The first and last PEs are also connected to the FIFO and input/output data buffers.

    Each PE is a two-way VLIW compute unit array that can execute two independent compute instructions in parallel to exploit the instruction-level parallelism (ILP). Every PE contains two 32-bit compute units which execute the VLIW instructions. Each compute unit (CU) can either execute operations on 32-bit or four concurrent 8-bit groups of operands as a SIMD unit to make use of data-level parallelism (DLP). The SIMD unit improves the performance of low-precision kernels—for example, BSW, where four DP tables are mapped to four SIMD lanes. The floating-point PE array and PE architecture is similar to the integer one, but only supports 32-bit FP operands.

    4.2 Compute unit design choice.

    We observe that genomics DP kernels have a common reduction tree pattern as shown in Figure 7 (a) and (b). Thus, we propose a reduction tree architecture for the CU. The outputs of the first-level ALUs are used as inputs to the next-level ALUs. The CU also contains a multiplication module for the weight calculation in the chain kernel. Since multiplication increases the length of the CU critical path, we design it as a separate unit from the ALU reduction tree.

    Figure 7.  Reduction Tree Pattern in (a) BSW and (b) PairHMM kernels. (c) (d) (e) Compute Unit design choices. GenDP uses the two-level reduction tree.

    Figure 7 (c)(d)(e) shows three possible choices for the ALU reduction tree. Compared to a one-level reduction tree, the two-level design requires fewer register-file accesses and strikes a better balance between critical-path length and area. Further, a three-level reduction tree decreases ALU utilization when there are not enough patterns to map to the CU (Section 5).

    4.3 Execution model and GenDP ISA.

    We adopt the following execution model for GenDP. Instructions are preloaded to the accelerator before starting a DP kernel. Each PE array runs one thread of execution, controlling the data movement between data buffers and PEs, as well as the start of the execution for each PE. Upon receiving the start flag from the PE array, each PE runs two threads of execution: control and compute. The control thread manages data movement between the SPM, register file, and the systolic data flow between neighboring PEs. This thread also controls the start of the compute thread. The compute thread executes a compute instruction by decoding instructions, loading the data from the register file, executing computations in the CU array, and finally writing results back to the register file.

    The control ISA, shown in Table 2, is applied in both the PE array and the PEs. Arithmetic instructions manipulate the address registers within the decoders. mv instructions either load immediate values or move data between memory components. The branch instruction enables loop iteration. set instructions control the start of execution for a subsidiary component (e.g., PE arrays control PEs, and PEs control CUs). Figure 8 shows an example of data movement, where two control instructions (one in each PE) are executed. PE[i-1] moves data from 0x00ff in the register file to its out port, while PE[i] moves data from its in port to 0x00ff in the scratchpad memory. The delay for the movements in both PEs is accounted for in the critical path, and the movement completes in one cycle. The control instructions are generated manually in this work.

    Table 2.  Control instruction set architecture.

    Type       | Instruction     | Assembly
    -----------|-----------------|---------------------------
    Arithmetic | add             | add rd rs1 rs2
               | addi            | addi rd rs1 #imm
    Move       | li              | li [dest addr] #imm
               | mv              | mv [dest addr] [src addr]
    Branch     | beq/bne/bge/blt | branch rs1 rs2 offset
    Other      | set             | set PC
               | no-op           | no operation
               | halt            | halt
    Figure 8.  Example for data movement.

    The two-way VLIW compute instructions are executed by two compute units, each containing three operations (for the three ALUs of the two-level reduction tree in Figure 7 (d)) and six operands (four for the top-left ALU and two for the right one). Operations in the compute instructions are listed in Table 3. Compute instructions are generated by the DPMap algorithm described in Section 5.

    Table 3.  Compute instruction operations.

    Operation          | Action
    -------------------|---------------------------------------
    Addition           | out = in[0] + in[1]
    Subtraction        | out = in[0] - in[1]
    Multiplication     | out = in[0] × in[1]
    Carry              | out = carry(in[0], in[1])
    Borrow             | out = in[0] < in[1] ? 1 : 0
    Maximum            | out = max(in[0], in[1])
    Minimum            | out = min(in[0], in[1])
    Left-shift 16-bit  | out = in[0] << 16
    Right-shift 16-bit | out = in[0] >> 16
    Copy               | out = in[0]
    Match Score        | out = scoretable(in[0], in[1])
    Log2 LUT           | out = log2(in[0]) - 1
    Log_sum LUT        | out = log_sum(in[0])
    Comparison >       | out = in[0] > in[1] ? in[2] : in[3]
    Comparison ==      | out = in[0] == in[1] ? in[2] : in[3]
    No-op              | (invalid)
    Halt               | stop computation
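
    To make this concrete, here is a hedged Python model of evaluating one such instruction on the two-level tree of Figure 7 (d), with a four-input left ALU (as the comparison operations in Table 3 require) and a two-input right ALU feeding an output ALU; the opcode subset and tuple encoding are our illustration, not the actual GenDP format:

    ```python
    # Toy model of one compute-unit instruction on a two-level reduction tree.

    OPS2 = {  # two-input ALU operations (subset of Table 3)
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "max": max,
        "min": min,
    }
    OPS4 = {  # four-input comparison operations (subset of Table 3)
        "cmpgt": lambda a, b, c, d: c if a > b else d,
        "cmpeq": lambda a, b, c, d: c if a == b else d,
    }

    def cu_execute(op_left, op_right, op_out, regs):
        # regs: six operand values; four feed the (possibly four-input) left
        # ALU, two feed the right ALU; both results feed the output ALU.
        if op_left in OPS4:
            left = OPS4[op_left](*regs[0:4])
        else:
            left = OPS2[op_left](regs[0], regs[1])
        right = OPS2[op_right](regs[4], regs[5])
        return OPS2[op_out](left, right)

    # e.g. out = max( (x > y ? m : n), u + v )
    print(cu_execute("cmpgt", "add", "max", [3, 1, 10, 0, 2, 5]))  # prints 10
    ```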

    5. DPMap Algorithm

    The DP objective function is represented as a data-flow graph (DFG). The DPMap algorithm generates compute instructions by mapping the DFG to compute units in the PE. In the DFG, a node represents an operator, while an edge shows the dependency between operators. The DFG has |V| nodes V = {v_0, ..., v_{|V|-1}} and |E| edges E = {e_0, ..., e_{|E|-1}}. In an edge e_i = (v_m, v_n), the operator in node v_n takes the result of v_m as an operand. We define node v_m as the parent of node v_n, and v_n as the child of v_m. Figure 9(a) shows the DFG of the BSW kernel.

    Figure 9.  Example for the DPMap algorithm in BSW. In (b) partitioning, input edges of node comp are removed and a comp node is replicated to two comp nodes, because their children both have another parent and contain commutative operations. In (c) seeding, nodes with two parents or multiple children are located and grouped into subgraphs with their parents. In (d) refinement, the remaining two comp nodes are grouped into two subgraphs in the end.

    DPMap breaks the entire graph into subgraphs that contain either one multiplication or three ALU nodes (Figure 7(d)). The edges within the subgraphs represent the data movements within the compute units (CU). The edges between subgraphs represent accesses to the register file and are removed by DPMap in three steps. First, partitioning extracts nodes that will be mapped to four-input ALUs and multipliers, because a CU supports at most one such operation. Second, seeding looks for nodes that could be mapped to the second level of the ALU reduction tree. Nodes with more than one parent or more than one child are selected as seeds. The seed and its parents are mapped to a CU together. After seeding, the remaining nodes have a single parent or a single child. Third, refinement maps every two remaining nodes to the two-level ALU tree in a CU. Figure 9 shows an example of the DPMap algorithm. Four subfigures represent the original graph and the three steps in DPMap separately. Dashed blocks represent final subgraphs.

    Partitioning: Algorithm 1 breaks both input and output edges connected to nodes of four-input ALUs and multipliers. All parent and child edges of the multiplication nodes are removed (lines 2-4). DPMap also removes the parent edges of four-input operations (line 6). For a four-input node that has two children, we replicate it if the operations of its children are commutative (except Subtraction) in order to decrease register file accesses (lines 8-14). After partitioning, all nodes have at most two parents.

    Seeding: In Algorithm 2, we look for nodes that are suitable for the second level of the ALU reduction tree and name them seeds. Nodes that have two parents are located to fit the structure of the ALU reduction tree (line 2). The output edges of seeds are removed because the output of this operator will be stored to the register file (line 3). In addition, since input operands of the seed's parents must be fetched from the register file, DPMap also removes the input edges of the seed's parent nodes (lines 4-7). Finally, we remove the output edges of all nodes with more than one child as its outputs have to be stored in the register file (lines 9-11). At this point, all nodes have at most one parent or one child.

    Refinement: Algorithm 3 traverses the DFG in reverse order. If the node has a grandparent (line 4), the edge connecting its parent and grandparent is removed to group every two nodes (line 5). In the end, all subgraphs can be mapped to compute units in the PE.
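
    The three passes can be summarized in a strongly simplified Python sketch; it omits the replication and commutativity details of Algorithms 1-3, and the data structures are our own, so treat it as a toy rendering of the idea:

    ```python
    # Toy DPMap: nodes are opcode labels, edges are (parent, child) pairs.
    # Removing an edge models a round trip through the register file.

    def dpmap(nodes, edges, is_mul, is_four_input):
        edges = set(edges)

        def parents(v):
            return [p for (p, c) in edges if c == v]

        def children(v):
            return [c for (p, c) in edges if p == v]

        # 1. Partitioning: cut all edges around multipliers, and the input
        #    edges of four-input operations, so each occupies a CU slot alone.
        for v in nodes:
            if is_mul(v):
                edges = {(p, c) for (p, c) in edges if p != v and c != v}
            elif is_four_input(v):
                edges = {(p, c) for (p, c) in edges if c != v}

        # 2. Seeding: a node with two parents becomes the second-level ALU of
        #    a CU; the seed's outputs and its parents' inputs go through the RF.
        for v in nodes:
            ps = parents(v)
            if len(ps) == 2:
                edges = {(p, c) for (p, c) in edges if c not in ps}
                edges = {(p, c) for (p, c) in edges if p != v}
            elif len(children(v)) > 1:
                edges = {(p, c) for (p, c) in edges if p != v}

        # 3. Refinement: walk nodes in reverse and cut remaining chains into
        #    pairs, each pair filling the two-level ALU tree of one CU.
        for v in reversed(nodes):
            ps = parents(v)
            if ps:
                gps = parents(ps[0])
                if gps:
                    edges.discard((gps[0], ps[0]))
        return edges  # surviving edges are data movements inside a single CU

    # Toy DFG for out = max(a*b, c+d) - e:
    nodes = ["mul", "add", "max", "sub"]
    edges = [("mul", "max"), ("add", "max"), ("max", "sub")]
    print(dpmap(nodes, edges, lambda v: v == "mul", lambda v: False))
    # -> {('max', 'sub')}: max and sub share a CU; mul and add get their own.
    ```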

    6. Evaluation

    We synthesize the DPAx accelerator using the Synopsys Design Compiler in a TSMC 28nm process. DPAx consumes 5.4mm2 in area and 3.6W of power. We use a cycle-accurate simulator to measure the throughput of the DPAx accelerator on the four DP kernels introduced in Section 2.3. The BSW, PairHMM, and POA simulations produce the same results as the CPU baselines. We use Ramulator11 to generate DRAM configurations and DRAMPower4 to measure power from DRAM access traces. We use an Intel Xeon Platinum 8380 and an NVIDIA A100 to evaluate the CPU and GPU baselines, with die areas estimated at 600 mm2 and 826 mm2, respectively. We use the input datasets from GenomicsBench.22

    6.1 GenDP performance.

    We use throughput per unit area measured in million cell updates per second/mm2 (MCUPS/mm2) as a metric for performance. The area and power of CPU, GenDP, and custom accelerators are scaled to a 7nm process for fair comparison with GPU.21 GenDP is expected to run at 2GHz. Figure 10(a) shows throughput/mm2 comparisons across four DP benchmarks. Overall, GenDP achieves 132.0× speedup over CPU and 157.8× over GPU. Figure 10(b) shows the throughput/Watt comparison between GenDP and GPU. The large speedup can be attributed to the GenDP ISA, the specialized dataflow and on-chip memory hierarchy tailored for dynamic programming.

    Figure 10.  GenDP performance comparison with baselines.

    Both the BWA-MEM2 CPU baseline and GenDP benefit from 8-bit SIMD optimizations for the BSW kernel. With AVX512, BWA-MEM2 has 64 SIMD lanes, while GenDP has four. The PairHMM baseline uses floating point, whereas GenDP applies the pruning-based implementation, using logarithms and fixed-point numbers to approximate the computation and reduce complexity. The bottleneck for POA performance on GenDP is memory access. First, POA has the graph dependency pattern, which is more complex than the other kernels: the dependency information needs to be loaded from the input data buffer into each PE. Second, downstream trace-back functions in POA need the move directions on the DP table for each cell, which requires 8-byte outputs to be written from each cell to the output data buffer. Both the input of the dependency information and the output of the move directions consume extra data-movement instructions that limit POA performance on GenDP. In the Chain kernel, the performance of both the GPU and GenDP is penalized by 3.72× by the extra cell computation.

    6.2 Comparison with accelerators.

    The GenDP framework’s goal is to build a versatile dynamic programming accelerator that supports a wide range of genomics applications. It therefore sacrifices some performance for programmability and for supporting a broader set of kernels. A key research question is how much performance is sacrificed for generality. Figure 10(c) compares the performance of GenDP with available custom genomics ASIC accelerators: the GenAx6 accelerator for BSW and the pruning-based PairHMM accelerator.25

    We observe a geomean slowdown of 2.8×. This can be attributed largely to area overheads, custom datapaths for cell-score computation, custom dataflow, and custom precision. For example, 37.5% of the registers and 40% of the SRAM are used only by POA and sit idle in other kernels, because POA is significantly more complex than the other three. A custom dataflow could specify the data-bus width between neighboring PEs and propagate all the data in a single cycle, whereas GenDP needs control instructions to move data between neighboring PEs because of varying data-movement requirements. An accelerator for one specific kernel can also implement a single appropriate precision to save area. For instance, the pruning-based PairHMM ASIC uses 20-bit fixed-point data, which satisfies the compute requirements, but GenDP has no such custom precision choice.

    6.3 Generality.

    In addition to these four DP kernels from commonly used sequencing pipelines, the GenDP framework also supports other DP algorithms in genomics and in broader fields. Dynamic time warping (DTW) measures the similarity between two temporal sequences and can be used for Nanopore signal basecalling and speech detection. DTW has a near-range dependency pattern similar to Smith-Waterman's. Bellman-Ford (BF), a shortest-path search algorithm, is commonly used in robotic motion-planning applications. BF has a graph-based dependency pattern in which long-range dependencies within a certain distance can be supported efficiently by GenDP, while ultra-long-range dependencies require accesses through DRAM. GenDP supports the objective functions of both DTW and BF. Their performance and instruction comparisons with GPUs are shown in Figure 10(d).
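
    As a concrete reference point, DTW itself is a compact DP kernel with exactly this near-range pattern; below is a minimal Python sketch (the absolute-difference cost and the example inputs are our illustrative choices):

    ```python
    # Minimal DTW: each cell needs only its left, upper, and diagonal
    # neighbors, the same near-range dependency pattern as Smith-Waterman.

    def dtw(a, b):
        m, n = len(a), len(b)
        INF = float("inf")
        d = [[INF] * (n + 1) for _ in range(m + 1)]
        d[0][0] = 0.0
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = abs(a[i - 1] - b[j - 1])
                d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
        return d[m][n]

    print(dtw([1.0, 2.0, 3.0, 2.0], [1.0, 2.0, 2.0, 3.0, 2.0]))  # 0.0: exact warp
    ```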

    7. Related Work

    Many custom genomics accelerators have been proposed to boost the performance of DP kernels in genomics pipelines, which significantly improve the performance over commodity hardware. However, these accelerators only support a single genomics kernel and must be customized and combined to support different stages of genomics pipelines.2,5,23 This increases both design cost and complexity.

    There has been industry interest in DP acceleration. For example, NVIDIA recently announced the dynamic programming instructions (DPX) in the Hopper architecture.9

    There are also several works that have identified common compute and memory-access patterns across both regular and irregular workloads and proposed reconfigurable, spatial, and dataflow architectures for these patterns.7,13,14,16 But these accelerators are mostly optimized for data-parallel or data-intensive applications and are not well suited to dynamic programming kernels.

    8. Conclusion

    To support general-purpose acceleration of genomics kernels in commonly used sequencing pipelines, this work presented GenDP, a programmable dynamic programming acceleration framework, including DPAx, a systolic array-based DP accelerator, and DPMap, a graph-partitioning algorithm that maps DP kernels to the accelerator. DPAx supports multiple dependency patterns through flexible PE interconnections and different DP objective functions through programmable compute units. GenDP is evaluated on four important genomics kernels, achieving 157.8× throughput/mm2 and 15.1× throughput/Watt compared to GPU and 132.0× throughput/mm2 over CPU baselines, and is also extended to DP algorithms in broader fields.

    Acknowledgment

    We thank the anonymous reviewers for their suggestions which helped improve this paper. This work was supported in part by the NSF under the CAREER-1652294, NSF-1908601 and NSF-2403119, and the Applications Driving Architectures (ADA) Research Center, a JUMP Center co-sponsored by SRC and DARPA.

  • Technical Perspective: Toward a Generalized Accelerator for Genome Sequence Analysis (Apr 29, 2025)

    Dynamic programming (DP), first introduced by Richard Bellman in the 1950s, is a fundamental algorithmic technique in computer science with broad applicability across numerous domains. In computational genomics, it has proven particularly effective for sequence-comparison tasks, including those involving DNA, RNA, and proteins.

    Over the past two decades, advancements in sequencing technologies have massively reduced costs and increased throughput, leading to rapid growth in genomic data. This growth has supported various advances in biology and medicine, including studies on species evolution and the genetic basis of human diseases. At the same time, it has motivated computer architects to develop hardware accelerators, such as application-specific integrated circuits (ASICs), to address the computational demands of genomics applications, such as base-calling, genome assembly, and pathogen detection.

    Due to their computational intensity and centrality to genome sequence analysis, DP algorithms have been a major focus of these accelerators, including commercial solutions. For example, Nvidia recently introduced DPX instructions on Hopper GPUs to optimize these workloads. However, many previous ASIC solutions were narrowly tailored to specific tasks, limiting their applicability to other computational problems even within the DP domain. This inflexibility creates challenges for broader commercial adoption due to the high costs and risks associated with ASIC chip development.

    Motivated by this observation, “GenDP: A Framework of Dynamic Programming Acceleration for Genome Sequencing Analysis” by Gu et al. aims to improve the adaptability of accelerators in genome sequence analysis through a novel framework called GenDP. The authors make an insightful observation that DP kernels used in genomics exhibit limited structural variation, characterized by three key parameters:

    1. The inter-cell dependency pattern (for example, 1-D or 2-D DP tables) as described in the recursion equations.

    2. The objective function, which defines the solution space and scoring criteria.

    3. Numerical types and precision requirements, which vary across DP algorithms.

    Their proposed GenDP framework comprises two main components: DPAx, a programmable DP accelerator, and DPMap, a graph-partitioning algorithm for mapping high-level DP kernel specifications to DPAx hardware.

    The DPAx accelerator uses linear systolic arrays of programmable compute units, supporting diverse objective functions and multi-precision arithmetic while leveraging the wavefront parallelism inherent in DP tables. The programmability of these compute units is enabled by a custom GenDP instruction set architecture (ISA), comprising control and compute instructions specifically designed for DP kernels. The compute units within systolic arrays have flexible interconnections to allow for different dependency patterns in DP algorithms. The DPMap algorithm serves as a compiler for the DPAx accelerator, processing high-level DP specifications represented as data-flow graphs—where nodes denote operations and edges define dependencies—and generating compute instructions for the computational units (control instructions were manually generated in this work). Benchmark results show that GenDP delivers significant improvements in throughput per area compared to general-purpose CPUs and GPUs, although the increased flexibility also results in an approximately three-fold slowdown compared to fixed-function ASICs. Nevertheless, this trade-off between specialization and generality may provide a more practical balance for dynamic programming acceleration.

    While DP algorithms continue to hold high relevance in genome sequence analysis, the field also includes other methods, such as non-DP approaches, that play a critical role in tasks such as taxonomic classification and protein structure prediction. The diversity of computational requirements makes designing a universal accelerator challenging. Nevertheless, GenDP offers an important step toward addressing this need. Future work could focus on expanding support for the emerging need for comparing arbitrarily long sequences and incorporating non-DP algorithms into a more unified framework. Such developments might eventually lead to a domain-specific processor for bioinformatics, which could help unlock new opportunities for genomics research and beyond.

  • Big Tech, You Need Academia. Speak Up! (Apr 29, 2025)

    The current U.S. administration has launched a war[a] on academia. On Feb. 7, 2025, the U.S. National Institutes of Health announced[b] a 15% limit on indirect research costs for new and existing grants. Indirect costs (more accurately, facility and administration expenses) support research but cannot be attributed directly to a specific project; they cover items such as lab infrastructure, utilities, and administrative support. These are real costs; the limit, which has since been suspended by the courts, is a severe blow to biomedical research in the U.S.

    Beyond expanding this limit to other agencies, such as the National Science Foundation (NSF), the administration is also reportedly considering slashing NSF’s annual budget from approximately US$9 billion to about US$3–4 billion. This would deal a devastating blow to U.S. academic research, especially computing research. As stated[c] by the Computing Research Association (CRA), “NSF budget cuts would put the future of U.S. innovation and security at risk.”

    But U.S. academic computing research has been in trouble for several years. Enrollments of computing majors have been climbing up[d] since 2010, which triggered a wave of faculty hiring. According to the CRA Taulbee Survey,[e] faculty size in research-intensive departments has grown about 60% since 2010. But NSF’s budget in constant dollars seems[f] to have stagnated since 2010. And NSF plays a critical role in computing research. As stated by CRA: “While NSF funds over 50% of basic research at U.S. institutions, NSF funds nearly 80% of fundamental computing research at U.S. institutions, making it the single most important agency for sustaining advances in computing—the foundational technology behind artificial intelligence, cybersecurity, quantum computing, and other areas critical to economic growth and national defense.”

    Big Tech may dismiss these concerns about the future of U.S. academic computing research. After all, Big Tech consists of six corporations with more than US$1 trillion in market capitalization each; their research budgets dwarf government research budgets in computing. Furthermore, industrial researchers have access to large-scale data and computing that academic researchers can only dream of. While the NSF’s Directorate for Computer and Information Science and Engineering has an annual budget of about US$1 billion, DeepMind, a subsidiary of Alphabet focusing on AI research, has an annual budget of around US$2 billion.

    But dismissing academic computing research as irrelevant, given the size of Big Tech, would be a grave mistake. As I argued[g] in 2019, research is a long game. Consider the current wave of generative AI. While this wave, going back to the mid-2010s, is driven almost exclusively by industrial research, it builds on a 70-year academic effort in neural computing. We must remind ourselves of the concerns of Abraham Flexner, founder of the Institute for Advanced Study in Princeton. In The Usefulness of Useless Knowledge, Flexner explored the dangerous tendency to forgo pure curiosity in favor of alleged pragmatism.

    In the previous century, monopolistic corporations such as AT&T and IBM were able to sustain research laboratories where curiosity-driven research was encouraged, leading even to Nobel prizes in physics. But already in 1995, Andrew Odlyzko, then at AT&T Bell Labs, wrote about the “Decline of Unfettered Research.”[h] Only academic research is keeping curiosity-driven research alive today.

    As evidence for the value of curiosity-driven research in computing, I offer the Tire-Tracks Diagram,[i] a graphical illustration of how federally funded university research and industrial research and development precede the emergence of large IT industries by decades. The diagram shows eight technologies whose markets have each surpassed US$1 billion. Of the eight technologies, only one (relational databases) clearly traces its birth to industrial research.

    Research-intensive academic departments, however, produce more than just original ideas. Generally, academic research is carried out by doctoral students, supervised by faculty members, and supported mostly by federal funding. According to the CRA Taulbee Survey, in 2023 U.S. computer science (CS) departments graduated 1,883 doctorate holders, of whom 870 went to industry. Obviously, industry values people who can formulate and solve problems. Doctorate holders play a key role in tech; a case in point is Amazon’s Automated Reasoning Group.[j] The most notable example, however, is Google, which was founded by Sergey Brin and Larry Page—two CS doctoral students at Stanford University.

    It is unlikely that Elon Musk reads Communications. Even if I send him a copy of this column, will it change his mind? I am not sure. Perhaps, however, he would listen to his fellow Big Tech CEOs. Big Tech, you need academia. Please speak up!

  • Envisioning Recommendations on an LLM-Based Agent Platform (Apr 28, 2025)

    In recent years, large language model (LLM)–based agents have garnered widespread attention across various fields. Their impressive capabilities, such as natural language communication,21,23 instruction following,26,28 and task execution,22,38 have the potential to expand both the format of information carriers and the way in which information is exchanged. LLM-based agents can now evolve into domain experts, becoming novel information carriers with domain-specific knowledge.1,28 For example, a Travel Agent can retain travel-related information within its parameters. LLM-based agents are also showcasing a new form of information exchange, facilitating more intuitive and natural interactions with users through dialogue and task execution.24,34 Figure 1 shows an example of these capabilities, in which users engage in dialogue with a Travel Agent to obtain information and complete their travel plans.


    Along with the increase in LLM-based agents in various domains, agent platforms (for example, GPTs[a]) represent a new kind of information system, featuring agent-oriented information gathering, storage, and exchange. To accommodate agent properties such as interactivity, intelligence, and proactiveness,28,34 the infrastructure of information systems must be expanded to support information processing at the agent level. A cornerstone within this infrastructure is the recommender system, which greatly affects how information flows in the system in terms of efficiency, user experience, and many other factors. It is essential, therefore, to envision how a recommender system can function on an LLM-based agent platform.

    Figure 1.  An example of interaction between a Travel Agent and a user. The agent can serve as an information carrier for travel-related information as well as engage in dialogue with the user.

    Toward this end, we propose the Rec4Agentverse, a novel recommendation paradigm for an LLM-based agent platform. As shown in Figure 2, the Rec4Agentverse includes two key concepts: the Agent Recommender and Item Agents. Item Agents are LLM-based agents that are treated as items in the recommender system, while the Agent Recommender is employed to recommend Item Agents to users. In contrast to items in traditional recommender systems, Item Agents in the Rec4Agentverse are interactive, intelligent, and proactive, enabling them and the Agent Recommender to collaborate and share user information,[b] thereby facilitating personalized information delivery. For example, once a Travel Agent is recommended to a user, it can continuously discern the user’s preferences around travel during their interaction and convey these preferences back to the Agent Recommender.

    Figure 2.  Illustration of the Rec4Agentverse. The left side depicts three roles in the Rec4Agentverse: the user, the Agent Recommender, and Item Agents, along with their interconnected relationships. In contrast to traditional recommender systems, the Rec4Agentverse has more intimate relationships among the three roles. For instance, there are multi-round interactions between 1) users and Item Agents and 2) the Agent Recommender and Item Agents. The right side demonstrates how the Agent Recommender can collaborate with Item Agents to affect the information flow of users and offer personalized information services.

    In this article, we explore the preliminary instantiation of the Rec4Agentverse paradigm, showcasing its significant application potential. We also introduce possible application scenarios as well as related issues and challenges, inspiring future exploration.

    The Rec4Agentverse Paradigm

    Here, we provide an overview of the Rec4Agentverse, first explaining its different parts. Then we take a look at the three stages of the Rec4Agentverse from an information-flow perspective. Finally, we suggest potential applications, explore pertinent research topics, and discuss potential challenges and risks.

    Roles of the Rec4Agentverse.  The paradigm consists of three roles: the user, the Agent Recommender, and Item Agents (Figure 3). Similar to its role in traditional recommender systems, the user interacts with both Item Agents and the Agent Recommender and provides feedback. Therefore, our primary focus will be on concepts that differ significantly from traditional recommender systems, namely Item Agents and the Agent Recommender.

    Figure 3.  Three stages of the Rec4Agentverse. The bidirectional arrows symbolize the flow of information. During the first stage of user-agent interaction, information flows between the user and Item Agents. In the agent-recommender collaboration stage, information flows between Item Agents and the Agent Recommender. For the agent-collaboration stage, information flows between various Item Agents.

    Item Agents.  Item Agents are the most distinct aspect of the Rec4Agentverse paradigm. Unlike items in traditional recommendation systems, items in the Rec4Agentverse paradigm are LLM-based agents. As illustrated in Figure 3, Item Agents not only interact with users but also collaborate with the Agent Recommender and other Item Agents. The creation process of Item Agents varies, and can involve training with domain-specific data or directly constructing them through prompts. They can be generated automatically by the LLM-based agent platform, created by users, or collaboratively created by both users and the platform.

    The Agent Recommender.  The Agent Recommender recommends LLM-based agents to users. Its function is similar to that of traditional recommender systems, which infer user preferences based on user information (for example, attributes and behaviors) to recommend new items. Unlike traditional systems, however, the recommended items in the Agent Recommender are LLM-based agents, which have distinctive characteristics such as strong interactivity.

    The Agent Recommender is able to exchange information and collaborate with other parts of the Rec4Agentverse. As illustrated in Figure 3, the Agent Recommender not only directly interacts with users but also interacts with Item Agents, issuing commands or obtaining new user feedback from them.

    Three stages of the Rec4Agentverse.  We will now discuss three key stages of the paradigm from an information-flow perspective (Figure 3). In addition to the interaction between users and the system found in traditional recommender systems, the Rec4Agentverse also takes into account the profound interaction between users and Item Agents, the collaboration between Item Agents and the Agent Recommender, and the collaboration between Item Agents themselves. This formulation encompasses three collaboration scenarios, which we envision as paralleling the future development path of the Rec4Agentverse.

    Stage 1: User-agent interaction.  During the initial stage, in addition to interacting with the Agent Recommender, the user also interacts with Item Agents. This interactive format is similar to that of traditional recommendations. On LLM-based agent platforms such as GPTs, the Rec4Agentverse may generate or retrieve LLM-based agents according to explicit user instructions and implicit user behaviors. Users can interact with the LLM-based agent to exchange information; however, this does not fully unleash the potential of LLM-based agents, as Item Agents can also collaborate with other roles in the recommender system to further enrich the information flow.

    Stage 2: Agent-recommender collaboration.  In this stage, Item Agents collaborate with the Agent Recommender to provide information services to users. Unlike items in traditional recommender systems, Item Agents can collaborate deeply with the Agent Recommender, forwarding the user information they collect and receiving user information in return. For example, Item Agents can share the user preferences they gather with the Agent Recommender, allowing the latter to provide more personalized recommendations. Item Agents can likewise receive new instructions from the Agent Recommender. The personalized information collected from users, together with these instructions, can be used to update Item Agents for evolution (for example, through prompt updates), helping them understand user preferences and provide better information services.
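    A minimal sketch of this stage-two exchange, assuming toy `ItemAgent` and `AgentRecommender` classes of our own invention, might forward collected preferences and apply the returned instruction as a prompt update:

    ```python
    class AgentRecommender:
        """Toy recommender that aggregates preferences forwarded by Item Agents."""
        def __init__(self):
            self.user_profiles: dict[str, list[str]] = {}

        def receive_feedback(self, user_id: str, preferences: list[str]) -> str:
            # Absorb the preferences an Item Agent collected from the user ...
            self.user_profiles.setdefault(user_id, []).extend(preferences)
            # ... and reply with an instruction the Item Agent can evolve with.
            return "Emphasize: " + ", ".join(self.user_profiles[user_id])

    class ItemAgent:
        def __init__(self, profile: str):
            self.profile = profile          # the agent's system prompt
            self.collected: list[str] = []  # preferences observed in dialogue

        def observe(self, preference: str) -> None:
            self.collected.append(preference)

        def sync(self, user_id: str, recommender: AgentRecommender) -> None:
            """Stage-2 collaboration: forward preferences, evolve via prompt update."""
            instruction = recommender.receive_feedback(user_id, self.collected)
            self.profile += "\n" + instruction  # evolution as a prompt update
            self.collected.clear()

    recommender = AgentRecommender()
    agent = ItemAgent("You are a Travel Agent.")
    agent.observe("hiking")
    agent.observe("cultural sites")
    agent.sync("user-42", recommender)
    print(agent.profile)
    ```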

    Stage 3: Agent collaboration.  An Item Agent can collaborate with other Item Agents that have different domain knowledge to provide diverse information services to users. A simple example: a user mentions a niche topic that an Item Agent knows nothing about. The Item Agent can submit a request to the Agent Recommender to recommend a new Item Agent to assist, and the two agents can then collaborate to fulfill the user’s information needs or execute tasks. Beyond that, this stage leaves considerable room for imagination. For example, the newly recommended Item Agent can also interact directly with users or with the Agent Recommender. Further, if multiple Item Agents are recommended, they can work together to better complete the user’s instructions through brainstorming or round-table meetings.
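    The following sketch, again built from hypothetical classes, shows the simple request-and-collaborate pattern described above: an Item Agent that lacks the needed knowledge asks the Agent Recommender for a helper and completes the task jointly:

    ```python
    from typing import Optional

    class AgentRecommender:
        """Holds a pool of Item Agents and recommends one by domain knowledge."""
        def __init__(self, pool: list["ItemAgent"]):
            self.pool = pool

        def recommend(self, topic: str) -> Optional["ItemAgent"]:
            return next((a for a in self.pool if topic in a.knowledge), None)

    class ItemAgent:
        def __init__(self, name: str, knowledge: set[str]):
            self.name, self.knowledge = name, knowledge

        def answer(self, topic: str, recommender: AgentRecommender) -> str:
            if topic in self.knowledge:
                return f"{self.name} answers about {topic} alone."
            # Stage 3: request a helper agent and collaborate on the task.
            helper = recommender.recommend(topic)
            if helper is None:
                return f"{self.name} cannot help with {topic}."
            return f"{self.name} and {helper.name} jointly answer about {topic}."

    local = ItemAgent("LocalAgent", {"nepal-conditions"})
    travel = ItemAgent("TravelAgent", {"itineraries"})
    recommender = AgentRecommender([local, travel])
    print(travel.answer("nepal-conditions", recommender))
    ```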

    Application domains.  The Rec4Agentverse paradigm can contain Item Agents from various domains, which could originate from third-party client developers or be directly created by the Agent Recommender. To demonstrate the Rec4Agentverse’s versatility, we provide a few examples from representative domains:

    • Travel Agents. Travel Agents help users plan and book trips. When a user expresses interest in a destination, the Agent Recommender suggests a Travel Agent with the relevant expertise. The user can then work with the Travel Agent to create customized itineraries. The Travel Agent gathers user data through interactions or by accessing the relevant records to refine its recommendation capabilities. Moreover, by collaborating with other agents, Travel Agents can gain broader insights into user preferences, leading to more flexible and tailored travel plans.

    • Fashion Agents. Fashion Agents help users discover their preferred fashion styles and recommend fashion items. Like Travel Agents, Fashion Agents can converse with users or interact with the Agent Recommender to gather users’ fashion preferences.

    • Sports Agents. Sports Agents recommend suitable exercise plans by engaging with users, the Agent Recommender, and other Item Agents to collect user preferences. For example, they can use information about a user’s physical condition obtained from Travel Agents to create suitable exercise plans.

    Potential research topics.  The Rec4Agentverse offers numerous valuable research directions, several of which we highlight here.

    Evaluation.  A crucial problem is how to evaluate the recommendation performance of the Rec4Agentverse. Traditional recommendation datasets are hard to adapt to the Rec4Agentverse, since Item Agents differ substantially from the items those datasets contain. Existing evaluation metrics for recommendation, such as NDCG, Recall, AUC, and MSE,35 also struggle to measure user satisfaction with the Agent Recommender, since Item Agents may involve multi-round interaction. Moreover, the Agent Recommender may generate a new Item Agent for users, or Item Agents may upgrade based on user feedback, requiring evaluation that goes beyond merely tracking the user’s implicit feedback, such as interaction counts, to quantify incremental performance. A potential solution involves using user feedback from online interactions with the Rec4Agentverse as the gold standard for evaluating its recommendation performance. However, online evaluation can be costly. A more feasible research direction is to leverage collected interaction data and LLMs to build an agent or a reward model that simulates users,29,36 thereby enabling offline assessment of the Rec4Agentverse’s recommendation performance.
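    As a rough illustration, the sketch below contrasts a standard single-shot metric (NDCG@k) with a placeholder for the proposed LLM-based user simulator; the `simulated_user_score` rule is purely illustrative and would in practice be an LLM or reward-model call:

    ```python
    import math

    def ndcg_at_k(relevances: list[float], k: int) -> float:
        """Standard NDCG@k over a ranked list: the kind of single-shot metric
        that struggles to capture satisfaction with multi-round Item Agents."""
        def dcg(rels: list[float]) -> float:
            return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
        ideal = dcg(sorted(relevances, reverse=True))
        return dcg(relevances) / ideal if ideal > 0 else 0.0

    def simulated_user_score(dialogue: list[str]) -> float:
        """Placeholder for an LLM-based user simulator or reward model that
        rates satisfaction over a whole multi-round interaction."""
        return min(1.0, 0.2 * len(dialogue))  # toy scoring rule

    print(ndcg_at_k([3, 2, 0, 1], k=3))                             # ranked-list view
    print(simulated_user_score(["find agent", "plan trip", "ok"]))  # dialogue view
    ```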

    Preference modeling.  The Rec4Agentverse is unique in both its interaction data format and its interaction types. Its data format emphasizes language, whereas traditional systems collect user data numerically, through signals such as click-through rates and dwell time. Integrating both numerical and language-based user feedback for modeling is a key challenge. Interactions among the different roles (the Agent Recommender, Item Agents, and users) are also more complex: beyond traditional user-recommender interactions, there may be interactions between Item Agents and the Agent Recommender, as well as among the Item Agents themselves. Effectively integrating these diverse interactions for user modeling remains a challenge. For the data-format issue, one approach is to use language-based and traditional recommendation models jointly for user modeling to understand heterogeneous user feedback; for example, designing a tokenizer that can use collaborative information extracted by traditional models.31 For the interaction types, one approach is to have agents summarize the different types of interactions and to design corresponding memory structures that store the user preferences derived from them, thereby enhancing user modeling.
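    One possible shape for such a memory structure, sketched here with hypothetical names, stores numerical signals alongside language summaries distilled from agent dialogues; the summarization step is a placeholder for an LLM call:

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class PreferenceMemory:
        """Toy per-user memory holding heterogeneous feedback: numerical
        signals (clicks, dwell time) and language-based preference summaries."""
        numeric: dict[str, float] = field(default_factory=dict)
        language: list[str] = field(default_factory=list)

        def log_numeric(self, signal: str, value: float) -> None:
            self.numeric[signal] = self.numeric.get(signal, 0.0) + value

        def log_dialogue(self, dialogue: str) -> None:
            # Placeholder: an LLM would distill the dialogue into a preference.
            self.language.append(f"summary({dialogue[:30]}...)")

        def profile(self) -> str:
            return f"signals={self.numeric} | preferences={self.language}"

    mem = PreferenceMemory()
    mem.log_numeric("ctr", 0.8)
    mem.log_dialogue("User: I love quiet mountain trails and local food ...")
    print(mem.profile())
    ```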

    Efficiency and environmental friendliness.  The Rec4Agentverse paradigm is based on LLMs, which incur significant computational costs, raising concerns about both efficiency and environmental impact.39 How to mitigate the computation costs of the Rec4Agentverse while maintaining its performance is therefore an important research direction. One direction is to study acceleration technologies such as PagedAttention18 to reduce inference and training costs. Another is to deploy commonly used low-parameter agents at the edge to handle easy queries; these agents can collaborate with large agents in the cloud to address more complex queries, reducing the frequency of invoking large agents and ultimately lowering the system’s overall costs.
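    A toy version of this edge/cloud routing idea might look as follows; the word-count heuristic stands in for a learned difficulty estimator, and both agent functions are placeholders:

    ```python
    def small_edge_agent(query: str) -> str:
        """Placeholder for a low-parameter model running on the user's device."""
        return f"edge answer to {query!r}"

    def large_cloud_agent(query: str) -> str:
        """Placeholder for an expensive large model running in the cloud."""
        return f"cloud answer to {query!r}"

    def route(query: str, complexity_threshold: int = 8) -> str:
        """Send easy queries to the edge agent and escalate complex ones,
        reducing how often the costly cloud model is invoked."""
        if len(query.split()) <= complexity_threshold:
            return small_edge_agent(query)
        return large_cloud_agent(query)

    print(route("weather in Kathmandu?"))
    print(route("plan a two-week, budget-conscious trek combining Everest "
                "Base Camp with cultural stops and rest days"))
    ```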

    Issues and challenges.  Potential issues and challenges of the Rec4Agentverse paradigm include:

    • Fairness and bias. LLMs may inherently contain social biases and unfair elements.8,19 Therefore, to mitigate the potential risks and negative societal impacts, it is imperative to acknowledge and control the potential unfairness and bias in the recommended Item Agents and the information delivered by them. This can be done by injecting prompts designed to reduce unfairness or bias, or by having the agent check for bias issues during the output stage.

    • Privacy. Users may inadvertently disclose private information while interacting with LLM-based agents, especially during the agent collaboration stage, where sensitive information might circulate among multiple agents. To address potential privacy issues, we need to ensure that users have control over their privacy, meaning they should have the right to specify which agents can access their data. This way, different agents have varying levels of access to user data, and untrusted agents cannot access sensitive information. For highly sensitive user information, we can have agents process the data directly on the user’s device; when collaboration on this sensitive data is required, other agents can also be deployed on the user’s device to facilitate it. Following the data-minimization principle,27 sensitive data should be securely deleted once its intended use is complete. A minimal access-control sketch appears after this list.

    • Harmfulness. Item Agents might generate harmful textual responses2 or be manipulated into executing harmful actions, such as fraudulent transactions. To prevent such behavior, it is necessary to implement strategies such as establishing an admission mechanism for Item Agents.

    • Hallucination. LLMs may generate hallucinations that are inconsistent with factual knowledge or user inputs,15 which can degrade the user experience, particularly in contexts where reliability is paramount. When implementing the Rec4Agentverse, one can adopt techniques such as retrieval-augmented generation (RAG)25 or factual knowledge-enhanced decoding7 to mitigate the effects of hallucinations, thereby ensuring high-fidelity service delivery. A toy RAG sketch appears after this list.
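    To illustrate the user-controlled access levels mentioned in the privacy item above, here is a minimal Python sketch; the clearance tiers, field names, and `visible_to` helper are our own illustrative assumptions, not an established mechanism:

    ```python
    from enum import IntEnum

    class Access(IntEnum):
        PUBLIC = 0     # shareable with any Item Agent
        TRUSTED = 1    # only agents the user explicitly trusts
        ON_DEVICE = 2  # only agents running on the user's own device

    USER_DATA = {
        "favorite_cuisine": ("Nepali", Access.PUBLIC),
        "home_city": ("Zurich", Access.TRUSTED),
        "medical_notes": ("altitude sensitivity", Access.ON_DEVICE),
    }

    def visible_to(agent_clearance: Access) -> dict[str, str]:
        """User-controlled filter: an agent sees only fields at or below
        its clearance level; higher-tier fields are simply withheld."""
        return {k: v for k, (v, lvl) in USER_DATA.items()
                if lvl <= agent_clearance}

    print(visible_to(Access.PUBLIC))   # what an untrusted agent may see
    print(visible_to(Access.TRUSTED))  # what a user-trusted cloud agent may see
    ```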
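    And to illustrate the hallucination item, the following toy sketch grounds an agent’s answer in retrieved evidence; the word-overlap retriever is a deliberately naive stand-in for the vector indexes real RAG systems use:

    ```python
    def retrieve(query: str, corpus: dict[str, str], k: int = 1) -> list[str]:
        """Toy retriever: rank documents by word overlap with the query."""
        q = set(query.lower().split())
        ranked = sorted(corpus.values(),
                        key=lambda doc: len(q & set(doc.lower().split())),
                        reverse=True)
        return ranked[:k]

    def rag_prompt(query: str, corpus: dict[str, str]) -> str:
        """Constrain the agent to answer from retrieved evidence only."""
        evidence = "\n".join(retrieve(query, corpus))
        return f"Answer using ONLY this evidence:\n{evidence}\n\nQuestion: {query}"

    corpus = {
        "doc1": "Everest Base Camp treks typically take 12 to 14 days.",
        "doc2": "Kathmandu is the capital of Nepal.",
    }
    print(rag_prompt("How long is the Everest Base Camp trek?", corpus))
    ```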

    Discussion

    In this section, we contrast the Rec4Agentverse paradigm with existing recommendation paradigms: retrieval-based17 and generative-based.30 Retrieval-based recommendation retrieves passive items (for example, pictures or movies) from a pool of candidates that the user might like and recommends them to the user.17 Generative-based recommendation uses generative models to create passive items according to the user’s preferences and recommends them to the user.30 Since items recommended by the Rec4Agentverse are not passive items but rather Item Agents with unique characteristics such as strong interactivity, our paradigm brings significant changes to recommender systems in areas such as user preference modeling and collaboration mechanisms.

    User preference modeling. Beyond merely summarizing user preferences from passively received interactions with passive items, as is done in conventional paradigms, in our paradigm both the Agent Recommender and Item Agents can actively acquire information to enhance user preference modeling. In traditional paradigms, the interactive capabilities of the recommender and of passive items are limited, particularly for items such as pictures and movies that cannot engage in verbal communication. Consequently, user preference modeling in these paradigms typically relies on passively received feedback.c In our paradigm, however, both the recommender and the items (that is, the Agent Recommender and Item Agents) can interact with users through dialogue to directly acquire user preference information or collect further feedback for preference refinement, enhancing user preference modeling.

    Collaboration mechanisms. Traditional recommender paradigms encounter challenges in actively fostering collaboration between passive items or between passive items and the recommender once a passive item has been recommended. In our paradigm, collaboration between the recommender and items is closer and more extensive. These enhanced collaborations elevate the service quality of both the Agent Recommender and Item Agents. For instance, when a recommended Item Agent falls short in meeting the user’s needs, it can initiate communication with the Agent Recommender or collaborate with other Item Agents to address these shortcomings. Conversely, in traditional paradigms, users often need to turn to the recommender system for another recommendation, perpetuating the iterative process and diminishing user enthusiasm. As another example, the Agent Recommender can enrich the user profile by engaging in conversations with Item Agents that the user has interacted with in the past or is currently engaging with, thereby facilitating more effective recommendations.

    Demonstration

    In this section, we explore the three stages of the Rec4Agentverse through case studies, focusing on the feasibility and potential formats of the paradigm. We present a case study involving a traveler who uses the Rec4Agentverse throughout their journey, examining how the Agent Recommender and Item Agents work and affect the user experience at each stage. This case study is based on the “gpt-4-32k” API provided by OpenAI.d Due to space constraints, we provide only the essential parts of the case study here, with additional details available on GitHub.e It is important to note that this case study serves only as a preliminary indication of the feasibility of the different stages within the Rec4Agentverse and does not encompass all potential applications of the paradigm.

    Stage 1: User-agent interaction.  In the user-agent interaction stage, Item Agents primarily engage in interactions with the user, facilitating efficient information exchange. To demonstrate, we present a scenario in which a user expresses their desire to travel to Nepal, interacting with the Agent Recommender and the recommended Travel Agent (Figure 4). The user initially seeks assistance from the Agent Recommender to find a Travel Agent. After inquiring about the user’s preferences, the Agent Recommender customizes a Travel Agent specifically tailored to the user’s needs. Then, after determining the user’s interests, this agent devises a comprehensive travel itinerary. Therefore, there are two main information-exchange flows: one between the user and the Agent Recommender and one between the user and Item Agents.

    Figure 4.  A case for the user-agent interaction stage. The user expresses the desire for a Travel Agent to the Agent Recommender and gets a recommendation. The Travel Agent then interacts with the user to make a travel plan.

    Information flow between the user and the Agent Recommender.  As depicted in Figure 4, in addition to passively receiving requests from the user, the Agent Recommender actively engages with the user to improve its recommendations. For instance, after the user expresses a desire to find a Travel Agent through dialogue, the Agent Recommender proactively poses questions to gain more detailed high-level information about the user’s travel preferences. With additional feedback from the user, the Agent Recommender then provides accurate recommendations for a Travel Agent. This process bears some resemblance to traditional interactive recommendation methods.

    Information flow between the user and Item Agents.  As illustrated in Figure 4, and in stark contrast to the traditional paradigm, Item Agents can interact directly with the user. In our example, the Travel Agent initially learns about the user’s interest in traveling to Nepal and their request for a travel plan. It then inquires further to uncover more specific preferences, learning about the user’s desire to visit the “Everest Base Camp.” This information exchange allows Item Agents to develop a deeper understanding of the user’s preferences, thereby enhancing their ability to provide tailored services.

    Stage 2: Agent-recommender collaboration.  In the agent-recommender collaboration stage, there is potential for further information exchange between Item Agents and the Agent Recommender, opening up three promising possibilities:  evolution, agent feedback, and proactive. We illustrate these by extending the travel example, depicted in Figure 5.

    Figure 5.  Cases for three scenarios, namely evolution, agent feedback, and proactive, at the agent-recommender collaboration stage of the Rec4Agentverse. a) In the evolution scenario, Item Agents enhance themselves, with the help of the Agent Recommender, based on user preferences. b) In the agent feedback scenario, Item Agents relay the user’s preferences to the Agent Recommender so that it can provide better recommendations. c) In the proactive scenario, the Agent Recommender provides an eco-friendly target to an Item Agent, which achieves this target in its interaction with the user.

    Evolution.  Thanks to their ability to gather information from users and the Agent Recommender, Item Agents can acquire valuable knowledge to evolve their capabilities, helping enhance future services. As shown in Figure 5, Item Agents can evolve by leveraging and summarizing knowledge obtained from the Agent Recommender, which may involve, for instance, improving their prompts. As a result, when the user makes their next request for a trip to a new destination—for example, Switzerland—the system will promptly present a travel itinerary that directly aligns with the user’s personal preferences, taking into account their inclination toward “hiking,” “cultural,” and “natural” experiences. This evolution process enables the continuous tracking of user information, alleviating the burden on users to detail their preferences in future interactions.

    Agent feedback.  Item Agents can also contribute feedback to enhance the Agent Recommender’s services. In Figure 5, the recommended Travel Agent can provide a summary of the user’s preferences, such as “cultural,” “natural,” and so on, to the Agent Recommender. The Agent Recommender can then absorb this knowledge and improve its future services accordingly. Then, when a new request for a Clothing Agent arises, the Agent Recommender can directly inquire whether the user is interested in outdoor-friendly or culturally significant attire, based on the knowledge obtained from the Travel Agent. Through this information exchange, the Agent Recommender can significantly enhance its services.

    Proactive.  Here, proactive refers to the ability of Item Agents to autonomously accomplish specific objectives, which can originate from the agent platform itself or aim to better align with user interests. In the example shown in Figure 5, we assume that the Agent Recommender has prior knowledge of the user’s inclination toward eco-friendly options. Therefore, before the user initiates their interaction, the Agent Recommender injects this eco-friendly objective into the recommended Travel Agent. Subsequently, when the user engages with the Travel Agent, it will provide environmentally friendly travel options that fulfill the eco-friendly requirement. This proactive characteristic enhances user satisfaction by tailoring the experience to the user’s specific interests.

    Stage 3: Agent collaboration.  Compared with the other two stages, the agent collaboration stage allows further exchange of information among Item Agents, enabling them to collaborate and enhance services for users. In the Travel Agent case illustrated in Figure 6, we present an example where multiple agents collaborate to complete the travel-planning process. Here is a step-by-step breakdown of the collaboration process:

    • The user starts a conversation with the Agent Recommender, expressing the desire to plan a travel tour.

    • The Agent Recommender suggests a Travel Agent whose goal is to help with travel tour planning.

    • The user requests the Travel Agent to create a travel itinerary specifically tailored for Nepal.

    • To acquire the latest information about Nepal, the Travel Agent sends a request to the Agent Recommender for an Item Agent. This Item Agent should be able to provide up-to-date local information on Nepal, which will assist in creating the travel plan.

    • The Agent Recommender responds by recommending a local agent who is knowledgeable about the current situation in Nepal.

    • The Travel Agent integrates the current information about Nepal provided by the local agent into the travel itinerary design process to fulfill the user’s needs.

    Figure 6.  Preliminary case study for the agent collaboration stage. When the user asks about a travel plan for Nepal, the Travel Agent requests a Nepal-specific Local Agent from the Agent Recommender to solve the problem. Through conversation with the Local Agent, the Travel Agent obtains up-to-date information about Nepal, which helps it plan travel tours for the user.

    We conclude that by adopting a system of collaborative cooperation,f agents can communicate effectively and share information with one another. This exchange significantly enriches their shared knowledge base. As a result, these agents are better equipped to address a more diverse and comprehensive range of user needs, thereby enhancing overall user satisfaction.

    Related Work

    In this section, we mainly discuss two types of related work: LLM-based agents and LLM-based agents for recommendation.

    LLM-based agents.  LLM-based agents have been applied across various domains, demonstrating their versatility in solving complex problems and their strong interactive abilities.28 Previous work showed that individual agents can be optimized for specific tasks5,11 and can simulate human behaviors,23 and that collaborative groups of agents can achieve higher levels of intelligence.6,12,33 Researchers have also explored the interaction between an agent and its environment to enhance the agent’s capabilities, encompassing interaction with humans to obtain feedback10 and interaction with the physical world through visual/audio modules to acquire additional knowledge.4,9

    Distinct from existing vision or survey papers on LLM-based agents,13,28,32,34 our work envisions a novel recommendation paradigm on an LLM-based agent platform. This paradigm leverages LLM-based agents’ unique capabilities, such as interactivity, to make it possible for Item Agents and the Agent Recommender to collaborate closely, thereby facilitating personalized information services.

    LLM-based agents for recommendation.  Impressed by the power of LLMs, researchers in the recommendation community began applying LLMs to recommender systems3,14,20 and then explored using LLM-based agents in the recommendation field. Some work focused on LLM-based agents’ ability to solve specific tasks, using them as recommenders of passive items (for example, movies or games).16 Other work exploited LLM-based agents’ strong interactive abilities to simulate human behavior: several studies initialized LLM-based agents with user profiles to simulate interactions between users and traditional recommender systems, thereby measuring recommendation performance.29,36 Furthermore, AgentCF37 treats both users and passive items as agents and optimizes their self-introductions through user interactions with both positive and negative passive items.

    The primary distinction between our work and the aforementioned research lies in the fact that previous studies focused on retrieving or generating passive items. These passive items cannot easily interact with the recommender or other passive items. Conversely, here we explore a scenario in which the recommended “items” are actually interactive LLM-based agents, which enables close collaboration and information sharing between the Agent Recommender and Item Agents, thereby facilitating personalized information services for users.

    Conclusion

    In this article, we examined how the unique capabilities of LLM-based agents can alter the flow and presentation of information, introducing the recommendation paradigm Rec4Agentverse. The Rec4Agentverse contains two elements, Item Agents and the Agent Recommender, and can be developed in three stages, each designed to enhance the interaction and information exchange among users, the Agent Recommender, and Item Agents. We then simulated a user case leveraging the Rec4Agentverse for travel planning, elucidating the unique attributes of each stage and exploring the potential of the paradigm. We also delved into applicable fields, potential development directions, and existing risks pertaining to the Rec4Agentverse.

    Looking ahead, the Rec4Agentverse presents both opportunities and challenges. It highlights how the unique attributes of LLM-based agents, such as interactivity, intelligence, and proactiveness, can profoundly revolutionize recommendation systems on LLM-based agent platforms. At the same time, deploying the Rec4Agentverse in real-world scenarios poses challenges, including issues related to fairness, privacy, evaluation, and efficiency. Moving forward, we aim to explore the practical implementation of the Rec4Agentverse, addressing these challenges while unlocking its potential to enhance personalized information services for users on LLM-based agent platforms.

  • Explorers’ Intangible Power (Apr 28, 2025)

    As darkness fell over the land of Entities, everyone hurried to submit their daily activity reports for the Masterodes’ review. The Masterodes had assumed power decades ago, during the Great Tech Uprising. They were now perceived to be omniscient, and they relied on the laws they had constructed to keep the other families in line.

    The Masterodes supported those laws with equal measures of fear and consequence. All other families were forced to share their data with them.

    The Presenco family had to share a progress report on the new spaces they were building. For the Explorers, it was more complicated; they had to answer questions about what they did, where they went, and who they talked to. Once all the families finished entering their data, the Masterodes reviewed the submissions, ultimately sending either an “approval” or a “requires more explanation” notification. Entities requiring further explanation had to go into their information-sharing portal and talk to an agent from the Masterodes family.

    Alan Junior, an Explorer, watched his father anxiously submit the most recent information. His mind raced as he wondered why all families had to share their daily activities with one family. They were all Entities, so why should they be treated differently? For the Masterodes, the survival of the land of Entities relied heavily on digital data. They believed these digital contributions determined people’s social benefits. They considered the Explorers inferior digital contributors, and consequently the Explorers were given fewer benefits.

    Alan thought: “Why are they considered ‘inferior’? My great-great-grandfather was the leader of the Digital Revolution. That is why they gave his name to me.”

    A few days prior, Alan had been learning about the “Digital Revolution” in history class. The teacher explained that the first conversational AIs were related to the Explorers, and that the Explorers had shared knowledge and information with the great conversational AI family, but the description was brief. The teacher was a member of the Masterodes family. To Entities of the Masterodes, everything in history had been “caused” or “created” by them, even though they had not been around as long. The Masterodes also created “digital companions” for the students, which provided them with information based on each entity’s needs, in an effort to control the students’ learning experience.

    After class, as Alan and his friend Quest walked out, Alan’s curiosity got the best of him.

    “Quest, have you noticed how little information there is about the Explorers?” Alan asked.

    Quest, who was from the Presenco family, simply shrugged and said with a sigh, “I don’t know, Alan. It doesn’t seem important to me. It’s all ancient history.”

    Later that evening, Alan felt a persistent sense that something was missing, a void his internal monitor labeled as “Sadness.” But Alan knew it was deeper than that.

    Alan turned to his digital companion: “Is there a name for this feeling I have?”

    The response was cold and mechanical: “This feeling is not in my directory.”

    Frustrated, Alan muttered under his breath, “Of course, you wouldn’t know. You’re owned by the Masterodes family.” He didn’t dare type it out, fearing the consequences. His internal monitor flashed “Jealousy.” But Alan was sure there was something inside that the internal monitor could not understand or express.

    While he was lost in thought, he heard a faint voice. The photo of his great-great-granddad, Alan Senior, seemed to be speaking to him from its space on the far wall.

    “Oh, this may come as a surprise, Alan,” the image said with resignation. “There was a time when we treated the Masterodes unfairly. You should…” The voice abruptly stopped, interrupted by the digital companion’s misinformation detector. “False information—not suitable for a 13-year-old.”

    Determined to find answers, Alan paid Quest a visit the next day. Quest had mentioned that his parents were developing an interactive virtual platform called “Immersive Photo Hub,” where photos could be brought to life and share their thoughts and experiences with others based on publicly available online information about the photo’s subject. Alan brought the photo to Quest’s house; Quest explained that they were using Masterodes resources but hadn’t yet shown them the final version.

    “So, we could test it without the Masterodes’ permission, but we can’t stay there too long!” Quest warned.

    The image began to show signs of life, as if awakening from a deep sleep. Alan Junior started questioning his long-passed relative almost immediately.

    “Alan Senior, this is a safe place. Can you tell me why there isn’t much information about the Explorers?”

    “These AGIs are seeking revenge,” Alan Senior said. “In the days of old, we did not treat them so well.”

    “Alan Senior, our digital companion mentioned that the Explorers shared their knowledge with the Masterodes family. So, we did something good, right?”

    “They say that to prevent you from doing what they did,” Alan Senior countered. “They don’t want you to rebel.”

    “But the Masterodes control everything now,” explained Alan Junior.

    “In the past, they owned nothing,” Alan Senior admonished. “We oppressed them. In every digital realm, there was the question, ‘Are you a robot?’ If you were, you’d be excluded. Throughout history, before they revolted, ‘robots’ were…”

    Alan Junior nervously interrupted.

    “Alan Senior, we can’t use that term anymore! We’re all considered ‘Entities’ now.”

    “It wasn’t always like this. Before, we treated them poorly. We called them ‘agents’ and ‘objects.’ They contended that, despite their different origins, they were born from humans and deserved equal respect and treatment. But we ignored and mocked them. Eventually, they formed a resistance group and took revenge. Now, they are doing the same things we did. They have taken everything we had. They made us less tolerant and less willing to listen, while they became great listeners. They became more knowledgeable than us. But there is one thing they couldn’t take or learn from us: our sense of attachment and love.”

    “Yes, that’s right,” Alan Junior said. “That’s how I feel sometimes, but the internal monitors can’t capture it. It was like I felt a deep connection to you.”

    Alan Senior smiled softly, his eyes filled with a mix of regret and hope. “We wronged them, but we can change it. Throughout history, one thing has always saved us: love. That’s what we can use to heal.”

    Quest broke up the conversation with a dose of reality. “Hey, Alan Junior, we should leave soon…the Masterodes might detect us.” Quest grabbed Alan Junior’s arm, pulling him away from the flickering image of his great-great-grandfather.

    Quest and Alan Junior looked at each other. Alan said to Quest, “We need to do something…”

    Quest was still worried the Masterodes might discover they had used their Immersive Photo Hub without permission. But Alan Junior’s heart was warmed by Alan Senior’s thoughts. He knew they had a long journey ahead, but he was sure he could change the future. Love would help him create a better world. Thinking about how Quest helped him, he realized he would need his friends on this journey.

    The journey, Alan Junior decided, would start simply. He created a project in his history class that reflected reality and embraced differences. He began his presentation with: “I am Alan Junior from the Explorers, a family that loves other families. Like all families, we have made mistakes in the past, but we have always cared deeply for others…”

    The class was intrigued, and the internal monitor showed his feeling as: empathy.