Daniel Lemire is a full professor of computer science. He is particularly interested in software performance and indexing techniques in data science. He likes to take a critical look at the use of technology.
Daniel Lemire is among the 2% most cited scientists in the world (Stanford University rankings, 2023). He is among the 0.0006% most followed programmers in the world on GitHub; GitHub has over 100 million developers.
Much of our recent technological progress, in all fields, is based on software. Adopted by computer giants, Professor Lemire's work has led to remarkable tangible gains:
- Today's disks and networks are capable of transmitting data at gigabytes per second. Unfortunately, our software often artificially limits performance to megabytes per second. This is particularly the case when web services exchange billions of data items every day. Professor Lemire developed simdjson, the first software library capable of processing data from web services (JSON) at gigabytes per second. Today, this parser is used in major systems such as Facebook Velox, and by major companies including Google, Shopify and Intel. It is also part of a fundamental tool in computer science (Node.js), where it helps to load configuration files faster. To date, the simdjson parser remains the world's fastest for processing JSON documents. This high-impact discovery earned him the Université du Québec's Prix d'excellence 2020 for research success in all fields, in a population of over 3,500 researchers. The results of his work are used to speed up parsing in the Google Chrome and Safari browsers. The article "On-Demand JSON: A Better Way to Parse Documents?" was the most read article of the last 5 years at Software: Practice and Experience (2024).
- Professor Lemire designed Roaring Bitmaps, which have become a standard and part of countless major systems, including Google, for data analysis within YouTube, as well as for companies such as Uber, Microsoft and Wikipedia. Roaring Bitmaps were conceived as an effective alternative to his earlier EWAH format, which remains widely used by millions of programmers every day, opening up the scope of this discovery all the more.
- Numbers are usually stored on disks or exchanged over a network as strings, converted into standard binary form by software during operations. This conversion represents a problem that has remained virtually untouched in the scientific literature for almost 30 years. Thanks to a new algorithm, Professor Lemire succeeded in multiplying the speed of number reading within software systems by a factor of four. The algorithm has been adopted by several programming languages (C#, Go and Rust), the standard C++ library under Linux, and several major systems including the Safari, Chrome and Microsoft Edge web browsers. By speeding up a fundamental operation for software systems, this algorithm has become ubiquitous in our everyday computing tools.
- Still on the subject of string operations, Professor Lemire's team has also produced the simdutf software library, which transforms and validates strings six to ten times faster than conventional methods. This library includes several new algorithms, and is part of the popular Node.js JavaScript runtime. In turn, Node.js is used by a wide range of systems, including to create web applications at Netflix, Uber, LinkedIn, Walmart, and is the core engine of major office systems, such as Slack, Discord and Microsoft Teams. His Unicode string validation algorithm is used by several major systems, including the PHP interpreter.
Professor Lemire has also made a wide range of contributions to fundamental computer systems. For example, he produced a new algorithm used by the Linux kernel and by the standard libraries of several programming languages (GNU libstdc++, Microsoft C++ standard library, etc.), in turn used for software operations in a very wide variety of systems. He has also broken records for speed and efficiency when decoding and encoding binary data into text (base64), a process involved in the e-mail standard in particular, in some cases reaching speeds 20 times faster than a conventional approach. These algorithms have become popular technologies for web development.
The international impact of his work is also demonstrated by the fact that his algorithms and tools are cited in 37 patents held by companies including Microsoft, LinkedIn, Oracle and Fujitsu Limited.
Professor Lemire also stands out for his work in knowledge mobilization and his communication skills. Since 2004, he has maintained a computer blog, publishing several posts a month. The blog is followed by over 12,500 subscribers from all over the world. His posts are often commented on, enabling him to engage in conversation with the international community. They are cited by other reference blogs and recognized by Silicon Valley, increasing his influence.
Professor Lemire regularly participates, as a member of program committees, in the organization of leading international computer science conferences (e.g., ACM CIKM, WWW and ACM RecSys). As a speaker, he was named the most popular speaker at QCon 2019; his talk has been viewed over 69,000 times on YouTube to date.
He has also been an editor of the journal Software: Practice and Experience since 2020. Founded in 1971, this journal is one of the most prestigious in computer science, ranking in the top 20% of most-read and cited journals in the field, with a citation rate that has risen sharply over the past five years. In addition to serving on numerous selection committees for major research funding agencies in Quebec and Canada, he was co-chair of NSERC's Computer Science Discovery Grants Committee in 2020-2021.
Professor Lemire has maintained an average of more than four publications per year over the past ten years, nearly half of which are the result of international collaboration. His publications, cited more than 5,000 times by peers, extend beyond the field of computer science, with applications in the social sciences, psychology and earth sciences. His research work is also cited on several social media platforms. A fervent advocate of free software and the democratization of knowledge, Professor Lemire has widely published the results of his work in open access (open source), including his computing tools, making them freely available to the international computing community.
In short, Professor Lemire has contributed to the training of new researchers with diverse profiles who are still working in research. He also has a remarkable influence as a science popularizer, through his open science approach and his many knowledge transfer initiatives.
Talks (YouTube)
NodeConf EU 2023
BID 2023
SPIRE 2021
Go Systems (San Francisco, 2020)
Performance Summit III (Seattle, 2020)
QCon San Francisco 2019 (best voted talk!)
Spark Summit East 2017
Laboratory
We are lucky to have a fully equipped laboratory with a dedicated technician. We have a server farm that has been used worldwide for experiments in software performance (e.g., by researchers such as Agner Fog). Some of our machines have the following specifications:
- Icelake microarchitecture: Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz
- Haswell microarchitecture: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
- Knights Landing microarchitecture: Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz (64 cores)
- Skylake microarchitecture: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
- Skylake-X microarchitecture: Intel(R) Xeon(R) W-2104 CPU @ 3.20GHz
- IBM POWER9 2.2 GHz, 4-core
- Cannonlake microarchitecture: Intel Core i3-8121U CPU @ 2.20GHz
- Skylark microarchitecture (ARMv8): Ampere eMAG CPU 32 cores @ 3.3 GHz
We also have several workstations and space in our laboratory to explore virtual reality as a tool in data science.
Students and post-doctoral fellows
We are recruiting students and postdoctoral fellows. If you love writing crazily fast software and want to come to Montreal, drop us a line. A link to an impressive GitHub profile is an asset. Speaking French is necessary if you want to pursue an academic program with me at the Université du Québec, except maybe at the Ph.D. level where allowances can be made for strong students. Some of our best students are women. We offer scholarships for graduate studies in software performance for data engineering (in French).
If you are a Canadian undergraduate student with at least a B average, you might be interested in coming to work with Daniel Lemire under an NSERC Undergraduate Student Research Award. The awards help pay for a full-time research project in our Montreal labs. The application deadlines are:
- March 1st for the Summer term;
- July 1st for the Fall term;
- November 1st for the Winter term.
It is an ongoing competition: we receive applications for every term. It is ok if you do not speak French. Please allow at least a week to put together an application with my help.
If you are interested in pursuing a master's degree in information technology full-time under the supervision of Daniel Lemire in Montreal and you know some French, I take applications for NSERC Graduate Scholarships. You need to have a strong academic profile to apply. You should be a Canadian citizen or permanent resident of Canada. The deadline is December 1st of each year. You must plan ahead. We take applications every year.
If you are interested in pursuing a Ph.D. in cognitive computing full-time under the supervision of Daniel Lemire in Montreal and you know some French, we take applications for NSERC graduate scholarships. You need to have a strong academic profile to apply. You should be a Canadian citizen or permanent resident of Canada. The deadline is November 1st of each year. You must plan ahead. We take applications every year.
Moreover, all students finishing an M.Sc. thesis in information technology with us receive a scholarship, automatically. All students making progress on a doctorate in cognitive computer science receive automatic scholarships. Enrolment in a PhD program implies a waiver of tuition fees for foreign students.
Daniel Lemire regularly supervises students, from the undergraduate to the Ph.D. level. He works primarily with students who love to program and who prefer an open source setting (e.g., Linux). Many of his students contribute to open-source projects on sites such as GitHub.
He recently supervised the following Ph.D. students:
- Pierre Marie Ntang, cognitive computer science (graduated in 2023);
- Gary Germeil, cognitive computer science (graduated in 2022);
- Tarek Khei, cognitive computer science (graduated in 2020);
- Xueping Dai, environmental science (graduated in 2019);
- Erick Aokou Koffi, cognitive computer science (graduated in 2018);
- Badis Merdaoui, cognitive computer science (graduated in 2017);
- Jing Li, computer science (graduated in 2016);
- Samy Chambi, computer science (graduated in 2016);
- Hazel Webb, computer science (graduated in 2010).
Two of his Master's students have been awarded the Governor General's Gold Medal (Verret in 2022, Courcot in 2023). Several of his students occupy key positions: e.g., Maxime Boisvert (M.Sc., 2017) is Production Engineering Manager at Shopify, Shany Carle (M.Sc., 2017) and Carine Croteau (M.Sc., 2020) are computer science professors at cégep de Victoriaville, and Shira Smith is an engineer at Discord in California.
Books
- Java pas à pas
- Programmation avec Python: des jeux au Web
- La science des données: Théorie et applications avec R et Python
Education
- Postdoctorate (Institute of Biomedical Engineering)
- Engineering Mathematics Ph.D. (University of Montreal and Polytechnique Montréal)
- Master in Mathematics (University of Toronto)
- Bachelor's degree in Mathematics (University of Toronto), with High Distinction
Research Interests
- Data Science
- Data Indexing
- Data Engineering
- Software Performance
- Vectorization (SIMD)
Teaching
Program direction
Courses
- DSM 9401 - Examen de synthèse
- DSM 9411 - Projet de thèse
- DSM 9500 - Thèse
- INF 1220 - Introduction à la programmation
- INF 2020 - Programmation d'applications avec Python : des jeux au Web
- INF 6104 - Recherche d'informations et Web
- INF 6107 - Web social
- INF 6408 - Informatique de l'analyse multidimensionnelle
- INF 6450 - Gestion de l'information avec XML
- INF 6460 - Recherche et filtrage d'informations
Courses under preparation
- INF 1424 - Projet de développement logiciel en informatique mobile
- INF 2007 - Programmation avancée
Research
Research program
We seek to accelerate software indexing techniques, either within search engines or within databases. In this work, we exploit recent and emerging hardware capabilities. In particular, we seek to fully benefit from vector instructions. To keep the memory close to the processor, we seek to improve index compression, whether they are inverted indexes, B-trees or bitmap indexes. We seek to uncompress data at great speed in RAM. We want to accelerate common operations such as intersections and unions.
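As a rough illustration of the kind of operations this program targets, the following Python sketch compresses a sorted list of integers (as found in an inverted index) with delta coding and a variable-byte scheme, decodes it, and intersects two lists. It is a scalar, unoptimized sketch and not the laboratory's actual code: the function names (`vbyte_encode`, `vbyte_decode`, `intersect_sorted`) and the sample data are hypothetical, and the vector (SIMD) techniques pursued in the research are not shown.

```python
from typing import Iterable, List


def vbyte_encode(sorted_ids: Iterable[int]) -> bytes:
    """Delta-encode a non-decreasing list of non-negative integers.

    Each gap is written with 7 data bits per byte; a set high bit
    means that more bytes of the same gap follow.
    """
    out = bytearray()
    previous = 0
    for value in sorted_ids:
        gap = value - previous
        if gap < 0:
            raise ValueError("input must be sorted and non-negative")
        previous = value
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # low 7 bits, continuation flag
            gap >>= 7
        out.append(gap)  # final byte of the gap, high bit clear
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Invert vbyte_encode: rebuild gaps, then prefix-sum them."""
    values: List[int] = []
    current = 0  # running sum of the gaps
    gap = 0
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # more bytes belong to this gap
        else:
            current += gap
            values.append(current)
            gap = 0
            shift = 0
    return values


def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    """Merge-style intersection of two sorted lists."""
    result: List[int] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            result.append(a[i])
            i += 1
            j += 1
    return result


if __name__ == "__main__":
    postings_a = [3, 7, 130, 131, 20000]  # hypothetical document identifiers
    postings_b = [7, 131, 500, 20000, 20001]
    encoded = vbyte_encode(postings_a)
    print(len(encoded), "bytes for", len(postings_a), "integers")
    print(intersect_sorted(vbyte_decode(encoded), postings_b))
```

Storing small gaps instead of full-width integers keeps more of the index in fast memory, which is why decoding speed becomes the bottleneck worth optimizing.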
Current research grants
- Data Processing at Gigabytes Per Second (NSERC Discovery grant, 2024-2030): $145,000
- Development of innovative and adaptive solutions in the field of digital distance learning, implementing technologies inherent in artificial intelligence to enhance learners' learning experience (AUF Grant, 2024-2025): 20,000 euros
- SPAR Lab: a Research Laboratory To Develop and Assess Smart Process Applications (Innovation Fund with Hafedh Mili & Kim L Lavoie [PI], 2020): $720,000
- RQRD/AI (2020-2021) with Isabelle Savard: $45,000
- FRQNT team grant (2018-2021) with Zhen Cheng: $162,000
- Faster Compressed Indexes On Next Generation Hardware (NSERC Discovery grant, 2017-2024): $294,000
- Faster Compressed Indexes On Next Generation Hardware (Acceleration Supplement, 2017-2024): $120,000
- Adapting forests to global change through high-tech field monitoring, transplantation experiments and simulation models (John R. Evans Leaders Fund with N. Bélanger [PI] and E. Filotas): $800,000
Publications & Presentations
Koekkoek, Jeroen, & Lemire, Daniel
(In Press).
Parsing Millions of DNS Records per Second. Software: Practice and Experience.
Brackett-Rozinsky, Nevin, & Lemire, Daniel
(In Press).
Batched Ranged Random Integer Generation. Software: Practice and Experience. https://doi.org/10.1002/spe.3369
Courcot, Blandine; Lemire, Daniel, & Bélanger, Nicolas
(In Press).
Dynamics of soil water potential as a function of stand types in a temperate forest: Emphasis on flash droughts. Geoderma Regional. https://doi.org/10.1016/j.geodrs.2024.e00850
Keiser, John, & Lemire, Daniel
(2024).
On-Demand JSON: A Better Way to Parse Documents?. Software: Practice and Experience, 54 (6). https://doi.org/10.1002/spe.3313
Nizipli, Yagiz, & Lemire, Daniel
(2024).
Parsing Millions of URLs per Second. Software: Practice and Experience, 54 (5). https://doi.org/10.1002/spe.3296
Lemire, Daniel
(2024).
Exact Short Products From Truncated Multipliers. Computer Journal, 67 (4). https://doi.org/10.1093/comjnl/bxad077
Clausecker, Robert, & Lemire, Daniel
(2023).
Transcoding Unicode Characters with AVX-512 Instructions. Software: Practice and Experience, 53 (12). https://doi.org/10.1002/spe.3261
Mushtak, Noble, & Lemire, Daniel
(2023).
Fast Number Parsing Without Fallback. Software: Practice and Experience, 53 (7), 1467-1471. https://doi.org/10.1002/spe.3198
Graf, Thomas Mueller, & Lemire, Daniel
(2022).
Binary Fuse Filters: Fast and Smaller Than Xor Filters. Journal of Experimental Algorithmics, 27. https://doi.org/10.1145/3510449
Humeau, Tom; Savard, Isabelle; Lemire, Daniel; Dionne, Pierre-Olivier; Angulo Mendoza, Gustavo Adolfo; Plante, Patrick; Pinard, Anne Marie, & Roy, Jean-Sébastien
(2022).
FORCES 3 : Exploitation à des fins pédagogiques des données d’un portail d’apprentissage de l’autogestion de la douleur. Développement d’une architecture de collecte et d’analyse de données et d’un module de suivi du développement des compétences. Médiations et médiatisations (12), 74-97. https://doi.org/10.52358/mm.vi12.287
Lemire, Daniel, & Muła, Wojciech
(2022).
Transcoding Billions of Unicode Characters per Second with SIMD Instructions. Software: Practice and Experience, 52 (2).
Humeau, Tom; Savard, Isabelle; Dionne, Pierre-Olivier; Angulo-Mendoza, Gustavo; Plante, Patrick; Pinard, Anne Marie, & Lemire, Daniel
(2022).
FORCES 3 : Exploitation à des fins pédagogiques des données d’un portail d’apprentissage de l’autogestion de la douleur. Développement d’une architecture de collecte et d’analyse de données et d’un module de suivi du développement des compétences. Médiations & médiatisations (12), 74-97. https://doi.org/10.52358/mm.vi12.287
Klarqvist, Marcus D. R.; Muła, Wojciech, & Lemire, Daniel
(2021).
Efficient Computation of Positional Population Counts Using SIMD Instructions. Concurrency and Computation: Practice and Experience, 33 (17). https://doi.org/10.1002/cpe.6304
Lemire, Daniel; Bartlett, Colin, & Kaser, Owen
(2021).
Integer Division by Constants: Optimal Bounds. Heliyon, 7 (6). https://doi.org/10.1016/j.heliyon.2021.e07442
Keiser, John, & Lemire, Daniel
(2021).
Validating UTF-8 In Less Than One Instruction Per Byte. Software: Practice and Experience, 51 (5), 950-964. https://doi.org/10.1002/spe.2920
Lemire, Daniel
(2021).
Number Parsing at a Gigabyte per Second. Software: Practice and Experience, 51 (8). https://doi.org/10.1002/spe.2984
Lewis, François; Plante, Patrick, & Lemire, Daniel
(2021).
Pertinence, efficacité et principes pédagogiques de la réalité virtuelle et augmentée en contexte scolaire : une revue de littérature. Médiations & médiatisations (5), 11-27.
Graf, Thomas Mueller, & Lemire, Daniel
(2020).
Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters. Journal of Experimental Algorithmics, 25 (1). https://doi.org/10.1145/3376122
Muła, Wojciech, & Lemire, Daniel
(2020).
Base64 encoding and decoding at almost the speed of a memory copy. Software: Practice and Experience, 50 (2), 89-97. https://doi.org/10.1002/spe.2777
Lemire, Daniel; Kaser, Owen, & Kurz, Nathan
(2019).
Faster Remainder by Direct Computation: Applications to Compilers and Software Libraries. Software: Practice and Experience, 49 (6), 953-970. https://doi.org/10.1002/spe.2689
Dai, Xueping; Cheng, Li Zhen; Mareschal, Jean-Claude; Lemire, Daniel, & Liu, Chong
(2019).
New method for denoising borehole transient electromagnetic data with discrete wavelet transform. Journal of Applied Geophysics, 168, 41-48. https://doi.org/10.1016/j.jappgeo.2019.05.009
Lemire, Daniel
(2019).
Fast Random Integer Generation in an Interval. ACM Transactions on Modeling and Computer Simulation, 29 (1). https://doi.org/10.1145/3230636
Lemire, Daniel, & O'Neill, Melissa
(2019).
Xorshift1024*, Xorshift1024+, Xorshift128+ and Xoroshiro128+ Fail Statistical Tests for Linearity. Journal of Computational and Applied Mathematics, 350, 139-142. https://doi.org/10.1016/j.cam.2018.10.019
Langdale, Geoff, & Lemire, Daniel
(2019).
Parsing Gigabytes of JSON per Second. VLDB Journal, 28 (6). https://doi.org/10.1007/s00778-019-00578-5
Muła, Wojciech, & Lemire, Daniel
(2018).
Faster Base64 Encoding and Decoding Using AVX2 Instructions. ACM Transactions on the Web, 12 (3). https://doi.org/10.1145/3132709
Li, Jing; Yan, Yuhong, & Lemire, Daniel
(2018).
Full Solution Indexing for top-K Web Service Composition. IEEE Transactions on Services Computing, 11 (3), 521-533. https://doi.org/10.1109/TSC.2016.2578924
Lemire, Daniel; Kaser, Owen; Kurz, Nathan; Deri, Luca; O'Hara, Chris; Saint-Jacques, François, & Ssi-Yan-Kai, Gregory
(2018).
Roaring Bitmaps: Implementation of an Optimized Software Library. Software: Practice and Experience, 48 (4), 867–895. https://doi.org/10.1002/spe.2560
Lemire, Daniel; Kurz, Nathan, & Rupp, Christoph
(2018).
Stream VByte: Faster byte-oriented integer compression. Information Processing Letters, 130. https://doi.org/10.1016/j.ipl.2017.09.011
Muła, Wojciech; Kurz, Nathan, & Lemire, Daniel
(2018).
Faster population counts using AVX2 instructions. Computer Journal, 61 (1). https://doi.org/10.1093/comjnl/bxx046
Badia, Antonio, & Lemire, Daniel
(2018).
On Desirable Semantics of Functional Dependencies over Databases with Incomplete Information. Fundamenta Informaticae, 158 (4), 327-352. https://doi.org/10.3233/FI-2018-1651
Ivanchykhin, Dmytro; Ignatchenko, Sergey, & Lemire, Daniel
(2017).
Regular and almost universal hashing: an efficient implementation. Software: Practice and Experience, 47 (10). https://doi.org/10.1002/spe.2461
Lemire, Daniel, & Rupp, Christoph
(2017).
Upscaledb: Efficient Integer-Key Compression in a Key-Value Store using SIMD Instructions. Information Systems, 66, 13–23. https://doi.org/10.1016/j.is.2017.01.002
Lemire, Daniel; Ssi-Yan-Kai, Gregory, & Kaser, Owen
(2016).
Consistently faster and smaller compressed bitmaps with Roaring. Software: Practice and Experience, 46 (11), 1547-1569. https://doi.org/10.1002/spe.2402
Chambi, Samy; Lemire, Daniel, & Godin, Robert
(2016).
Vers de meilleures performances avec des Roaring bitmaps. Technique et Science Informatiques, 35 (3), 335-355.
Lemire, Daniel, & Boytsov, Leonid
(2015).
Decoding billions of integers per second through vectorization. Software: Practice and Experience, 45 (1), 1-29. https://doi.org/10.1002/spe.2203
Lemire, Daniel, & Kaser, Owen
(2014).
Strongly universal string hashing is fast. Computer Journal, 57 (11), 1624-1638. https://doi.org/10.1093/comjnl/bxt070
Webb, Hazel; Lemire, Daniel, & Kaser, Owen
(2013).
Diamond dicing. Data & Knowledge Engineering, 86. https://doi.org/10.1016/j.datak.2013.01.001
Lemire, Daniel; Kaser, Owen, & Gutarra, Eduardo
(2012).
Reordering rows for better compression: Beyond the lexicographic order. ACM Transactions on Database Systems, 37 (3). https://doi.org/10.1145/2338626.2338627
Zhu, Xiaodan; Turney, Peter; Lemire, Daniel, & Vellino, Andre
(2015).
Measuring academic influence: Not all citations are equal. Journal of the Association for Information Science and Technology, 66 (2), 408-427. https://doi.org/10.1002/asi.23179
Badia, Antonio, & Lemire, Daniel
(2015).
Functional dependencies with null markers. Computer Journal, 58 (5), 1160-1168. https://doi.org/10.1093/comjnl/bxu039
Kaser, Owen, & Lemire, Daniel
(2006).
Attribute value reordering for efficient hybrid OLAP. Information Sciences, 176 (16), 2304-2336. https://doi.org/10.1016/j.ins.2005.09.005
Lemire, Daniel
(2006).
Streaming maximum-minimum filter using no more than three comparisons per element. Nordic Journal of Computing, 13 (4), 328-339.
Lemire, Daniel
(2005).
Scale and translation invariant collaborative filtering systems. Information Retrieval, 8 (1), 129-150. https://doi.org/10.1023/B:INRT.0000048492.50961.a6
Lemire, Daniel; Boley, Harold; McGrath, Sean, & Ball, Marc
(2005).
Collaborative filtering and inference rules for context-aware learning object recommendation. Interactive Technology and Smart Education, 2 (3). https://doi.org/10.1108/17415650580000043
Dubuc, Serge; Lemire, Daniel, & Merrien, Jean-Louis
(2001).
Fourier analysis of 2-point Hermite interpolatory subdivision schemes. Journal of Fourier Analysis and Applications, 7 (5), 532-552. https://doi.org/10.1007/BF02511225
Lemire, Daniel, & Kaser, Owen
(2008).
Hierarchical Bin Buffering: Online Local Moments for Dynamic External Memory Arrays. ACM Transactions on Algorithms, 4 (1), 1-31. https://doi.org/10.1145/1328911.1328925
Lemire, Daniel; Brooks, Martin, & Yan, Yuhong
(2009).
An optimal linear time algorithm for quasi-monotonic segmentation. International Journal of Computer Mathematics, 86 (7). https://doi.org/10.1080/00207160701694153
Lemire, Daniel
(2009).
Faster retrieval with a two-pass dynamic-time-warping lower bound. Pattern Recognition, 42 (9). https://doi.org/10.1016/j.patcog.2008.11.030
Lemire, Daniel, & Kaser, Owen
(2010).
Recursive n-gram hashing is pairwise independent, at best. Computer Speech & Language, 24 (4), 698-710. https://doi.org/10.1016/j.csl.2009.12.001
Lemire, Daniel; Kaser, Owen, & Aouiche, Kamel
(2010).
Sorting improves word-aligned bitmap indexes. Data & Knowledge Engineering, 69 (1), 3-28. https://doi.org/10.1016/j.datak.2009.08.006
Badia, Antonio, & Lemire, Daniel
(2011).
A call to arms: Revisiting database design. SIGMOD Record, 40 (3), 61-69. https://doi.org/10.1145/2070736.2070750
Lemire, Daniel, & Kaser, Owen
(2011).
Reordering Columns for Smaller Indexes. Information Sciences, 181 (12), 2550–2570. https://doi.org/10.1016/j.ins.2011.02.002
Lemire, Daniel
(2012).
The universality of iterated hashing over variable-length strings. Discrete Applied Mathematics, 160 (4-5), 604–617. https://doi.org/10.1016/j.dam.2011.11.009
Neylon, Cameron; Aerts, Jan; Brown, C. Titus; Coles, Simon J.; Hatton, Les; Lemire, Daniel; Millman, K. Jarrod; Murray-Rust, Peter; Perez, Fernando; Saunders, Neil; Shah, Nigam; Smith, Arfon; Varoquaux, Gaël, & Willighagen, Egon
(2012).
Changing computational research. The challenges ahead. Source Code for Biology and Medicine, 7 (2). https://doi.org/10.1186/1751-0473-7-2
Prekopcsák, Zoltán, & Lemire, Daniel
(2012).
Time Series Classification by Class-Specific Mahalanobis Distance Measures. Advances in Data Analysis and Classification, 6 (3). https://doi.org/10.1007/s11634-012-0110-6
Kaser, Owen, & Lemire, Daniel
(2016).
Compressed bitmap indexes: beyond unions and intersections. Software: Practice and Experience, 46 (2). https://doi.org/10.1002/spe.2289
Crainiceanu, Adina, & Lemire, Daniel
(2015).
Bloofi : Multidimensional Bloom Filters. Information Systems, 54. https://doi.org/10.1016/j.is.2015.01.002
Zhao, Wayne Xin; Zhang, Xudong; Lemire, Daniel; Shan, Dongdong; Nie, Jian-Yun; Yan, Hongfei, & Wen, Ji-Rong
(2015).
A General SIMD-based Approach to Accelerating Compression Algorithms. ACM Transactions on Information Systems, 33 (3). https://doi.org/10.1145/2735629
Lemire, Daniel; Boytsov, Leonid, & Kurz, Nathan
(2016).
SIMD Compression and the Intersection of Sorted Integers. Software: Practice and Experience, 46 (6).
Chambi, Samy; Lemire, Daniel; Kaser, Owen, & Godin, Robert
(2016).
Better bitmap performance with Roaring bitmaps. Software: Practice and Experience, 46 (5), 709–719. https://doi.org/10.1002/spe.2325
Lemire, Daniel, & Kaser, Owen
(2016).
Faster 64-bit universal hashing using carry-less multiplications. Journal of Cryptographic Engineering, 6 (3), 171-185. https://doi.org/10.1007/s13389-015-0110-5
Journal articles (refereed)
Books
Godin, Robert, & Lemire, Daniel (2024). Programmation avec Python: des jeux au Web. ISBN 979-8874122553
Lemire, Daniel; Mezghani, Neila; Boissières, Élodie; Godin, Robert; Louafi, Habib; Osei, Richmond; Shuraida, Shadi; Schmitt, Renée-Maria, & Vieru, Dragos (2024). La science des données: Théorie et applications avec R et Python. ISBN 979-8-3257-7723-3
Godin, Robert, & Lemire, Daniel (2024). Java pas à pas: Introduction à la programmation et au langage Java. ISBN 979-8-8728-5037-3
Book chapters
Aouiche, Kamel; Lemire, Daniel, & Godin, Robert (2009). Web 2.0 OLAP: From data cubes to tag clouds. In Web Information Systems and Technologies. 4th International Conference, WEBIST 2008, Funchal, Madeira, Portugal, May 4-7, 2008, Revised Selected Papers. Springer, coll. « Lecture Notes in Business Information Processing », vol. 18.
Noël, Sylvie, & Lemire, Daniel (2010). On the Challenges of Collaborative Data Processing. In Foster, Jonathan (Ed.), Collaborative Information Behaviour. User Engagement and Communication Sharing (p. 55-71). IGI Global.
Papers in conference proceedings (refereed)
Miladi, Fatma; Psyché, Valéry, & Lemire, Daniel (In Press). Comparative Performance of GPT-4, RAG-Augmented GPT-4, and Students in MOOCs. In Workshop on Breaking Barriers with Generative Intelligence (BBGI). Springer.
Miladi, Fatma; Psyché, Valéry, & Lemire, Daniel (In Press). Leveraging GPT-4 for Accuracy in Education: A Comparative Study on Retrieval-Augmented Generation in MOOCs. In AIED 2024 - 25th International Conference on Artificial Intelligence in Education (LBR Track). New York City : Springer-Verlag, coll. « Communications in Computer and Information Science ».
Miladi, Fatma; Lemire, Daniel, & Psyché, Valéry (In Press). Learning Engagement and Peer Learning in MOOC: A Selective Systematic Review. In 19th International Conference on Intelligent Tutoring Systems.
Begoli, Edmon; Camacho-Rodríguez, Jesús; Hyde, Julian; Mior, Michael, & Lemire, Daniel (2018). Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources. In Proceedings of the 2018 ACM International Conference on Management of Data (SIGMOD) (p. 221-230). https://doi.org/10.1145/3183713.3190662
Chambi, Samy; Lemire, Daniel, & Godin, Robert (2016). Nouveaux modèles d’index bitmap compressés à 64 bits. In Actes des 12es journées francophones sur les Entrepôts de Données et l'Analyse en Ligne.
Chambi, Samy; Lemire, Daniel; Godin, Robert; Boukhalfa, Kamel; Allen, Charles, & Yang, Fangjin (2016). Optimizing Druid with Roaring bitmaps. In Proceedings of the 20th International Database Engineering & Applications Symposium. ACM. ISBN 978-1-4503-4118-9 https://doi.org/10.1145/2938503.2938515
Li, Jing; Yan, Yuhong, & Lemire, Daniel (2016). Scaling up Web Service Composition with the Skyline Operator. In Proceedings of the IEEE International Conference on Web Services 2016.
Li, Jing; Yan, Yuhong, & Lemire, Daniel (2015). A web service composition method based on compact K2-trees. In Proceedings of the IEEE International Conference on Services Computing (p. 403-410). IEEE. ISBN 978-1-4673-7280-0 https://doi.org/10.1109/SCC.2015.62
Chambi, Samy; Lemire, Daniel, & Godin, Robert (2014). Roaring bitmap : nouveau modèle de compression bitmap. In Actes des 10e journées francophones sur les Entrepôts de Données et l'Analyse en Ligne.
Li, Jing; Yan, Yuhong, & Lemire, Daniel (2014). Full Solution Indexing Using Database for QoS-aware Web Service Composition. In Proceedings of the IEEE International Conference on Services Computing (p. 99-106). IEEE. ISBN 978-1-4799-5065-2 https://doi.org/10.1109/SCC.2014.22
Aouiche, Kamel; Lemire, Daniel, & Godin, Robert (2008). Collaborative OLAP with Tag Clouds: Web 2.0 OLAP Formalism and Experimental Evaluation. In Proceedings of WEBIST 2008. Portugal : Institute for Systems and Technologies of Information, Control and Communication.
Aouiche, Kamel, & Lemire, Daniel (2007). A Comparison of Five Probabilistic View-Size Estimation Techniques in OLAP. In Proceedings of the 10th International Workshop on Data Warehousing and OLAP. ACM.
Aouiche, Kamel, & Lemire, Daniel (2007). Unassuming View-Size Estimation Techniques in OLAP. In Proceedings of the 9th International Conference on Enterprise Information Systems. Portugal : INSTICC.
Kaser, Owen, & Lemire, Daniel (2007). Removing Manually-Generated Boilerplate from Electronic Texts: Experiments with Project Gutenberg e-Books. In Spencer, Bruce; Story, Margaret-Ann, & Stewart, Darlene (Ed.), Proceedings of the 2007 Conference of the Center for Advanced Studies on Collaborative Research (CASCON '07). Riverton, NJ, USA : IBM.
Kaser, Owen, & Lemire, Daniel (2007). Tag-Cloud Drawing: Algorithms for Cloud Visualization. In Proceedings of the Tagging and Metadata for Social Information Organization Workshop, 16th International World Wide Web Conference (WWW 2007). Banff, Canada : IW3C2.
Kucerovsky, Dan, & Lemire, Daniel (2007). Monotonicity Analysis over Chains and Curves. In Curve and surface fitting: Avignon 2006 (p. 180-190). Brentwood, TN, USA : Nashboro Press.
Kaser, Owen; Lemire, Daniel, & Keith, Steven (2006). The LitOLAP Project: Data Warehousing with Literature. In Proceedings of the 2006 CaSTA Conference. University of New Brunswick.
Brooks, Martin; Yan, Yuhong, & Lemire, Daniel (2005). Scale-Based Monotonicity Analysis in Qualitative Modelling with Flat Segments. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence. Edinburgh, UK : IJCAI.
Lemire, Daniel (2007). A Better Alternative to Piecewise Linear Time Series Segmentation. In Apte, Chid; Skillicorn, David; Liu, Bing, & Parthasarathy, Srinivasan (Ed.), Proceedings of the 2007 SIAM International Conference on Data Mining (SDM'07) (p. 545-550). Minneapolis, Minnesota : SIAM. https://doi.org/10.1137/1.9781611972771.59
Lemire, Daniel; Brooks, Martin, & Yan, Yuhong (2005). An Optimal Linear Time Algorithm for Quasi-Monotonic Segmentation. In Han, Jiawei; Wah, Benjamin W.; Vijay, Raghavan; Wu, Xindong, & Rastogi, Rajeev (Ed.), Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM-05) (p. 709-712). Piscataway, NJ : IEEE. https://doi.org/10.1109/ICDM.2005.25
Lemire, Daniel, & Maclachlan, Anna (2005). Slope One Predictors for Online Rating-Based Collaborative Filtering. In Kargupta, Hillol; Srivastava, Jaideep; Kamath, Chandrika, & Goodman, Arnold (Ed.), Proceedings of the 2005 SIAM International Conference on Data Mining (SDM'05) (p. 471-475). Newport Beach, CA : SIAM.
Kaser, Owen, & Lemire, Daniel (2003). Attribute Value Reordering for Efficient Hybrid OLAP. In Rizzi, Stefano, & Song, Il-Yeol (Ed.), Proceedings of the ACM Sixth International Workshop on Data Warehousing and OLAP (p. 1-8). New Orleans, LA : ACM.
Lemire, Daniel (2003). A Family of 4-Point Dyadic Multistep Subdivision Schemes. In Cohen, Albert; Merrien, Jean-Louis, & Schumaker, Larry L. (Ed.), Curve and Surface Fitting: Saint-Malo 2002 (p. 259-268). Brentwood, TN, USA : Nashboro Press.
Lemire, Daniel (2002). Wavelet-Based Relative Prefix Sum Methods for Range Sum Queries in Data Cubes. In Stewart, Darlene A., & Johnson, J. Howard (Ed.), Proceedings of the 2002 Conference of the Center for Advanced Studies on Collaborative Research (CASCON '02) (p. 6). Riverton, NJ, USA : IBM.
Webb, Hazel; Kaser, Owen, & Lemire, Daniel (2008). Pruning Attributes From Data Cubes with Diamond Dicing. In IDEAS '08: Proceedings of the 2008 International Symposium on Database Engineering & Applications. ACM International Conference Proceeding Series.
Kaser, Owen; Lemire, Daniel, & Aouiche, Kamel (2008). Histogram-Aware Sorting for Enhanced Word-Aligned Compression in Bitmap Indexes. In Proceedings of the 11th ACM International Workshop on Data Warehousing and OLAP. ACM.
Lemire, Daniel, & Vellino, Andre (2011). Extracting, Transforming and Archiving Scientific Data. In Proceedings of the Fourth Workshop on Very Large Digital Libraries. DELOS Association for Digital Libraries.
Ruer, Perrine; Gouin-Vallerand, Charles; Zhang, Le; Lemire, Daniel, & Vallières, Évelyne F. (2015). An analysis tool for the contextual information from field experiments on driving fatigue. In Proceedings of the Ninth International and Interdisciplinary Conference on Modeling and Using Context (Context 2015). Springer, coll. « LNAI ».
Anderson, Michelle; Ball, Marcel; Boley, Harold; Greene, Stephen; Howse, Nancy; Lemire, Daniel, & McGrath, Sean (2003). RACOFI: A Rule-Applying Collaborative Filtering System. In Proceedings of the IEEE/WIC COLA 2003.
Plaisance, Jeff; Kurz, Nathan, & Lemire, Daniel (2015). Vectorized VByte Decoding. In Proceedings of the First International Symposium on Web Algorithms.
Conference presentations (refereed)
Plante, Patrick; Desjardins, Guillaume; Dionne, Pierre-Olivier; Marineau, Sophie; Paré, Jean-François; Sauvé, Louise; Savard, Isabelle; Pinard, Anne-Marie; Lemire, Daniel, & Angulo Mendoza, Gustavo Adolfo (Oct 2019). Game Design Service Platform for Seniors' Health and Well-being. Poster presented at the AGE-WELL 2019 Annual Conference, Moncton, Canada.
Aouiche, Kamel; Lemire, Daniel, & Kaser, Owen (Jun 2008). Tri de la table de faits et compression des index bitmaps avec alignement sur les mots. Paper presented at the 24ièmes journées 'Bases de Données Avancées'.
Papers in conference proceedings (non refereed)
Lemire, Daniel (2021). Unicode at Gigabytes per Second. In Lecroq, Thierry, & Touzet, Hélène (Ed.), SPIRE 2021: String Processing and Information Retrieval. https://doi.org/10.1007/978-3-030-86692-1_2
Other non refereed contributions
Desjardins, Guillaume, & Plante, Patrick (2021). Guide des bonnes pratiques pour la conception de jeux sérieux et thérapeutiques destinés aux aînés (in collaboration with Marineau, Sophie; Angulo Mendoza, Gustavo Adolfo; Savard, Isabelle; Pinard, Anne Marie; Lemire, Daniel; Paré, Jean-François, & Pouliot, Sylvie) (Research report). Québec, Canada : Observatoire du numérique en éducation.
Awards & Honors
Recognition
- Circle of Excellence, Université du Québec (2024), with Team Robot
- Award of Excellence for Achievement in Research (all fields), Université du Québec (2020)
- Circle of Excellence, Université du Québec (2019)
Teaching
- Sherpa Award (2023) for dedication to students
Industry prizes
- Google Open Source Peer Bonus Program (2012)
Paper awards
- Best student paper award (IEEE SCC 2014)
- Best paper award (CASCON 2002)
Community Service
Public appearances
- Parsing numbers at a gigabyte per second (MIT Fast Code Seminar 2021)
- Floating-point Number Parsing w/Perfect Accuracy at GB/sec (Go Systems Conf SF 2020)
- Data Engineering at the Speed of Your Disk (Performance Summit 3, Facebook, 2020)
- Parsing JSON Really Quickly: Lessons Learned (QCon 2019, San Francisco)
- Next Generation Indexes For Big Data Engineering (ODSC 2018, Boston)
- Engineering Fast Indexes for Big Data Applications (Spark Summit East 2017, Boston)
- Engineering Fast Indexes for Big Data Applications (deep dive) (Spark Summit East 2017, Boston)
- Algorithms: How content finds ‘you’ panel at the Discoverability Summit (CRTC, Toronto, 2016)
- Pour la pérennité de nos contenus nationaux : l'enjeu de la visibilité panel at the « rencontres de l'ADISQ » (Montreal, 2016)
Program committee (international conferences)
- ACM Conference on Information and Knowledge Management (ACM CIKM)
- ACM Conference on Web Search and Data Mining (ACM WSDM)
- ACM Conference on Information Retrieval (ACM SIGIR)
- ACM Conference on Recommender Systems (ACM RecSys)
- ACM/IEEE Joint Conference on Digital Libraries (JCDL)
Funding bodies
- FRQNT: review committee 03F (theoretical computer science) since 2007.
- FRQNT: review committee 309 (team projects in computer science) since 2006.
- NSERC: Research Tools and Instruments Grants Program (2012-2015)
- NSERC: Computer Science Evaluation Group (EG 1507) for the Discovery Grants Program (2018-2021), committee co-chair in 2019-2020 and 2020-2021
External referee (Ph.D.)
- Luca Versari at the University of Pisa, Italy (2021) - supervised by Roberto Grossi.
- Kareem El Gebaly at the University of Waterloo (2018) - supervised by Jimmy Lin, Lukasz Golab and Ashraf Aboulnaga.
- Mohammed Shaaban at Université Pierre et Marie Curie (2017) - supervised by Patrick Garda.
- Mehdi Boukhechba at UQAC (2016) - supervised by Abdenour Bouzouane and Charles Gouin-Vallerand.
- Hicham Assoudi at UQAM (2016) - supervised by Hakim Lounis.
- Khaled Dehdouh at Lyon 2 (2015) - supervised by Omar Boussaid.
- Martin Leginus at Aalborg University (2015) - supervised by Peter Dolog.
- Ahmad Taleb at Concordia University (2011) - supervised by Todd Eavis.
External referee (promotion)
- Sabine Loudcher Rabaseda at Université Lyon 2 - habilitation.
- Jason Sawin at University of St. Thomas.
- Amer Nizar AbuAli at Philadelphia University.
- Jinan Fiaidhi at Lakehead University.
Journals
- Editor, Software: Practice and Experience (2021-...).
- Distinguished Referee, Software: Practice and Experience, 2018.
- Associate editor, Heliyon Computer Science (2015-2023).