Note: I'm not available to work on new missions until the 1st of September 2023. Thank you for your understanding.
When I worked as a teacher, I told my students that Artificial Intelligence and Machine Learning are the most effective levers to make a difference. Every day, new AI and ML solutions are released to empower companies and individuals alike. The question is: Is your business ready to provide the best AI/ML products for your customers?
I'm a professional Machine Learning Engineer, Data Scientist, and MLOps practitioner ready to assist you in this quest. I've completed a Ph.D. in Machine Learning and several high-end AI/ML certifications to help you build leading data-driven services. My past experiences include working with companies like Google, BNP Paribas, ArcelorMittal, the European Commission, and Decathlon to frame their needs, create state-of-the-art models, and deliver AI/ML artifacts at scale.
I now work as a freelancer in Luxembourg, and I can carry out missions remotely in other European countries. You can get in touch with me on LinkedIn or at contact@fmind.dev. I'll be happy to collaborate with you or discuss your favorite AI/ML topics in the MLOps Community.
+ Tutoring adult students to become data scientists specializing in machine learning.
- https://openclassrooms.com/fr/paths/793-data-scientist
- https://openclassrooms.com/fr/paths/794-machine-learning-engineer
- https://openclassrooms.com/fr/paths/795-ai-engineer
+ Mission: Enhance the ARACHNE risk scoring tool (fraud detection).
Main tasks and responsibilities:
- Develop a new version of Arachne using data mining techniques
- Manage the development of the Arachne PoC/Project (SCRUM)
- Assist data scientists in their projects (Virtual Assistant, NLP, …)
Technical stack:
- Data Science: Python, PostgreSQL, SQLAlchemy, Hugging Face, Haystack
- Management/Environment: Jira, Confluence, MS Office, AWS, Azure
+ Mission: Design and implement the next ML/MLOps platform on AWS and GCP.
Main tasks and responsibilities:
- Design the functional & technical architecture of the platform
- Manage the MLOps@Decathlon initiative (tasks, planning)
- Select the vendor solutions based on a user need analysis
- Communicate the progress and success to stakeholders
- Assist data scientists in their projects (audience, forecast)
+ Mission: Improve the visibility and assets of SFEIR's Data Team.
Main tasks and responsibilities:
- Design and create technical interviews for recruiting data scientists
- Become a Professional Machine Learning Engineer on Google Cloud
- Propose a strategy to improve the online visibility of SFEIR's data team
- Share knowledge about data trends with non-technical staff members
- Create a group to write tutorials and kata on AI/ML for SFEIR developers
+ Mission: Train and optimize machine learning models to recommend steel prices.
Main tasks and responsibilities:
- Create and fine-tune machine learning models (tree-based)
- Evaluate the performance of the models on real datasets
- Communicate the results to business stakeholders
+ While the future of machine learning and MLOps is being debated, practitioners still need to attend to their machine learning models in production. This is no easy task, as ML engineers must constantly assess the quality of the data that enters and exits their pipelines, and ensure that their models generate the correct predictions. To assist ML engineers with this challenge, several AI/ML monitoring solutions have been developed.
In this article, I will discuss the nature of AI/ML monitoring and how it relates to data engineering. First, I will present the similarities between AI/ML monitoring and data engineering. Second, I will enumerate additional features that AI/ML monitoring solutions can provide. Third, I will briefly touch on the topic of AI/ML observability and its relation to AI/ML monitoring. Finally, I will provide my conclusion about the field of AI/ML monitoring and how it should be considered to ensure the success of your AI/ML project.
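To make the idea of checking "the data that enters and exits a pipeline" concrete, here is a minimal sketch of a monitoring step around a batch prediction job. It uses Pandera as one possible validation library; the column names, value ranges, and the assumption that the model outputs probabilities are all hypothetical.

```python
# A minimal, hypothetical data quality check for a batch inference pipeline.
# Column names, ranges, and the probability assumption are illustrative only.
import pandas as pd
import pandera as pa

input_schema = pa.DataFrameSchema({
    "user_id": pa.Column(str, nullable=False),
    "age": pa.Column(int, pa.Check.in_range(0, 120)),
    "amount": pa.Column(float, pa.Check.ge(0)),
})

def monitor_batch(features: pd.DataFrame, predictions: pd.Series) -> None:
    """Validate inputs and sanity-check outputs before they leave the pipeline."""
    input_schema.validate(features, lazy=True)  # raises SchemaErrors on violations
    if predictions.isna().any():
        raise ValueError("Model produced missing predictions")
    if not predictions.between(0.0, 1.0).all():  # assuming a probability output
        raise ValueError("Predictions outside the expected [0, 1] range")
```

Dedicated AI/ML monitoring solutions go well beyond this sketch, but the core activity is the same: codify expectations about inputs and outputs, then raise an alert when they are violated.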
+ In this article, I present the implementation of a Python package on GitHub designed to support MLOps initiatives. The goal of this package is to make the coding workflow of data scientists and ML engineers as flexible, robust, and productive as possible. First, I start by motivating the use of Python packages. Then, I provide some tools and tips you can include in your MLOps project. Finally, I explain the follow-up steps required to take this package to the next level and make it work in your environment.
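To illustrate the kind of coding workflow such a package encourages, here is a hypothetical sketch of a job entry point with a typed configuration; the names (TrainingConfig, run_training) are illustrative and not taken from the actual repository.

```python
# Hypothetical entry point for an MLOps Python package (illustrative names only).
import argparse
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    """Typed configuration keeps jobs explicit and easy to validate."""
    data_path: str
    model_path: str
    max_depth: int = 5

def run_training(config: TrainingConfig) -> None:
    """Placeholder for the actual training logic (load data, fit, save model)."""
    print(f"Training with {config}")

def main() -> None:
    parser = argparse.ArgumentParser(description="Run a training job")
    parser.add_argument("--data-path", required=True)
    parser.add_argument("--model-path", required=True)
    parser.add_argument("--max-depth", type=int, default=5)
    args = parser.parse_args()
    run_training(TrainingConfig(args.data_path, args.model_path, args.max_depth))

if __name__ == "__main__":
    main()
```

Exposing `main` as a console script in the package metadata makes the same job callable from a terminal, a Docker container, or a CI/CD pipeline.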
+ Large Language Models (LLMs) are such an exciting topic. Since the release of ChatGPT, we have seen a surge of innovation ranging from education mentorship to financial advisory. Each week brings a new opportunity to address new kinds of problems, increase human productivity, or improve existing solutions. Yet, we may wonder if this is just a new hype cycle or if organizations are truly adopting LLMs at scale …
In March 2023, the MLOps Community issued a survey about LLMs in production to picture the state of adoption. The survey is full of interesting insights, but there is a catch: 80% of the questions are open-ended, which means respondents answered freely, from a few keywords to full sentences. I volunteered to clean up the answers with the help of ChatGPT and let the community get a grasp of the experiences shared in the survey.
In this article, I present the steps and lessons learned from my journey to shed some light on the MLOps survey on LLMs. I’m first going to present the goal and questions of the survey. Then, I will explain how I used ChatGPT to review the data and standardize the content. Finally, I’m going to evaluate the performance of ChatGPT compared to a manual review.
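To give an idea of what this standardization step can look like, here is a simplified sketch using the OpenAI Python client; the model name, prompt, and category list are assumptions for illustration, not the exact setup used for the survey.

```python
# Hypothetical sketch: standardize a free-text survey answer with the OpenAI API.
# The model name, prompt, and category list are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CATEGORIES = ["chatbot", "code assistant", "search", "summarization", "other"]

def standardize(answer: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Map the user's answer to exactly one of these categories: "
                        + ", ".join(CATEGORIES)
                        + ". Reply with the category name only."},
            {"role": "user", "content": answer},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

print(standardize("We use it to draft customer support replies"))  # e.g. "chatbot"
```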
+ If you work on MLOps, you must navigate an ever-growing landscape of tools and solutions. This is both an intense source of stimulation and fatigue for MLOps practitioners.
Vendors and users face the same problem: How can we combine all these tools without the combinatorial complexity of creating custom integrations?
In this article, I propose a solution analogous to POSIX to address this challenge. First, I motivate the creation of common protocols and schemas for combining MLOps tools. Second, I present a high-level architecture to support implementation. Third, I conclude with the benefits and limitations of standardizing MLOps.
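As a toy illustration of such a protocol, the sketch below defines a minimal model registry interface with `typing.Protocol`: any vendor client that implements these two methods becomes interchangeable without custom glue code. The interface itself is hypothetical, not an existing standard.

```python
# Hypothetical sketch of a shared MLOps protocol: any registry client exposing
# these methods could be swapped in without changing the calling code.
from typing import Protocol

class ModelRegistry(Protocol):
    def push(self, name: str, version: str, uri: str) -> None:
        """Register a model artifact under a name and version."""
        ...

    def pull(self, name: str, version: str) -> str:
        """Return the artifact URI for a registered model version."""
        ...

class InMemoryRegistry:
    """Toy implementation for local tests; real backends could wrap MLflow, Vertex AI, ..."""
    def __init__(self) -> None:
        self._store: dict[tuple[str, str], str] = {}

    def push(self, name: str, version: str, uri: str) -> None:
        self._store[(name, version)] = uri

    def pull(self, name: str, version: str) -> str:
        return self._store[(name, version)]

def deploy(registry: ModelRegistry, name: str, version: str) -> str:
    return registry.pull(name, version)  # works with any compliant client
```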
+ Kubeflow Pipelines (KFP) is a powerful platform for building machine learning pipelines at scale with Kubernetes. The platform is well supported on major cloud platforms such as GCP (Vertex AI Pipelines) or AWS (Kubeflow on AWS). However, installing KFP on Apple Silicon (macOS 12.5.1 with Apple M1 Pro) proved to be more challenging than I imagined. Thus, I wanted to share my experience and tips to install KFP as easily as possible on your shiny Mac.
In this article, I present 4 steps to install Kubeflow on Apple Silicon, using Rancher Desktop to set up Docker/Kubernetes. At the end, I list the problems I encountered during the installation of Kubeflow Pipelines.
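Once the installation is complete, a quick way to verify that everything works is to submit a tiny pipeline with the KFP SDK. The sketch below assumes a KFP 1.x SDK and an API endpoint port-forwarded to localhost; adjust the host URL to your setup.

```python
# Minimal smoke test for a local Kubeflow Pipelines install (KFP SDK 1.x assumed).
# The host URL depends on how you expose the KFP API (e.g., kubectl port-forward).
import kfp
from kfp.components import create_component_from_func

def say_hello(name: str) -> str:
    return f"Hello, {name}!"

hello_op = create_component_from_func(say_hello)

@kfp.dsl.pipeline(name="hello-pipeline", description="Smoke test pipeline")
def hello_pipeline(name: str = "Apple Silicon"):
    hello_op(name=name)

if __name__ == "__main__":
    client = kfp.Client(host="http://localhost:8080")  # assumed endpoint
    client.create_run_from_pipeline_func(hello_pipeline, arguments={"name": "M1"})
```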
+ As programmers, we are continuously looking for languages that are performant, productive, and general purpose. Is there any programming language that currently satisfies these properties? Can we ever create one?
In this article, I present a fundamental trade-off that affects the design of programming languages and the success of software projects.
+ University of Luxembourg
+ Mobile applications are essential for interacting with technology and other people. With more than 2 billion devices deployed all over the world, Android offers a thriving ecosystem by making the work of thousands of developers accessible on digital marketplaces such as Google Play. Nevertheless, the success of Android also exposes millions of users to malware authors who seek to siphon private information and hijack mobile devices for their own benefit.
To fight against the proliferation of Android malware, the security community embraced machine learning, a branch of artificial intelligence that powers a new generation of detection systems. Machine learning algorithms, however, require a substantial number of qualified samples to learn the classification rules enforced by security experts. Unfortunately, malware ground truths are notoriously hard to construct due to the inherent complexity of Android applications and the global lack of public information about malware. In a context where both information and human resources are limited, the security community needs new approaches to help practitioners accurately define Android malware, automate classification decisions, and improve the comprehension of Android malware.
This dissertation proposes three solutions to assist with the creation of malware ground truths.
+ Android malware is now pervasive and evolving rapidly. Thousands of malware samples are discovered every day with new models of attacks. The growth of these threats has come hand in hand with the proliferation of collective repositories sharing the latest specimens. Having access to a large number of samples opens new research directions aiming at efficiently vetting apps. However, automatically inferring a reference ground-truth from those repositories is not straightforward and can inadvertently lead to unforeseen misconceptions. On the one hand, samples are often mislabeled as different parties use distinct naming schemes for the same sample. On the other hand, samples are frequently misclassified due to conceptual errors made during labeling processes.
In this paper, we analyze the associations between all labels given by different vendors and we propose a system called EUPHONY to systematically unify common samples into family groups. The key novelty of our approach is that no prior knowledge of malware families is needed. We evaluate our approach using reference datasets and more than 0.4 million additional samples outside of these datasets. Results show that EUPHONY provides competitive performance against the state-of-the-art.
+ There is generally a lack of consensus in Antivirus (AV) engines' decisions on a given sample. This challenges the building of authoritative ground-truth datasets. Instead, researchers and practitioners may rely on unvalidated approaches to build their ground truth, e.g., by considering decisions from a selected set of Antivirus vendors or by setting up a threshold number of positive detections before classifying a sample. Both approaches are biased as they implicitly either decide on ranking AV products, or they consider that all AV decisions have equal weights. In this paper, we extensively investigate the lack of agreement among AV engines.
To that end, we propose a set of metrics that quantitatively describe the different dimensions of this lack of consensus. We show how our metrics can bring important insights by using the detection results of 66 AV products on 2 million Android apps as a case study. Our analysis focuses not only on AV binary decisions but also on the notoriously hard problem of labels that AVs associate with suspicious files, and it allows us to highlight biases hidden in the collection of a malware ground truth, the foundation stone of any machine learning-based malware detection approach.
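As a simple illustration of measuring (dis)agreement between AV engines, though not one of the metrics proposed in the paper, one can compute Fleiss' kappa over their binary decisions:

```python
# Hypothetical illustration (not the paper's metrics): quantify agreement
# between AV engines on binary detections using Fleiss' kappa.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# rows = samples, columns = AV engines, values = 1 (malicious) / 0 (benign)
decisions = np.array([
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
])
table, _ = aggregate_raters(decisions)   # per-sample counts for each category
kappa = fleiss_kappa(table)              # 1.0 = perfect agreement, <= 0 = chance level
print(f"Fleiss' kappa across AV engines: {kappa:.2f}")
```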