I’m a researcher specializing in data-sharing techniques for machine learning, with a focus on the human aspects of software engineering. I consider software and data to be sociotechnical assets, and I am committed to bridging the gap between them and society. I’m involved in Croissant, a metadata initiative for AI-ready datasets, where I co-lead the Responsible AI group. I’m also developing machine learning techniques that benefit the greater good, alongside humanities scholars and social scientists at the Barcelona Supercomputing Center. I have taught, created course content, and coordinated practicums for web and software development courses at the UAB and UOC universities. Previously, I worked as a software architect for international companies, and I was elected to the Catalan Parliament (2015-2018), serving as a spokesman in its Science and Technology Committee.
Publications:
- The Software Diversity Card: A Framework for Reporting Diversity in Software Projects. Information and Software Technology, Elsevier, 2025.
- On the Readiness of Scientific Data for a Fair and Transparent Use in Machine Learning. Scientific Data, Nature, January 2025.
- A Standardized Machine-readable Dataset Documentation Format for Responsible AI. arXiv preprint, 2025.
- Croissant: A Metadata Format for ML-Ready Datasets. NeurIPS 2024.
- Model Driven Engineering, Artificial Intelligence, and DevOps for Software and Systems Engineering: A Systematic Mapping Study of Synergies and Challenges. ACM Transactions on Software Engineering and Methodology (TOSEM), 2025.
- Using Large Language Models to Enrich the Documentation of Datasets for Machine Learning. arXiv preprint, April 2024.
- DataDoc Analyzer, a tool for analyzing the documentation of scientific datasets. 32nd ACM International Conference on Information and Knowledge Management (CIKM), October 2023.
- A domain-specific language for describing machine learning datasets. Journal of Computer Languages, April 2023.
- DescribeML: A dataset description tool for machine learning. Science of Computer Programming, November 2022.
- Enabling Content Management Systems as an Information Source in Model-Driven Projects. International Conference on Research Challenges in Information Science (RCIS), January 2022.