Joan Giner-Miguelez

I’m a researcher specializing in data-sharing techniques for machine learning, with a focus on the human aspects of software engineering. I consider software and data to be sociotechnical assets, and I am committed to bridging the gap between them and society. I’m currently involved in Croissant, a metadata initiative for AI-ready datasets, where I co-lead the Responsible AI group. I also develop machine learning techniques that benefit the greater good, working alongside humanities scholars and social scientists at the Barcelona Supercomputing Center. In addition, I have taught, created course content, and coordinated practicums for web and software development courses at the UAB and UOC universities. Previously, I worked as a software architect for international companies, and I was elected to the Catalan Parliament (2015-2018), where I served as a spokesman on its Science and Technology Committee.

Publications:

The Software Diversity Card: A Framework for Reporting Diversity in Software Projects. Information and Software Technology, Elsevier, 2025.
On the Readiness of Scientific Data for a Fair and Transparent Use in Machine Learning. Scientific Data, Nature, January 2025.
A Standardized Machine-readable Dataset Documentation Format for Responsible AI. arXiv preprint, 2025.
Croissant: A Metadata Format for ML-Ready Datasets. NeurIPS 2024.
Model Driven Engineering, Artificial Intelligence, and DevOps for Software and Systems Engineering: A Systematic Mapping Study of Synergies and Challenges. ACM Transactions on Software Engineering and Methodology (TOSEM), 2025.
Using Large Language Models to Enrich the Documentation of Datasets for Machine Learning. arXiv preprint, April 2024.
DataDoc Analyzer, a tool for analyzing the documentation of scientific datasets. 32nd ACM International Conference on Information and Knowledge Management (CIKM), October 2023.
A domain-specific language for describing machine learning datasets. Journal of Computer Languages, April 2023.
DescribeML: A dataset description tool for machine learning. Science of Computer Programming, November 2022.
Enabling Content Management Systems as an Information Source in Model-Driven Projects. International Conference on Research Challenges in Information Science (RCIS), January 2022.

Researcher at Barcelona Supercomputing Center (BSC)