
Libera Università di Bolzano

Big data methods for economics and business

Semesters 1-2 · 27512 · Master's degree in Data Analytics for Economics and Management · 12 CFU · EN


Module 1 focuses on advanced statistical techniques for analyzing high-dimensional datasets frequently encountered in business intelligence and economic research. Key topics include penalized and convex optimization methods for model selection (such as LASSO), model aggregation techniques, dimension reduction, high-dimensional regression models, and network-based inference using graphical models. The module also introduces multiple testing procedures for identifying significant patterns across many variables. Emphasis is placed on practical implementation using R and Python, and on the ability to apply these tools to extract interpretable, actionable insights from large-scale data in business and economic applications.

Module 2 provides an in-depth introduction to Natural Language Processing (NLP) with a strong focus on modern applications in business and economics. Core topics include algorithmic text classification, sentiment analysis, neural language modeling, and advanced information retrieval using vector-based and neural approaches. Students will learn techniques for web scraping, prompt engineering, and the use of Retrieval-Augmented Generation (RAG) systems, which combine document retrieval with generative models to improve accuracy and relevance. The module also explores recent developments in large language model (LLM) applications, including multi-agent systems and conversational AI, equipping students to critically evaluate and implement state-of-the-art NLP solutions.

Lecturers: Davide Ferrari, Paul Michael Pronobis

Lecture hours:
M1:
- 24 hours of in-person lectures
- 12 hours of video lectures (counted as 24 hours to account for re-watching)
M2:
- 24 hours of in-person lectures
- 12 hours of video lectures (counted as 24 hours to account for re-watching)
Lab hours: -
Attendance: Recommended, but not required.

Course topics
M1:
• High-dimensional data, big data and the curse of dimensionality
• Convex criteria for model selection
• Model aggregation and model combining
• Introduction to data dimension reduction
• High-dimensional regression
• Graphical models
• Multiple testing
M2:
1. Introduction to Natural Language Processing (NLP): The fundamentals of NLP, including its history, its applications, and how it differs from other applications of neural networks.
2. Algorithmic Text Classification and Sentiment Analysis: Detailed instruction on various algorithms for categorizing text and extracting sentiment, comparing their effectiveness and use cases.
3. Neural Networks in NLP and Language Modeling: An in-depth look at how neural networks are applied in NLP, focusing on using and evaluating different NLP models.
4. Advanced Techniques in Information Retrieval: Combining neural network strategies with vector space models to retrieve information efficiently.
5. Web Scraping for Knowledge Construction: Techniques for extracting information from the web to build databases for applications that demand current or extensive factual data.
6. Prompt Engineering for Enhanced Language Understanding: Crafting effective prompts to improve relation extraction, answer questions accurately, support dialog systems, and create responsive chatbots.
7. Fine-Tuning: The key steps for adapting pre-trained language models (causal and masked language models, CLM and MLM) through preprocessing and model training. Also covers performance evaluation with tools such as Weights & Biases (wandb) to monitor and optimize models for various NLP tasks.
8. Innovations in Large Language Model (LLM) Applications: Multi-agent conversations and the latest advancements in LLM applications for interactive AI systems.

Teaching format
The course adopts a blended, student-centred approach that emphasises problem-based learning and active engagement. A portion of the lecture content is made available online in advance, allowing students to explore key concepts independently and at their own pace before attending class. This preparatory work enables in-person sessions to focus on the application of knowledge through real-world problems, collaborative activities, and guided discussions, fostering critical thinking and deeper learning. The course is fully aligned with the principles of the Italian Universities Digital Hub (EDUNEXT) initiative (https://edunext.eu), which promotes the integration of digital resources and active learning strategies within university teaching.

Assessment
The overall exam mark is determined by the assessment of the two modules (M1 + M2).
M1:
- Final exam (60%): The final exam consists of problems on the use of statistical methods and the interpretation of results obtained from analyzing various data sets.
- Assignments (40%): Data analysis assignments are set three times during the semester and must be handed in.
M2:
- Final exam (60%): The final exam consists of problems on the use of statistical methods and the interpretation of results obtained from analyzing various data sets.
- Assignments (40%): Data analysis assignments to be handed in.

Evaluation criteria
In both modules, the exam format is the same for attending and non-attending students: project work (40% of the final grade) and a written exam (60% of the final grade).
• Relevant for project work: clarity of presentation, ability to gain useful and novel insights from data, creativity, critical thinking, and adherence to reproducible-research best practices
• Ability to use R and other software to perform basic data preparation tasks, to use R libraries properly, to choose the best type of graphical representation for different types of data, and to use basic statistical tools correctly
• Ability to use Python to apply (understand, recall and use) data analytics methods in practical settings for data analysis and visualization

Required reading

M1:

Lederer, J. (2022). Fundamentals of high-dimensional statistics. Springer International Publishing.

M2:

Tunstall, L., Von Werra, L., & Wolf, T. (2022). Natural language processing with transformers. O'Reilly Media, Inc.





Modules

Semester 1 · 27512A · Master's degree in Data Analytics for Economics and Management · 6 CFU · EN

Module A — M1 - Statistical methods for high-dimensional data

This module focuses on advanced statistical techniques for analyzing high-dimensional datasets frequently encountered in business intelligence and economic research. Key topics include penalized and convex optimization methods for model selection (such as LASSO), model aggregation techniques, dimension reduction, high-dimensional regression models, and network-based inference using graphical models. The module also introduces multiple testing procedures for identifying significant patterns across many variables. Emphasis is placed on practical implementation using R and Python, and on the ability to apply these tools to extract interpretable, actionable insights from large-scale data in business and economic applications.
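As an illustration of the penalized model selection mentioned above, the following is a minimal sketch of the LASSO fitted by cyclic coordinate descent. It is not taken from the course materials: the function names, the toy data, and the objective scaling (squared error divided by 2n plus lam times the L1 norm) are assumptions made for this example, and it uses plain Python lists rather than any particular library.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic updates.

    X is a list of rows, y a list of responses (hypothetical toy interface).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(
                    X[i][k] * beta[k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            # Closed-form one-dimensional LASSO update for coordinate j.
            beta[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return beta


if __name__ == "__main__":
    # Toy data: the first feature carries signal, the second almost none.
    X = [[1, 0], [0, 1], [1, 0], [0, 1]]
    y = [2.0, 0.1, 2.0, 0.1]
    print(lasso_coordinate_descent(X, y, lam=0.1))
    print(lasso_coordinate_descent(X, y, lam=0.0))
```

With a positive penalty the weak second coefficient is set exactly to zero while the first is only shrunk, which is the sense in which the LASSO performs model selection; with lam=0.0 the same routine returns the least-squares fit on this toy data.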

Lecturer: Davide Ferrari

Lecture hours:
- 24 hours of in-person lectures
- 12 hours of video lectures (counted as 24 hours to account for re-watching)
Lab hours: -

Course topics
• High-dimensional data, big data and the curse of dimensionality
• Convex criteria for model selection
• Model aggregation and model combining
• Introduction to data dimension reduction
• High-dimensional regression
• Graphical models
• Multiple testing
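To make the multiple testing topic concrete, here is a minimal sketch of one widely used procedure, the Benjamini-Hochberg step-up rule for controlling the false discovery rate. The syllabus does not say which procedures are covered, so the choice of this one, the function name, and the example p-values are assumptions for illustration only.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: find the largest rank k such that the k-th smallest
    p-value is <= k * alpha / m, then reject the k smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            max_rank = rank

    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values from four tests.
    print(benjamini_hochberg([0.02, 0.024, 0.5, 0.9], alpha=0.05))
```

In the example the smallest p-value (0.02) exceeds its own threshold of 0.0125, but it is still rejected because the second smallest (0.024) falls below its threshold of 0.025. This step-up behaviour is what distinguishes the rule from testing each ordered p-value in isolation.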

Teaching format
This module adopts a blended, student-centred approach that emphasises problem-based learning and active engagement. A portion of the lecture content is made available online in advance, allowing students to explore key concepts independently and at their own pace before attending class. This preparatory work enables in-person sessions to focus on the application of knowledge through real-world problems, collaborative activities, and guided discussions, fostering critical thinking and deeper learning. The course is fully aligned with the principles of the Italian Universities Digital Hub (EDUNEXT) initiative (https://edunext.eu), which promotes the integration of digital resources and active learning strategies within university teaching.

Required reading

Lederer, J. (2022). Fundamentals of high-dimensional statistics. Springer International Publishing.



Semester 2 · 27512B · Master's degree in Data Analytics for Economics and Management · 6 CFU · EN

Module B — M2 - Natural language processing and web analytics

This module provides an in-depth introduction to Natural Language Processing (NLP) with a strong focus on modern applications in business and economics. Core topics include algorithmic text classification, sentiment analysis, neural language modeling, and advanced information retrieval using vector-based and neural approaches. Students will learn techniques for web scraping, prompt engineering, and the use of Retrieval-Augmented Generation (RAG) systems, which combine document retrieval with generative models to improve accuracy and relevance. The module also explores recent developments in large language model (LLM) applications, including multi-agent systems and conversational AI, equipping students to critically evaluate and implement state-of-the-art NLP solutions.
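The retrieval half of a RAG system, as described above, can be sketched with a plain vector space model: documents and the query are turned into TF-IDF vectors and ranked by cosine similarity. This is an illustrative assumption rather than the module's own implementation; the function names, the smoothed IDF formula, and the three example documents are hypothetical, and the generation step (passing the retrieved text to a language model) is deliberately left out.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately crude assumption)."""
    return text.lower().split()


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unknown to the index are dropped."""
    counts = Counter(tokens)
    return {term: count * idf[term] for term, count in counts.items() if term in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query, documents, top_k=1):
    """Return the top_k documents most similar to the query."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency (one common variant).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    doc_vectors = [tfidf_vector(tokens, idf) for tokens in tokenized]
    query_vector = tfidf_vector(tokenize(query), idf)
    ranked = sorted(
        range(n_docs),
        key=lambda i: cosine(query_vector, doc_vectors[i]),
        reverse=True,
    )
    return [documents[i] for i in ranked[:top_k]]


if __name__ == "__main__":
    docs = [
        "inflation rose in the euro area",
        "the central bank cut interest rates",
        "quarterly earnings beat analyst forecasts",
    ]
    print(retrieve("interest rates", docs))
```

In a full RAG pipeline the retrieved passages would be inserted into the prompt of a generative model; neural approaches typically replace the TF-IDF vectors with learned dense embeddings while keeping the same rank-by-similarity structure.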

Lecturer: Paul Michael Pronobis

Lecture hours:
- 24 hours of in-person lectures
- 12 hours of video lectures (counted as 24 hours to account for re-watching)
Lab hours: -
