Description
Provides an important framework for data analysts to assess the quality of data and its potential to provide meaningful insights through analysis.
Analytics and statistical analysis have become pervasive topics, mainly due to the growing availability of data and analytic tools. Technology, however, fails to deliver insights with added value if the quality of the information it generates is not assured. Information Quality (InfoQ) is a tool developed by the authors to assess the potential of a dataset to achieve a goal of interest, using data analysis. Whether the information quality of a dataset is sufficient is of practical importance at many stages of the data analytics journey, from the pre-data collection stage to the post-data collection and post-analysis stages. It is also critical to various stakeholders: data collection agencies, analysts, data scientists, and management.
This book:
- Explains how to integrate the notions of goal, data, analysis and utility that are the main building blocks of data analysis within any domain.
- Presents a framework for integrating domain knowledge with data analysis.
- Provides a combination of both methodological and practical aspects of data analysis.
- Discusses issues surrounding the implementation and integration of InfoQ in both academic programmes and business / industrial projects.
- Showcases numerous case studies in a variety of application areas such as education, healthcare, official statistics, risk management and marketing surveys.
- Presents a review of software tools from the InfoQ perspective along with example datasets on an accompanying website.
This book will be beneficial for researchers in academia and in industry, analysts, consultants, and agencies that collect and analyse data, as well as for undergraduate and postgraduate courses involving data analysis.
Chapter
1.5 InfoQ and study quality
Chapter 2 Quality of goal, data quality, and analysis quality
Chapter 3 Dimensions of information quality and InfoQ assessment
3.2 The eight dimensions of InfoQ
3.4 Example: InfoQ assessment of online auction experimental data
Chapter 4 InfoQ at the study design stage
4.2 Primary versus secondary data and experiments versus observational data
4.3 Statistical design of experiments
4.4 Clinical trials and experiments with human subjects
4.5 Design of observational studies: Survey sampling
4.6 Computer experiments (simulations)
4.7 Multiobjective studies
Chapter 5 InfoQ at the postdata collection stage
5.2 Postdata collection data
5.3 Data cleaning and preprocessing
5.4 Reweighting and bias adjustment
5.6 Retrospective experimental design analysis
5.7 Models that account for data “loss”: Censoring and truncation
Part II Applications of InfoQ
6.2 Test scores in schools
6.3 Value-added models for educational assessment
6.4 Assessing understanding of concepts
Appendix: MERLO implementation for an introduction to statistics course
Chapter 7 Customer surveys
7.2 Design of customer surveys
7.4 Models for customer survey data analysis
Appendix: A posteriori InfoQ improvement for survey nonresponse selection bias
8.2 Institute of Medicine reports
8.3 Sant’Anna di Pisa report on the Tuscany healthcare system
8.4 The haemodialysis case study
8.5 The Geriatric Medical Center case study
8.6 Report of cancer incidence cluster
Chapter 9 Risk management
9.2 Financial engineering, risk management, and Taleb’s quadrant
9.3 Risk management of OSS
9.4 Risk management of a telecommunication system supplier
9.5 Risk management in enterprise system implementation
Chapter 10 Official statistics
10.2 Information quality and official statistics
10.3 Quality standards for official statistics
10.4 Standards for customer surveys
10.5 Integrating official statistics with administrative data for enhanced InfoQ
Part III Implementing InfoQ
Chapter 11 InfoQ and reproducible research
11.2 Definitions of reproducibility, repeatability, and replicability
11.3 Reproducibility and repeatability in GR&R
11.4 Reproducibility and repeatability in animal behavior studies
11.5 Replicability in genome‐wide association studies
11.6 Reproducibility, repeatability, and replicability: the InfoQ lens
Appendix: Gauge repeatability and reproducibility study design and analysis
Chapter 12 InfoQ in review processes of scientific publications
12.2 Current guidelines in applied journals
12.3 InfoQ guidelines for reviewers
Chapter 13 Integrating InfoQ into data science analytics programs, research methods courses, and more
13.2 Experience from InfoQ integrations in existing courses
13.3 InfoQ as an integrating theme in analytics programs
13.4 Designing a new analytics course (or redesigning an existing course)
13.5 A one-day InfoQ workshop
Chapter 14 InfoQ support with R
14.2 Examples of information quality with R
14.3 Components and dimensions of InfoQ and R
Chapter 15 InfoQ support with Minitab
15.2 Components and dimensions of InfoQ and Minitab
15.3 Examples of InfoQ with Minitab
Chapter 16 InfoQ support with JMP
16.2 Example 1: Controlling a film deposition process
16.3 Example 2: Predicting water quality in the Savannah River Basin
16.4 A JMP application to score the InfoQ dimensions
16.5 JMP capabilities and InfoQ