The aim of BERD@BW is to establish a competence center for data availability, data exchange, and data analysis. The project also includes developing training and continuing education courses, which will use concrete case studies to teach the skills researchers need for data-based work. Drawing on its extensive experience with the International Program in Survey and Data Science and the Coleridge Initiative in the USA, the chair can provide excellent support in course development.
The project "Fairness in Automated Decision-Making (Fair ADM)" by Prof. Dr. Frauke Kreuter, Dr. Ruben Bach, and Dr. Christoph Kern of the Chair of Statistics and Social Science Methodology deals with discrimination and fairness in algorithm-based decision-making processes (automated decision-making, ADM) in the German public sector. "While ADM systems optimize bureaucratic procedures through automation, their use also raises new social and ethical questions," says Prof. Dr. Frauke Kreuter. There are concerns that ADM could reinforce existing social discrimination. For example, ADM systems are already used in the U.S. to assess defendants' risk of recidivism in legal proceedings. A particularly sensitive field of application in the European context is the assessment of job seekers' labour market prospects, e.g., for allocating training resources, as recently proposed by the Austrian Public Employment Service (AMS). There is a risk that sensitive characteristics such as gender, age, or marital status enter the algorithmic decision-making process and thus influence how resources are distributed. To shed more light on this and to empirically investigate methods for correcting unfair algorithms, the project develops and evaluates an ADM system based on administrative labour market data. This research is supported with €171,000.
Decisions about which confidentiality protection measures to apply to data dissemination must be informed by evidence on the utility associated with the quality of the data and on the willingness to trade utility against the estimated risk. This requires measuring data utility, risk, and individuals' willingness to trade risk for utility. According to the theoretical literature on privacy (Nissenbaum 2011) and trust (Bauer and Freitag 2018), perceptions of trust and privacy are context dependent. Three dimensions are particularly important: (1) to whom the data are provided, (2) what is done with the data (i.e., whether the benefits accrue to the party receiving the data or to the party providing them), and (3) what kind of data are shared (i.e., the sensitivity of the data). Some data are inherently sensitive because they touch on taboo topics (e.g., income or sexual behavior); other data are sensitive only if they reveal specific information about illegal (e.g., illicit drug use) or counter-normative behaviors and attitudes (Tourangeau and Yan 2007). In this project, we measure utility, risk, and tradeoffs in the context of privacy and data sharing in several cross-sectional surveys. The data landscape changed dramatically in May 2018, when the GDPR came into effect, and with it the control people have over their data and the risks companies face when violating the GDPR. We therefore also collect longitudinal data on awareness of the GDPR in Germany and, in an experimental setting, measure the influence of GDPR information on trust in various data-collecting organizations.
Currently, most surveys ask about occupation with open-ended questions. The verbatim responses are coded afterwards into a classification with hundreds of categories and thousands of jobs, which is an error-prone, time-consuming, and costly task. When textual answers lack detail, exact coding may be impossible. The project investigates how to improve this process by asking response-dependent questions during the interview: candidate job categories are predicted with a machine learning algorithm, and the most relevant categories are shown to the interviewer. Using this job list, the interviewer can ask for more detailed information about the job. The proposed method is tested in a telephone survey conducted by the Institute for Employment Research (IAB). Administrative data are used to compare the quality of traditional (post-interview) coding with that of interview coding. This project is done in cooperation with Arne Bethmann (IAB, University of Mannheim), Manfred Antoni (IAB), Markus Zielonka (LIfBi), Daniel Bela (LIfBi), and Knut Wenzig (DIW).
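To make the interview-coding step more concrete, here is a minimal sketch of how candidate categories could be ranked and shortlisted for the interviewer. It uses a simple multinomial naive Bayes scorer as a stand-in for the project's machine learning algorithm, which is not specified here; the training examples, category labels, and the function names `train` and `top_candidates` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Fit a multinomial naive Bayes model on (verbatim_answer, category) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, category in examples:
        tokens = text.lower().split()
        class_counts[category] += 1
        word_counts[category].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def top_candidates(model, answer, k=3):
    """Return the k most probable categories for a verbatim answer, best first."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    tokens = [t for t in answer.lower().split() if t in vocab]  # ignore unseen words
    scores = {}
    for category, n in class_counts.items():
        # Laplace (add-one) smoothing over the training vocabulary
        denom = sum(word_counts[category].values()) + len(vocab)
        score = math.log(n / total)  # log prior
        for t in tokens:
            score += math.log((word_counts[category][t] + 1) / denom)
        scores[category] = score
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy training data; a real system would learn from many coded answers.
examples = [
    ("nurse in a hospital", "nursing"),
    ("geriatric nurse", "nursing"),
    ("software developer", "software development"),
    ("programmer java backend", "software development"),
    ("sales assistant in a supermarket", "retail sales"),
    ("cashier supermarket", "retail sales"),
    ("truck driver", "driving"),
]
model = train(examples)
print(top_candidates(model, "supermarket cashier", k=2))
```

The shortlist, rather than a single prediction, is what the interviewer would see, so a wrong top guess can still be corrected by the respondent's follow-up answer.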
Panel surveys provide a valuable data source for investigating a wide range of substantive research questions and are used extensively in the social sciences and related disciplines. However, panel data quality can suffer substantially when sample sizes shrink due to dropout over time. In its most critical form, panel attrition is driven by selective nonresponse patterns, leading to a loss of statistical power and to biased estimates. Currently, survey research typically focuses on developing or refining (weighting) methods that correct for systematic dropout after the data have been collected. Against this background, this project investigates the potential of moving from post- to pre-correction of panel nonresponse by predicting dropout with machine learning methods and information from previous waves. To build prediction models that leverage information from multiple waves of a panel survey, the project investigates different longitudinal learning frameworks and draws on data from two German panel studies (GESIS Panel, GSOEP). In this setting, data-driven classifiers (e.g., random forests, boosting) make it possible to model complex non-linear and non-additive relationships between nonresponse predictors while focusing specifically on prediction accuracy. Feeding machine learning models with a rich set of data in a longitudinal framework offers a promising avenue for predicting panel nonresponse in advance; the predictions could then be used in an effective targeted design to prevent dropouts before they occur.
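One simple way to let a classifier "see" several previous waves is to stack lagged predictors into one feature vector per respondent. The sketch below shows only this data-shaping step, assuming a hypothetical panel layout (a list of per-wave dicts per respondent) and hypothetical predictor names such as `item_nonresponse`; it is one possible longitudinal framework, not necessarily one used in the project.

```python
def build_lagged_examples(panel, target_wave, lags=2,
                          features=("responded", "item_nonresponse")):
    """Stack predictors from the `lags` waves before `target_wave` into one row per
    respondent still in the panel; label is 1 if the respondent drops out at target_wave."""
    if target_wave < lags:
        raise ValueError("target_wave must be at least `lags` so every lag exists")
    X, y, ids = [], [], []
    for rid, waves in panel.items():
        if len(waves) <= target_wave:
            continue  # respondent not observed at the target wave
        if waves[target_wave - 1]["responded"] != 1:
            continue  # already a nonrespondent in the previous wave; not at risk here
        row = []
        for lag in range(1, lags + 1):  # most recent wave first
            wave = waves[target_wave - lag]
            # None marks waves without an interview; impute or encode before fitting.
            row.extend(wave[f] for f in features)
        X.append(row)
        y.append(1 - waves[target_wave]["responded"])  # 1 = dropout at target wave
        ids.append(rid)
    return X, y, ids


# Hypothetical three-wave panel (waves 0, 1, 2) for three respondents.
panel = {
    "r1": [{"responded": 1, "item_nonresponse": 0.0},
           {"responded": 1, "item_nonresponse": 0.1},
           {"responded": 1, "item_nonresponse": 0.0}],
    "r2": [{"responded": 1, "item_nonresponse": 0.2},
           {"responded": 1, "item_nonresponse": 0.4},
           {"responded": 0, "item_nonresponse": None}],
    "r3": [{"responded": 1, "item_nonresponse": 0.0},
           {"responded": 0, "item_nonresponse": None},
           {"responded": 0, "item_nonresponse": None}],
}
X, y, ids = build_lagged_examples(panel, target_wave=2)
print(ids, y)
```

Rows built for several target waves can be pooled, and the resulting `X`, `y` could then be passed to a classifier such as scikit-learn's `RandomForestClassifier` via its `fit(X, y)` method.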
Smartphones are multifunctional tools that can be used for personal communication, planning, entertainment, information search, and many other things in our daily lives. Many people cannot imagine life without their smartphone and carry it with them all the time. The omnipresence of smartphones makes these devices interesting for researchers who want to measure human behavior using the sensors built into them.
Together with the Institute for Employment Research (IAB), we developed the IAB-SMART app to evaluate the opportunities and challenges of using smartphones for data collection in social research, more specifically in labor market research. The IAB-SMART app passively collects mobile data, such as users' geolocation, activities, social interactions, and online behavior, and launches in-app surveys. In addition, with the users' consent, we are able to combine these data with survey data from a long-standing panel survey (PASS) and with IAB administrative data containing the users' employment histories.
The passive measures allow researchers to take a broader perspective on labor-market-related behavior such as home office productivity and job search strategies. Furthermore, the combination of sensor, survey, and administrative data will help us understand how (un)employment affects daily life. In addition to these substantive questions, the project addresses methodological research questions on the quality of data collected in this way.
Smartphone use is on the rise worldwide, and researchers are exploring novel ways to leverage the capabilities of smartphones for data collection. Mobile surveys, i.e., surveys completed in a smartphone web browser or through an app, have already been studied extensively. Research on other smartphone features that allow researchers to automatically measure a much broader set of user characteristics and behaviors, going far beyond mere self-reports, is still in its infancy. For example, smartphone users can now be asked to take pictures of receipts to better measure expenditure, to agree to the tracking of their movements to create exact measures of mobility and transportation, or to allow automatic logging of app use, Internet searches, and calling and text messaging behavior to measure social interaction. These forms of data collection provide richer data (because they can be collected at a much higher frequency than self-reports) and have the potential to decrease respondent burden (because fewer survey questions need to be asked) and measurement error (because recall errors and social desirability bias are reduced). However, agreeing to these forms of smartphone data collection is an additional step in the consent process, and participants might feel uncomfortable sharing specific data with researchers due to security, privacy, and confidentiality concerns. Moreover, users might have different concerns about different types of smartphone data collection and thus be more willing to engage in some of these tasks than in others. In addition, participants might differ in their smartphone skills and thus feel more or less comfortable using smartphones for research, which can lead to bias if specific subgroups participate at different rates. In a series of studies, we measure concerns about, and willingness to participate in, smartphone data collection.
Collecting information about refugees is necessary to guide policy makers in creating sustainable integration concepts and to increase the scientific understanding of migration and integration processes in general. However, interviewing refugees in immigration reception centres and following them in a longitudinal study can be difficult. In this project, we assess the feasibility of data collection via smartphones among refugees in Germany. While using smartphones to collect mobile web survey data has become increasingly popular over the last couple of years, combining these data with automatic tracking of online behaviour and geolocation of the smartphone is a novel approach that requires thorough empirical testing. The project provides both methodological insight into how to utilize smartphone data collection (combining survey and tracking data) and much-needed scientifically based knowledge on the needs, aspirations, and life circumstances of refugees in Germany.
The use of non-traditional data (i.e., data collected from non-probability sample surveys, passive data, or Big Data) to supplement or replace survey data is growing. However, these data are not without weaknesses: they suffer from their own sources of error, access challenges, and confidentiality concerns. This project uses survey data collected on Reddit.com and posts scraped from the site to answer three research questions: 1) Can social media data be used to accurately assess social attitudes? 2) What are the sources of error in social media data? 3) What variability in the conclusions drawn from these data is introduced by the researcher's choice of analytic methods? In addition, the project describes the data and how to access them, so that future Reddit data users can refine their budgets, timelines, and expectations.
For many years, surveys were the standard tool for measuring attitudes and behavior in social science research. In recent years, however, researchers have shifted their focus to new sources of data, especially in the online world. For instance, researchers have analyzed the potential of replacing or supplementing survey data with data from Twitter, smart devices (e.g., smartphones or fitness trackers), and other places where people leave digital traces. In this project, we explore the feasibility of using behavioral records of individuals' online activities to study political attitudes and behavior. Specifically, we explore whether online behavioral data can substitute for traditional survey data by inferring attitudes and behavior from the online data. In addition, we analyze how complete such data are, as users may switch off data collection during activities they do not want recorded. Moreover, we study how (social) media use shapes attitudes and behavior in the offline world. This project is done in collaboration with Ashley Amaya (RTI International).
In cooperation with the University of Maryland, the University of Mannheim will develop an international online continuing education program in data collection and data analysis. Development will take place in cooperation with several partners in Germany with expertise in this area: the Leibniz-Institut für Sozialwissenschaften (GESIS) and the Institut für Arbeitsmarkt- und Berufsforschung (IAB). GESIS has long been active in the German survey arena with research, education, and consulting services. Particularly notable are its services for the German Microcensus, its coordination of the European Social Survey, and its administration of the PIAAC study for the OECD in Germany. The IAB regularly conducts its own surveys, such as the IAB-Betriebspanel and the Panel Arbeitsmarkt und Soziale Sicherung. As host of the research data center of the Bundesagentur für Arbeit (Forschungsdatenzentrum der Bundesagentur für Arbeit, FDZ) and through its work within the German Record Linkage Center, the IAB is an international leader in data linkage and in empirical questions of data privacy protection. In the second phase of the project, the cooperation is planned to expand to include the Institut für Statistik of the Ludwig-Maximilians-Universität München.
The planned continuing education program will use internet-based learning methods that provide continuous asynchronous access to educational material and enable interactive exchange between lecturers and students. Together with several on-site lectures and seminars, other on-site activities, topical chats, online forums, and video-based material will help participants in the program build an international professional network.
Within this continuing education program, a master's degree can be earned in one year of full-time study or over a correspondingly longer period of part-time study. The program will start with an introductory on-site workshop in which students and lecturers get to know each other and the online infrastructure is explained. The core of the program consists of four base modules, followed by specialty lectures and seminars. Over the course of the program, participants must attend a total of two on-site workshops at any of the participating institutions. To reduce typical barriers to professional training, childcare will be offered at all on-site activities, making use of the existing infrastructure at the Universität Mannheim. At each collaborating institution, a program manager will coordinate the site and serve as the direct contact person for students. The effectiveness of the program will be evaluated by analyzing participants' reactions to the material provided, learning outcomes in terms of improved qualifications, use of the learned material in their jobs, effects on job performance, and estimated rates of return. A core feature of the first development phase is a series of small randomized experiments in which different online learning modules will be tested empirically. Based on the results of the first phase, the program will be expanded geographically and thematically in the second phase, laying the foundation for a sustainable infrastructure.
If you are interested in detailed information about the curriculum, admission criteria and enrollment, please check the program's website.
Standardized Interviewing (SI) requires survey interviewers to read questions as worded and to provide only neutral or non-directive probes in response to respondents' questions. While many major surveys in the government, non-profit, and private sectors use SI to minimize interviewer effects on data quality, the existing literature shows that between-interviewer variance in key survey statistics still arises even when random subsamples are assigned to interviewers. Because this between-interviewer variance reduces the precision of survey estimates much as a smaller sample would, it has direct cost implications for survey design. Survey methodologists suspect that, despite proper training in SI, interviewers may still deviate from the script because respondents often request additional explanation.
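The link between interviewer variance and sample size can be illustrated with Kish's well-known approximation of the interviewer design effect, deff = 1 + ρ_int(m̄ − 1), where ρ_int is the intra-interviewer correlation of a survey variable and m̄ the average number of interviews per interviewer. The sketch below assumes equal interviewer workloads, ignores other design effects such as clustering or weighting, and uses purely illustrative values rather than estimates from this project.

```python
def interviewer_deff(rho_int, mean_workload):
    """Kish's approximation of the interviewer design effect: 1 + rho_int * (m - 1).
    Assumes every interviewer completes roughly `mean_workload` interviews."""
    return 1.0 + rho_int * (mean_workload - 1.0)


def effective_sample_size(n, rho_int, mean_workload):
    """Number of independent observations the sample is 'worth' after interviewer effects."""
    return n / interviewer_deff(rho_int, mean_workload)


# Illustrative values only: 2,000 interviews, 50 interviewers with 40 interviews each,
# and a small intra-interviewer correlation of 0.01.
print(interviewer_deff(0.01, 40))
print(effective_sample_size(2000, 0.01, 40))
```

With these values the design effect is about 1.39, so 2,000 interviews carry roughly the information of 1,439 independent ones. Reducing ρ_int, or spreading the same cases over more interviewers, shrinks this loss, which is where the cost trade-off in survey design arises.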
Conversational Interviewing (CI) is known to handle clarification requests more effectively: interviewers are trained to read questions as worded initially and then say whatever is required to help respondents understand the questions. Although the literature demonstrates that CI noticeably reduces measurement error bias in survey estimates, survey researchers (and governmental agencies in particular) have been hesitant to use it in practice, partly because of longer questionnaire administration times but also for fear of increased interviewer variance. This research compares the interviewer variance, bias, and mean squared error (MSE) of a variety of survey estimates under these two face-to-face interviewing techniques, and decomposes the total interviewer variance introduced by each technique into measurement error variance and nonresponse error variance among interviewers. Doing so requires interpenetrated assignment of sampled cases to professional interviewers as well as high-quality administrative records, and we conducted an original data collection in Germany with these features to meet our research aims.
On June 14, 2015, student researchers from the University of Mannheim conducted an exit poll of voters in the City of Mannheim mayoral election. A total of 1,575 voters in five randomly selected precincts were surveyed on a range of topics, including their voting behavior, social and political attitudes, and views on local services and issues. Data collection and analysis for this project were carried out in conjunction with the Department of Sociology's Empirisches Forschungspraktikum I and II courses.
Media Coverage: FORUM, the magazine of the University of Mannheim (2/2015 issue, p.37)
Television station RNF, election day news broadcast (video: Mannheim exit poll at 1:05)
Clara Beitz, Christina Bott, Markus Büger, Angela Buschmann, Larissa Ernst, Julius Fastnacht, Rolf Fröschle, Tabea Gering, Max Hansen, Anika Herter, Büsra Karaca, Marina König, Klara Kuhn, Anna Merz, Sandra Mingham, Sevda Mollaoglu, Daniel Parstorfer, Marina Röhrich, Tom Sauer, Erika Schuller, Daria Schulte, Hannah Soiné, Mona Wirth
Eva Bengert, Felix Bölingen, Babette Bühler, Katharina Burgdorf, Angela Buschmann, Johanna Eisinger, Larissa Ernst, Rolf Fröschle, Max Hansen, Dorothea Harles, Anika Herter, Ina Holdik, Samir Khalil, Lisa Kirschbaum, Marina König, Lisa Kühn, Luisa Maigatte, Monika Matuschinski, Sandra Mingham, Nneka Mmeh, Sevda Mollaoglu, Lisa Natter, Julia Riffel, Sarah Schneider, Erika Schuller, Frederik Unruh, Annika Wagner, Mona Wirth, Clara Zimmer