
2018-11-07

Acknowledgments

We thank Marshall Burke, Dimitri Gershenson, seminar participants at Toulouse School of Economics, the Centre for the Study of African Economies at Oxford University, and two anonymous referees for useful comments. This research was made possible with the support and cooperation of the Rural Electrification Authority. This work was supported by the Development Impact Lab (USAID Cooperative Agreements AID-OAA-A-13-00002 and AID-OAA-A-12-00011, part of the USAID Higher Education Solutions Network), the Blum Center for Developing Economies, the Berkeley Energy and Climate Institute, the UC Center for Energy and Environmental Economics, and the Center for Effective Global Action.
Introduction

Efforts to assess the impact of global poverty projects, such as solar lighting installations, latrines, water pumps and filters, and cookstoves, often rely on data collected through person-to-person surveys, subjective observations, and/or expensive and time-consuming experimental studies. Data is frequently recorded by hand and processed on a per-project basis. These conventional approaches have limitations that can reduce the value of the derived data. In the case of surveys and observations, research has shown that surveys often overestimate adoption rates due to courtesy bias (where the participant attempts to please the surveyor) (Manun’Ebo et al., 1997) or recall bias (the tendency to forget details from the more distant past) (Stanton et al., 1987). Furthermore, the presence or repeated visits of observers or enumerators can cause reactivity, influencing the behavior they are measuring (Zwane et al., 2011). Even with well-designed experimental studies such as randomized controlled trials, the data collected and the subsequent impact analysis are often not available until well after the intervention is considered complete, which can delay input to subsequent interventions.

Overarching these challenges is the bespoke nature of most data collection, analysis and sharing systems, which are either (1) basic and limited or (2) expensive. Kepler (Altintas et al., 2004), Conveyor (Linke et al., 2011), Taverna (Hull et al., 2006), Mobyle (Néron et al., 2009), DHIS2 (Manya et al., 2012) and Open Foris (Miceli et al., 2011) are examples of existing scientific workflow and data management platforms. These platforms have several limitations, one of which is the narrow set of programming languages they support. For example, while Kepler allows users to integrate R and MATLAB code into workflows, code written in other languages can only be integrated in the form of web services, which may be too difficult for most users.
Additionally, some of these platforms are domain-specific and thus provide only a limited set of algorithms. Mobyle, for instance, focuses on algorithms for the bioinformatics domain. DHIS2 is exclusively for health workflows, and while it provides built-in analysis features and a web API, any customized processing code runs outside of the system. Further, these platforms track the provenance of data (i.e., how data was collected or processed), if at all, only from the point where it enters the platform to the point where it leaves. Because these platforms do not contain data-gathering functionality, they capture neither the provenance of the originally collected data nor the provenance of the final outputs that leave the system (e.g., visualizations).

In contrast to most of these platforms, Mezuri is conceived as a more broadly applicable, user-configurable data collection, analysis and sharing platform for global development professionals. Mezuri aims to provide end-to-end support for collecting provenance data and allows users to choose from a variety of programming languages, and even to combine different languages, when implementing their workflows. Our proposed platform builds on existing efforts to collect data with smartphones, cellular-based sensors, and digital surveys in global development settings, and combines these technologies with online data tools. Specifically, Mezuri will extend Open Data Kit (ODK) by building upon the infrastructure of ODK 2.0 (Brunette et al., 2013) to create an integrated data collection platform with provenance, processing, analysis, and sharing. Mezuri will use ODK 2.0 infrastructure and protocols to enable end-to-end integration with the ODK 2.0 tool suite. Additionally, Mezuri will leverage existing remote sensor data collection systems such as Get All The Data (Pannuto et al., 2013) and SWEETSense (Thomas et al., 2013). These systems were selected because of the team’s existing expertise with these platforms.
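To make the notion of end-to-end provenance concrete, the following Python sketch shows one way a chain of provenance records could follow a data item from collection through processing to a final output. It is an illustration only and not Mezuri's design: the names `ProvenanceRecord`, `TrackedData`, `collect`, `apply_step`, and `chain_is_intact`, as well as the sample sensor readings, are hypothetical and introduced here for the example.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class ProvenanceRecord:
    """One step in a data item's history (hypothetical structure)."""
    step: str                   # e.g. "collect", "process", "visualize"
    agent: str                  # device, script, or person that performed the step
    content_hash: str           # fingerprint of the data after this step
    parent_hash: Optional[str]  # fingerprint of the input; None at collection


@dataclass
class TrackedData:
    """A value bundled with the records describing how it was produced."""
    value: Any
    history: List[ProvenanceRecord] = field(default_factory=list)


def fingerprint(value: Any) -> str:
    """Return a stable SHA-256 hash of a JSON-serializable value."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def collect(value: Any, agent: str) -> TrackedData:
    """Start a provenance chain at the point of data collection."""
    record = ProvenanceRecord("collect", agent, fingerprint(value), None)
    return TrackedData(value, [record])


def apply_step(data: TrackedData, step: str, agent: str,
               func: Callable[[Any], Any]) -> TrackedData:
    """Apply func to the value and append a record linked to the input."""
    new_value = func(data.value)
    record = ProvenanceRecord(step, agent, fingerprint(new_value),
                              data.history[-1].content_hash)
    return TrackedData(new_value, data.history + [record])


def chain_is_intact(data: TrackedData) -> bool:
    """Check that records link back to collection and match the current value."""
    records = data.history
    if not records or records[0].parent_hash is not None:
        return False
    for previous, current in zip(records, records[1:]):
        if current.parent_hash != previous.content_hash:
            return False
    return records[-1].content_hash == fingerprint(data.value)


# Made-up sensor readings, traced from collection to a report-ready output.
readings = collect([3.0, 4.5, 6.0], agent="sensor-01")
mean = apply_step(readings, "process", "mean.py", lambda xs: sum(xs) / len(xs))
summary = apply_step(mean, "visualize", "report", lambda m: {"mean_hours": m})
```

Because each record stores the hash of its input, the history of `summary` can be walked back to the original collection step, which is the part of the chain that platforms without data-gathering functionality cannot record.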
In this paper, we identify the needs of researchers based on user surveys (see Section 2) and example applications (see Section 3), derive corresponding engineering requirements (see Section 4), and propose a design that addresses these requirements (see Section 5). Finally, we outline future technology challenges (see Section 6).