# Google Analytics - 3 Prepare - Module 2 - Data responsibility

## 0. Overview

>[! quote]
>Before you work with data, you must confirm that it is unbiased and credible. After all, if you start your analysis with unreliable data, you won’t be able to trust your results. In this part of the course, you will learn to identify bias in data and to ensure your data is credible. You’ll also explore open data and the importance of data ethics and data privacy.

### Learning Objectives

>[! info] Learning objectives for Course 3, module 2
>- [ ] Explain what is involved in reviewing data to identify [[Bias]]
>- [ ] Discuss the difference between biased and unbiased data
>- [ ] Identify different types of bias including [[Confirmation bias]], [[Interpretation bias]], and [[Observer bias]]
>- [ ] Discuss characteristics of credible sources of data including reference to untidy data.
>- [ ] Explain the concept of [[Open data]] with reference to the ongoing debate in data analytics
>- [ ] Define [[Data Ethics]] and [[Data privacy]]
>- [ ] Explain the relationship between [[Data Ethics]] and [[Data privacy]]
>- [ ] Demonstrate an understanding of the benefits of [[Data anonymization]]
>- [ ] Demonstrate an awareness of the accessibility issues associated with [[Open data|open data]].

### Glossary for Module 2

- [[Bad data source]] | [[Bias]]
- [[Confirmation bias]] | [[Consent]] | [[Cookie]] | [[Currency]]
- [[Data anonymization]] | [[Data bias]] | [[Data Ethics]] | [[Data interoperability]] | [[Data privacy]]
- [[Ethics]] | [[Observer bias|Experimenter bias]]
- [[Fairness]] | [[First-party data]]
- [[General Data Protection Regulation of the European Union (GDPR)]] | [[Good data source]]
- [[Interpretation bias]]
- [[Observer bias]] | [[Open data]] | [[Openness]]
- [[Sampling bias]]
- [[Transaction transparency]] | [[Unbiased sampling]]

<hr>

## 1. Unbiased and objective data

### 1.1 Video: _Introduction to bias, credibility, privacy, and ethics_ (1:00)

- Answer the questions:
    - Who owns all the data?
- First, we will look at how to analyze data for bias and credibility.
- The importance of good vs. bad data.
- Data ethics, privacy and access.

### 1.2 Video: _Bias: From questions to conclusions_ (3:00)

We run into bias in everyday life.

>[! cue] Def [[Bias]]

>[! info] Bias
>A preference in favor of or against a person, group of people, or thing. It can be conscious or unconscious.

>[! cue] Def [[Data bias]]

>[! info] Data bias
>A type of error that systematically skews results in a certain direction.

Bias can happen if the sample group lacks inclusivity or if the respondents are rushed. Bias and fairness must be considered at each stage, from the beginning.

### 1.3 Video: _Biased and unbiased data_ (2:00)

Biases we have as people can show up in our data, making the data biased.

>[! cue] Def [[Sampling bias]]

>[! info] Sampling Bias
>When a [[Sample|sample]] isn't representative of the [[Population|population]] as a whole. It can be avoided by using random sampling; otherwise, you end up favoring one outcome.

>[! cue] Def [[Unbiased sampling]]

>[! info] Unbiased sampling
>When a [[Sample|sample]] is representative of the [[Population|population]] being measured.

Use visualizations to uncover bias.

### 1.4 Video: _Understand Bias in Data_ (3:00)

There are lots of different types of data bias. You need all sides of the story to avoid bias.

>[! cue] Def [[Observer bias]]

>[! info] Observer bias
>The tendency for different people to observe things differently. Also known as experimenter bias or research bias.

>[! cue] Def [[Interpretation bias]]

>[! info] Interpretation bias
>The tendency to always interpret ambiguous situations in a positive or negative way.

>[! cue] Def [[Confirmation bias]]
>The tendency to search for or interpret information in a way that confirms pre-existing beliefs.

All the types of bias that we have reviewed will impact our analysis negatively.

### 1.5 Quiz: _Test your knowledge on unbiased and objective data_

4 questions, 75% min passing score

#### Results: 100%

---

## 2. Achieve data credibility

### 2.1 Video: _Identify good data sources_ (2:00)

>[! cue] Def [[Good data source]]

>[! info] Good data source
>A data source that ROCCCs!!
>Data that is reliable, original, comprehensive, current, and cited (ROCCC)

Some best practices to follow to identify good data.

- How to identify: ROCCC
    - Reliable; vetted data, proven fit for use.
    - Original; validate the data with the original source (when using [[Second-party data|second-party data]] or [[Third-party data|third-party data]]).
    - Comprehensive; contains all the critical information to answer the question or find the solution.
    - Current; avoid stale data. The usefulness of data decreases as time passes.
    - Cited; information that has citations.

### 2.2 Video: _What is "bad" data?_ (2:00)

>[! cue] Def [[Bad data source]]

Bad data fails to meet the ROCCC rubric for good data. Essentially, its characteristics are the negations of each ROCCC criterion.

### 2.3 Quiz: _Test your knowledge on data credibility_

4 questions, untimed. 75% minimum score

#### Results: 93.75%

A partial deduction on a "select all that may apply" question. The question was testing the _Cited_ indicator of a good data source. I selected 2 out of 3 correctly and failed to identify the final one as being part of the solution. The way the option was worded threw me off, making me think it was more about being _current_ than about citations.

---

## 3. Data ethics and privacy

### 3.1 Video: _Essential data ethics_ (4:00)

- We all have our own biases.

>[! cue] Def [[Ethics]]

>[! info] Ethics
>Well-founded standards of right and wrong that prescribe what humans ought to do.

>[! cue] Def [[Data Ethics]]

Data has standards too!

>[! info] Data ethics
>Well-founded standards of right and wrong that dictate how data is collected, shared, and used.

[[Data privacy]] applies to general [[Data Ethics]]. An example of this is [[General Data Protection Regulation of the European Union (GDPR)|GDPR]].

Aspects of data ethics: there are many, but the course covers 6 of them.

1. [[Ownership]]; a person's data belongs to that person.
2. [[Transaction transparency]]; all processing algorithms should be clearly explained to, and understood by, the data owner. This lets people judge if the outcome is fair and unbiased.
3. [[Consent]]; data owners should know why data is being collected, how it is used, and how it is stored.
4. [[Currency]]; individuals should be aware if their data is being used as a funding or revenue source, i.e., your data is the currency of the transaction.
5. [[Data privacy|Privacy]]; this and [[Openness]] are covered later.
6. [[Openness]]

### 3.2 Video: _Alex and the importance of data ethics_ (3:00)

Replay of Alex and their team working on AI ethics.
[[Google Analytics - 1 Foundations - Module 4#Video _Alex Fair and ethical data decisions_ (3 00)]]

> Data are people. And the idea of beneficence.

### 3.3 Video: _Prioritize data privacy_ (1:00)

[[Data privacy|Privacy]] is personal. We are all entitled to it. It covers the following:

#### For individuals

- Protection from unauthorized access to our private data
- Freedom from inappropriate use of our data
- The right to inspect, update, or correct our data
- Ability to give consent to use our data
- Legal right to access the data

#### For companies

- Companies should put data privacy measures in place to protect a user's data so none of their rights are violated.

### 3.4 Reading: _Data anonymization_

Discusses what [[Data anonymization]] is and what data should be anonymized.
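
To make the idea concrete for myself, here is a minimal Python sketch of removing personally identifiable information from a record. It is not from the course: the field names in `PII_FIELDS`, the `deidentify` and `mask_value` helpers, and the sample row are all hypothetical, and blanking and salted hashing are just two common approaches I picked for illustration. Hashed values can sometimes still be linked back to a person, so treat the masking branch as pseudonymization rather than full anonymization.

```python
import hashlib

# Hypothetical set of column names treated as PII for this example.
PII_FIELDS = {"name", "email", "phone", "ip_address"}


def mask_value(value: str, salt: str) -> str:
    """Replace a value with a fixed-length code derived from a salted SHA-256 hash."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def deidentify(record: dict, salt: str, keep_linkable: bool = False) -> dict:
    """Return a copy of record with PII fields removed or masked.

    keep_linkable=False drops PII fields entirely (blanking).
    keep_linkable=True replaces them with codes, so rows belonging to the
    same person can still be matched without exposing the raw value.
    """
    cleaned = {}
    for key, value in record.items():
        if key not in PII_FIELDS:
            cleaned[key] = value
        elif keep_linkable:
            cleaned[key] = mask_value(str(value), salt)
    return cleaned


# Made-up sample row, not real data.
row = {"name": "Ada Example", "email": "ada@example.com", "plan": "pro", "visits": 7}
print(deidentify(row, salt="demo-salt"))
# prints {'plan': 'pro', 'visits': 7}
```

Calling `deidentify(row, salt="demo-salt", keep_linkable=True)` keeps the `name` and `email` keys but swaps their values for 12-character codes, which is handy when the analysis still needs to count distinct users.
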
- Def Personally Identifiable Information (PII): information that can be used by itself or with other data to track down a person's identity.
- Data de-identification: a process used to wipe data clean of all PII.

#### Anonymized Data elements

- Telephone numbers
- Names
- License plates and license numbers
- Social security numbers
- IP addresses
- Medical records
- Email addresses
- Photographs
- Account numbers

### 3.5 Video: _Andrew: The Ethical Use of Data_ (2:00)

Sr. Developer Advocate at Google.

We don't want to amplify unfair outcomes. Data ethics is an active process, not a passive one.

### 3.6 Quiz: _Test your knowledge on data ethics and privacy_

4 questions. Untimed. 75% minimum score needed

#### Results: 100%

---

## 4. Understand open data

### 4.1 Video: _Features of open data_ (4:00)

>[! cue] Def [[Openness]]

- Must be available and accessible
    - Downloads should be in simple file formats.
- Reuse and redistribution
    - Open data can be redistributed and used with other datasets.
- Universal participation
    - Everyone can use and reuse the data, with no discrimination.

Credible data can be used more widely if it is open.

>[! cue] Def [[Data interoperability]]

>[! info] Data interoperability
>The ability of data systems and services to openly connect and share data

However, open data and interoperability require a lot of agreement and cooperation.

### 4.2 Reading: _The open data debate_

With open data come privacy concerns. If data is very open, third parties may access it, make money from it, etc.

### 4.3 Video: _Andrew: Steps for ethical data use_ (3:00)

A key activity in ethics as an analyst is to self-reflect:

- What is it that I am doing, and what impact does this have?
- Think of those represented in the data set.

Analysts stand at the intersection between the organization and the people who can benefit or can be harmed by the data being used.

### 4.4 Reading: _Resources for open data_

Also on the [[Open data]] card.

Resources for open data:
- [US government data](https://www.data.gov/)
- [Census Bureau](https://www.census.gov/data.html)
- [Open Data Network](https://www.opendatanetwork.com/)
- [Google Cloud Public Datasets](https://cloud.google.com/datasets)
- [Dataset Search](https://datasetsearch.research.google.com/)

### 4.5 Hands-On: _Kaggle datasets_

This one takes a while, but has good information on Kaggle and its usefulness.

### 4.6 Quiz: _Test your knowledge on open data_

4 questions, untimed. 75% minimum to continue

#### Results: 100%

## 5. Module 2 Challenge

### 5.1 Reading: _Module 2 Glossary_

- [[Bad data source]] | [[Bias]]
- [[Confirmation bias]] | [[Consent]] | [[Cookie]] | [[Currency]]
- [[Data anonymization]] | [[Data bias]] | [[Data Ethics]] | [[Data interoperability]] | [[Data privacy]]
- [[Ethics]] | [[Observer bias|Experimenter bias]]
- [[Fairness]] | [[First-party data]]
- [[General Data Protection Regulation of the European Union (GDPR)]] | [[Good data source]]
- [[Interpretation bias]]
- [[Observer bias]] | [[Open data]] | [[Openness]]
- [[Sampling bias]]
- [[Transaction transparency]] | [[Unbiased sampling]]

### 5.2 Quiz: _Module 2 challenge_

Timed; 40 min, 10 points, open note

#### Results: 100%

## Summary and takeaways

>[! summary-top] Module 2 Key Takeaways
>- [[Data Ethics]] is an active process, not a passive one. Never rely on assumptions.
>- Bias can occur consciously or subconsciously and can take many forms.
>- There is a trade-off between [[Openness]] of data and [[Data privacy|data privacy]].