Where does data visualization begin?
Part 1 of your data visualization project
Last updated May 2021
In the assigned reading, you learned about the process of understanding in Chapter 1, the process of visualization design in Chapter 2, and how to develop the focus of your visualization project, or formulate your brief, in Chapter 3.[1] This assignment is part of a semester-long project that you will build on over the term.
[1] Kirk, A. (2019). Data visualisation: A handbook for data driven design (2nd ed.). Sage.
To plan a project, you need an objective. Where does that objective come from? As Jee (2020) noted, data analysis projects begin with a problem to solve, a question to answer, or, from a research perspective, a hypothesis to test.[2]
[2] Jee, K. (2020, April 3). Data science project from scratch – part 1 (project planning) [Video]. YouTube.
For this project, you have taken on the role of a database analyst in the Office of Civil Rights of the Consumer Financial Protection Bureau (CFPB), a US government agency. Today your manager gave you a new project by asking you to answer the following question:
Are consumers in big cities or in higher-income communities getting preferential treatment from us?
Where do you go from here? How do you answer this question? First, you'll have to identify the parameters of this assignment: the elements that are known and those that are unknown.
For example, what's a big city? What's a higher-income community? How do you define these attributes? Do you think that your management team would benefit from an analysis of data from 1950 and nothing more recent? Probably not.
The question your manager posed is a yes-or-no question. However, you will not answer it with a yes or a no. Instead, you will provide evidence that bears on the answer. This allows your manager to determine whether preferential treatment is present based on the evidence you provide.
To gather that evidence, you will look for relationships, associations, or patterns in the data that suggest whether the level of service changes as city size and income change. All of the data will come from the database.
Table 1
The Data Dictionary
Field | Description
--- | ---
date_received | The date the CFPB received the complaint
product | The type of product the consumer identified in the complaint
issue | The issue the consumer identified in the complaint
company | The company the complaint is about
state | The state the consumer resides in
zip_code | The mailing ZIP code provided by the consumer
submitted_via | How the complaint was submitted to the CFPB
date_sent_to_company | The date the CFPB sent the complaint to the company
company_response_to_consumer | The response from the company about this complaint
timely_response | Indicates whether the company gave a timely response or not
complaint_id | The unique identification number for a complaint
delay | The number of days between the date the CFPB received the complaint and the date the complaint was sent to the company
population | The population based on the consumer's ZIP code
median_household_income | The median household income based on the consumer's ZIP code
Note. This data dictionary is based on data adapted from CFPB (n.d.)[3] and Rozzi (2021).[4]
[3] Consumer Financial Protection Bureau. (n.d.). Consumer complaint database API docs [Data set and code book]. Office of Civil Rights. Retrieved April 28, 2021, from
[4] Rozzi, G. (2021). Data & functions for working with US zip codes. GitHub.
Let us return to the question your manager posed. What time period should you look at? What type of information are you looking for? For this project, I will give you my translation of this practical question into a testable question, in other words, a question that data analysis can address.
My translation is as follows:[5]
Is there a relationship between a consumer's local area population, the consumer's local median household income, and the days of delay between receiving and forwarding consumer complaints from Vermont in 2020?
[5] In this translation, I used Vermont. However, the state you will use is based on the first name listed for you in Blackboard. Use the first letter of that name to find your state:
A-D: Ohio
E-J: Illinois
K-N: Georgia
O-S: Arizona
T-V: Washington (state, not D.C.)
W-Z: New Jersey
Using the wrong state will cost you no less than 20% of the possible points.
What makes this version different? Let's start with the part of the question that turns it into something data analysis can address: it asks whether there is a relationship.
Is there a relationship between a consumer's local area population, the consumer's local median household income, and the days of delay between receiving and forwarding consumer complaints from Vermont in 2020?
How can a relationship be analyzed? There are several ways. Visual analysis is an excellent way to examine data for relationships, associations, patterns, or trends.
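Visual analysis can be paired with a simple numeric check. As a minimal sketch, the R code below computes rank-based (Spearman) correlations among population, median household income, and delay, and draws a scatterplot matrix when run in an interactive session. It uses the six sample rows printed later on this page, which span several states and years, so it only illustrates the mechanics and says nothing about Vermont in 2020. The choice of Spearman correlation and the helper name relationship_summary are my own assumptions, not requirements of this assignment.

```r
# Six sample rows copied from the data preview on this page
# (several states and years; for illustration only).
sample_rows <- data.frame(
  population              = c(28615, 45801, 44260, 44300, 40328, 33969),
  median_household_income = c(33792, 90718, 76178, 76061, 83484, 24533),
  delay                   = c(1, 5, 0, 1, 0, 0)
)

# Hypothetical helper: rank-based (Spearman) correlation matrix for the
# three variables named in the research question.
relationship_summary <- function(df) {
  vars <- c("population", "median_household_income", "delay")
  cor(df[, vars], method = "spearman")
}

corr <- relationship_summary(sample_rows)
print(round(corr, 2))

# Visual check: one scatterplot for each pair of variables.
# Drawn only in an interactive session so the script also runs unattended.
if (interactive()) {
  pairs(sample_rows, main = "Population, income, and delay (sample rows)")
}
```

Spearman correlation is used here because it works on ranks, so a few very large populations or incomes do not dominate the result; a scatterplot is still needed to see the shape of any relationship.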
This version of the question also specifies exactly which content from the database will be used.
- Vermont is found in the state field.
- The year 2020 is found in the date_received field.
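A minimal sketch of how those two fields could narrow the data, assuming the data arrive as an R data frame with the column names from Table 1 and that state holds two-letter postal abbreviations, as in the sample rows printed later on this page (so Vermont would be "VT"). The helper name filter_state_year is hypothetical. Because none of the six sample rows come from Vermont in 2020, the example filters them for Pennsylvania in 2019 instead.

```r
# Sample rows copied from the data preview on this page (subset of columns).
sample_rows <- data.frame(
  complaint_id  = c(3379500, 3255455, 4267123, 3446074, 4239229, 3415907),
  state         = c("PA", "AZ", "PA", "VA", "CA", "NY"),
  date_received = as.Date(c("2019-09-19", "2019-05-23", "2021-04-02",
                            "2019-11-20", "2021-03-23", "2019-10-24"))
)

# Hypothetical helper: keep complaints from one state received in one year.
# `state_abbr` is a two-letter postal code; `year` is a four-digit number.
filter_state_year <- function(df, state_abbr, year) {
  received_year <- as.integer(format(df$date_received, "%Y"))
  df[df$state == state_abbr & received_year == year, , drop = FALSE]
}

pa_2019 <- filter_state_year(sample_rows, "PA", 2019)
print(pa_2019)         # only complaint 3379500 matches

vt_2020 <- filter_state_year(sample_rows, "VT", 2020)
print(nrow(vt_2020))   # 0: no Vermont 2020 rows among the six samples
```

With the full data set, the same call with your assigned state's abbreviation and 2020 would produce the subset that the research question refers to.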
Is there a relationship between a consumer's local area population, the consumer's local median household income, and the days of delay between receiving and forwarding consumer complaints from Vermont in 2020?
The question should indicate what is analyzed and how to analyze it. Because different people can understand the same words differently, know your audience and provide sufficient clarity. For this objective, you will work with the translation I have provided.[6]
[6] Did you envision the manager's question differently? Email me your translation and any information I need to understand your perspective. If you have a plan, I will likely allow you to use your ideas instead of the ones I have provided. My email address is
Create a plan
For this week's objective, you will need to look at this research question and formulate your brief.
Formulating the brief answers these questions:
- Why is this interesting or important? What about it is important?
- What requires clarification?
- What pitfalls could cause the analysis to be incomplete or incorrect?
- Who is the audience? What do you think the audience expects?
- How much time do you have to complete the project?
- What are the project conditions?
- What tools do you have access to?
- Or, as is the case in this course, which software are you limited to?
- Can the evidence be summarized in one visualization? Two? Several?
- Will the results of this analysis be an exhibit (evidence), an explanation (presentation), or an exploration (audience interaction)?
This assignment does not require you to access the data yet. I will provide an RData file with the data in the next assignment.
However, it may be helpful to understand what type of data is used to represent this information. Here are the first six rows of the data.
## date_received
## 1 2019-09-19
## 2 2019-05-23
## 3 2021-04-02
## 4 2019-11-20
## 5 2021-03-23
## 6 2019-10-24
## product
## 1 Credit reporting, credit repair services, or other personal consumer reports
## 2 Checking or savings account
## 3 Credit reporting, credit repair services, or other personal consumer reports
## 4 Credit card or prepaid card
## 5 Money transfer, virtual currency, or money service
## 6 Credit reporting, credit repair services, or other personal consumer reports
## issue
## 1 Incorrect information on your report
## 2 Managing an account
## 3 Incorrect information on your report
## 4 Closing your account
## 5 Fraud or scam
## 6 Problem with a credit reporting company's investigation into an existing problem
## company state zip_code submitted_via
## 1 Experian Information Solutions Inc. PA 15206 Web
## 2 MIDFIRST BANK AZ 85254 Referral
## 3 EQUIFAX, INC. PA 19403 Web
## 4 PENTAGON FEDERAL CREDIT UNION VA 22304 Referral
## 5 Square Inc. CA 91387 Web
## 6 TRANSUNION INTERMEDIATE HOLDINGS, INC. NY 10035 Web
## date_sent_to_company company_response_to_consumer timely_response
## 1 2019-09-20 Closed with non-monetary relief Yes
## 2 2019-05-28 Closed with explanation Yes
## 3 2021-04-02 Closed with explanation Yes
## 4 2019-11-21 Closed with explanation Yes
## 5 2021-03-23 Closed with explanation Yes
## 6 2019-10-24 Closed with explanation Yes
## complaint_id delay population median_household_income
## 1 3379500 1 28615 33792
## 2 3255455 5 45801 90718
## 3 4267123 0 44260 76178
## 4 3446074 1 44300 76061
## 5 4239229 0 40328 83484
## 6 3415907 0 33969 24533
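The delay column in these rows matches the number of calendar days between date_received and date_sent_to_company, consistent with its definition in Table 1. As a minimal sketch, the R code below recomputes delay for the six rows above from the two date fields, assuming both can be read as R Date values. The rows agree with the printed delay values, but whether the full data set was built exactly this way is an assumption.

```r
# Dates copied from the six sample rows above.
date_received        <- as.Date(c("2019-09-19", "2019-05-23", "2021-04-02",
                                  "2019-11-20", "2021-03-23", "2019-10-24"))
date_sent_to_company <- as.Date(c("2019-09-20", "2019-05-28", "2021-04-02",
                                  "2019-11-21", "2021-03-23", "2019-10-24"))

# Calendar days between receipt by the CFPB and forwarding to the company.
recomputed_delay <- as.numeric(difftime(date_sent_to_company, date_received,
                                        units = "days"))

# The delay column as printed in the sample.
printed_delay <- c(1, 5, 0, 1, 0, 0)

print(recomputed_delay)
stopifnot(all(recomputed_delay == printed_delay))
```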
Useful information
Use your current or a previous working environment, a previous course, or another life experience to identify other requirements for this project.
What do I mean when I refer to your life experiences? In the video lecture for Chapter 3, two different graphs that answered the question were presented. One was black and white, while the other was not. Based on your experiences, which option is better suited to your audience?
Does that mean you get to invent some of the requirements, circumstances, or constraints? YES! However, it also means that you will need to read the feedback I provide. As the course continues, you will learn more about what is effective or ineffective in data visualizations. The audience for your paper is not the professor. Choose an audience based on your experiences. However, make sure to explain who that audience is, so that I can understand your perspective.
Submission requirements
When you document this information, you will need to write it as a paper. This is not a blog, a discussion, or a short-answer paper. You will need to include an introduction, a topic sentence, supporting paragraphs, and a conclusion. Not great at writing? Go to Start Here in the main menu and look at the resources available to you. Make sure that you document your work using APA 7 standards. To help with formatting, use the APA 7 Student Paper template.
This document does not have a minimum or maximum length. No external sources are required for this assignment, except for the source of the data and the data dictionary. Do not forget to credit these sources. Typically, this submission is about two pages, not including the cover page or reference section.
There is no reason to regurgitate the words or ideas of others extensively. I am not interested in your ability to research and repeat the words of others. You can use others' ideas and credit their work. However, I expect to read your ideas, your experiences, and your words when I read your submitted work.