Having completed the first two steps in this NHANES journey, first downloading and then preparing the data for our initial explorations, it is time to create a plan for handling the encoded variables within our aggregated data tables.

As a rule, it is far easier to visualize data than to statistically analyze it. Let’s start with some visualizations of the data we gathered and somewhat prepped in Steps 1 and 2. We’ll be creating some APIs and visualizations using D3.js later, but the easiest way to explore data is to use a visualization platform like Tableau. In the interest of keeping things free, let’s start with Tableau Public. It has less functionality than the full Professional version, but we work with what we have available to us. I created an account on Tableau Public and downloaded the latest associated desktop version (2022.1.1 at the time of this writing).

Using Excel to Prepare Data for Basic Tableau Visualizations

The first step in Tableau is to connect to the data you wish to explore. These first instructions will provide an initial approach, though each subsequent visualization will intentionally use an alternative approach to provide a broader assortment of options, each with its pros and cons.

EXPLORER’S NOTE: After Temet Nosce, perhaps this above all else: let curiosity be your guide. Consider starting with some public data that you find interesting, especially if that interest takes the form of lighthearted wonder, for wonder is like a lens through which we find the gold beneath our feet. In my case, as I went through the previous gathering and cleaning steps, the NHANES data on self-reported medical conditions (P_MCQ) jumped out at me as interesting, likely because I’m an epidemiologist. I wonder what differences exist in demographic distributions of various self-reported ailments. To see what these NHANES data look like, I’ll need to connect P_MCQ (Medical Conditions) and P_DEMO (Respondent Demographics). This is because each SEQN represents a certain segment of the U.S. population, with each associated “weight” providing the number of U.S. residents to whom statistical generalizations associated with that SEQN may be projected.

First, I opened the monstrously large output from Step 2 and copy-pasted data from the two data tables of interest, P_DEMO & P_MCQ. I then chose my variables of interest and removed what was unnecessary to this first simple visualization. In raw NHANES data, as with many other public data sets we will encounter in our explorations, each variable is encoded such that, without the value key, one can’t really interpret anything. This use of numbers is a common practice because it simplifies coding while minimizing file sizes. For example, with the variable MCQ010 concerning whether respondents were ever told that they have asthma, the code “1” means “Yes” and “2” means “No”. There are many ways to handle encoded variables, and discerning the best option(s) hinges upon what it is you’re trying to get done. I’ll be going through several approaches on this website, including those I likely wouldn’t choose were I to work at maximal efficiency. However, at the least, knowing of these diverse approaches serves to expand one’s toolbox in case exceptional situations ever arise. Generally speaking, it’s a highly workable practice to aggregate the data of interest and separate them from those beyond one’s current scope of consideration. In this case, my approach was as follows. I kept all of the DEMO variables except RIDSTATR (whether respondents were also MEC examined) and WTMECPRP (full sample MEC exam weights). The “WTINTPRP” variable provides the full sample interview weights that we should use in our projections of P_MCQ data, because our MCQ data are derived solely from interviews.

As for the rest of the P_DEMO variables, U.S. government statisticians encoded answers as numbers. For example, for the variable RIAGENDR, “1” means Male and “2” means Female. Similarly, race/ethnicity delineations are encoded as numbers in RIDRETH3, educational attainment (for adults 20+) as various numbers in DMDEDUC2, and so on through the rest of the variables as laid out on the “Code List” tab. To keep our first visualization simple, I decided to focus on a dozen “ever told” questions: ever told you had arthritis (MCQ160A), a heart attack (MCQ160E), a thyroid problem (MCQ160M), etc.

As can be seen on the “Code List” tab, all 12 of these selected variables have the same encoding: “1” means Yes, “2” means No, “7” means the respondent refused to answer the question, and “9” indicates that the respondent reported not knowing the answer to that particular question. Here is the simplified (and much smaller) Excel file that remains after chopping out all of the variables not needed to explore, as stated above, what differences exist in demographic distributions of various self-reported ailments.
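For readers who would rather script this preparation than do it by hand in Excel, the same idea (join the two tables on SEQN, decode the numeric answers, and project with the interview weights) might be sketched in plain Python as below. This is only an illustration under stated assumptions, not the workflow used in this post: the four respondents and their weights are made up, the helper names are hypothetical, and leaving "Refused" and "Don't know" answers out of the denominator is my own simplifying choice.

```python
# Made-up miniature stand-ins for P_DEMO and P_MCQ (not real NHANES values).
demo_rows = [
    {"SEQN": 1, "RIAGENDR": 1, "WTINTPRP": 1000.0},
    {"SEQN": 2, "RIAGENDR": 2, "WTINTPRP": 3000.0},
    {"SEQN": 3, "RIAGENDR": 2, "WTINTPRP": 2000.0},
    {"SEQN": 4, "RIAGENDR": 1, "WTINTPRP": 4000.0},
]
mcq_rows = [
    {"SEQN": 1, "MCQ160A": 1},
    {"SEQN": 2, "MCQ160A": 2},
    {"SEQN": 3, "MCQ160A": 1},
    {"SEQN": 4, "MCQ160A": 9},
]

# Value keys as described on the "Code List" tab.
GENDER_CODES = {1: "Male", 2: "Female"}
EVER_TOLD_CODES = {1: "Yes", 2: "No", 7: "Refused", 9: "Don't know"}


def join_on_seqn(demo, mcq):
    """Combine demographic and medical rows that share a SEQN."""
    mcq_by_seqn = {row["SEQN"]: row for row in mcq}
    return [
        {**d, **mcq_by_seqn[d["SEQN"]]}
        for d in demo
        if d["SEQN"] in mcq_by_seqn
    ]


def decode(rows):
    """Replace numeric codes with their readable labels."""
    decoded = []
    for row in rows:
        new_row = dict(row)
        new_row["RIAGENDR"] = GENDER_CODES[row["RIAGENDR"]]
        new_row["MCQ160A"] = EVER_TOLD_CODES[row["MCQ160A"]]
        decoded.append(new_row)
    return decoded


def weighted_yes_share(rows, question, group):
    """Weighted share of "Yes" among Yes/No answers, per group.

    Assumption: "Refused" and "Don't know" are dropped from the denominator.
    """
    yes, total = {}, {}
    for row in rows:
        if row[question] not in ("Yes", "No"):
            continue
        key = row[group]
        total[key] = total.get(key, 0.0) + row["WTINTPRP"]
        if row[question] == "Yes":
            yes[key] = yes.get(key, 0.0) + row["WTINTPRP"]
    return {key: yes.get(key, 0.0) / total[key] for key in total}


merged = decode(join_on_seqn(demo_rows, mcq_rows))
shares = weighted_yes_share(merged, "MCQ160A", "RIAGENDR")
for gender, share in shares.items():
    print(f"{gender}: {share:.0%} ever told they had arthritis")
```

The point of the weights shows up even in this toy data: two of the three Yes/No respondents said "Yes", but the weighted shares differ by group because each SEQN stands in for a different number of U.S. residents.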