Tuesday, July 24, 2012

Your variables



This is best explained by an example followed by an exercise:

Say your research counts the number of fruits each customer picks. The shop sells oranges and apples.

=> your variable name will be: NoOfFruits. 
=> possible input (type): whole numbers, never negative. Zero if no fruits were taken
=> missing (data not available): leave it blank or put a (-1)

Notice the name of the variable has no spaces or non-alphanumeric characters. Examples of non-alphanumeric characters: !@#$() etc.

The reason is that you will very likely import this data into SPSS or another database or statistics program. Variable names in these programs generally cannot contain spaces, and most non-alphanumeric characters are rejected as well.
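A quick way to check names before you export the sheet is a small script. The sketch below is in Python and assumes the strictest version of the rule above (letters and digits only, starting with a letter). The function name is_safe_variable_name is my own invention, and your stats program may accept a few more characters than this.

```python
import re

# Assumption: the strictest rule, letters and digits only, starting with a letter.
SAFE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_safe_variable_name(name):
    """Return True if the name has no spaces or non-alphanumeric characters."""
    return SAFE_NAME.fullmatch(name) is not None


# Try the rule on a good name and two bad ones.
for candidate in ["NoOfFruits", "No Of Fruits", "Fruits(#)"]:
    print(candidate, "->", is_safe_variable_name(candidate))
```

Only the first name passes; the other two contain a space or characters such as ( and #.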

For missing values, I prefer to enter -1. If I leave the cell blank, I can't tell values that are really missing apart from values I forgot to collect or enter into the sheet.

Note that the code for missing data has to be of the same data type as the variable (a number in this case) and, at the same time, a value that real records can NOT have.
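To see why the -1 code is useful, here is a minimal Python sketch. It assumes the sheet has been exported as text cells; the names classify_cell and mean_of_valid and the sample cells are made up for illustration. A blank cell is flagged as "not entered", -1 is treated as truly missing, and neither is allowed to leak into the average.

```python
MISSING = -1  # the code chosen above for "data not available"


def classify_cell(raw):
    """Classify one raw text cell of the NoOfFruits column."""
    text = raw.strip()
    if text == "":
        return "not entered"  # blank: forgotten, or never typed in
    value = int(text)  # raises ValueError if the cell is not a whole number
    if value == MISSING:
        return "missing"
    if value < 0:
        return "invalid"  # real counts can never be negative
    return "valid"


def mean_of_valid(cells):
    """Average the real counts only, skipping missing and blank cells."""
    values = [int(c) for c in cells if classify_cell(c) == "valid"]
    return sum(values) / len(values) if values else None


cells = ["3", "0", "-1", "", "6"]  # made-up sample column
print([classify_cell(c) for c in cells])
print(mean_of_valid(cells))
```

If the -1 were treated as a real count, it would drag the average down, which is why the missing code must be excluded before any calculation.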

---

Careful planning is important. You should figure out, to the best of your knowledge, whether or not you will be interested in knowing the number of oranges separately from the number of apples. Of course, you can see how that would change the variable design.

---

Now, say your project needs to distinguish whether your customer picked up oranges, apples, both or none.

This gets interesting, because there are multiple possibilities:

1. Variables: OrangesYN, ApplesYN. (The variable type here is 0 for No, 1 for Yes, -1 for missing.) I will explain later why I code it this way.

Example:

CustomerID | OrangesYN   | ApplesYN
1          | 0           | 1
2          | 0           | 0
3          | 1           | 1

You will eventually be able to figure out how many picked up oranges and how many picked up apples. If you want to find out how many picked up none or how many picked both, you will need to create a new variable to work that out. (not in this episode!)

With this design you can't figure out how many pieces the customer picked. That's OK if you are not interested in knowing!


2. Variables: Oranges, Apples. Type: numbers

Example:

CustomerID | Oranges        | Apples
1          | 0              | 5
2          | 3              | 4
3          | 0              | 0

With this design, you can work out all the information you could have obtained from the first design, plus the number of pieces of each fruit. More information, however, doesn't come without added cost! Counting each type of fruit takes extra time. If your work doesn't need this data, then save your resources.
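As a rough illustration of how the second design contains the first, the Python sketch below turns the counts from the example table into the 0/1 coding of design 1. The helper name to_yes_no is hypothetical, and it assumes a missing count is coded as -1, as suggested earlier.

```python
# Rows copied from the example table above: (CustomerID, Oranges, Apples)
rows = [(1, 0, 5), (2, 3, 4), (3, 0, 0)]


def to_yes_no(count):
    """Convert a count to the 0 = No, 1 = Yes, -1 = missing coding."""
    if count == -1:  # assumption: -1 marks a missing count, so keep it
        return -1
    return 1 if count > 0 else 0


# Each derived row is (CustomerID, OrangesYN, ApplesYN).
derived = [(cid, to_yes_no(oranges), to_yes_no(apples))
           for cid, oranges, apples in rows]

for row in derived:
    print(row)
```

The reverse is not possible: once you only have a 1 for Yes, the actual count is gone.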


3. Now...

This is the interesting one.

What if all I am interested in is whether the customer picked only apples, only oranges, or none? Either of the two designs above can answer this question, right?

I will add something to the mix, but before that: among the options for this example, is there an option that I left out? Apples, Oranges or None. Is there another possibility? This is what you have to carefully work out when you are building your variables.

Yes, there is a missing possibility: What if the customer picks both? And here is what helps you decide which options to include: Is that really a possibility, and does it matter to you whether the customer picks both?

For this example we will suppose that a customer CANNOT pick both. Practical examples: A patient cannot be on two types of erythropoietin. A patient cannot be on two types of heparin.

For this scenario, one field can do the job:

CustomerID | TypeOfFruit
1          |  2
2          |  1
3          |  0

And you have to have a code list:
0: none
1: Oranges
2: Apples
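When you analyse the data later, the code list is what turns the numbers back into words. Here is a small Python sketch of that step, using the three records from the table above; the names FRUIT_CODES and label are my own, and the check for unknown codes is an extra safeguard I added, not part of the design itself.

```python
# The code list from above, kept in one place.
FRUIT_CODES = {0: "none", 1: "Oranges", 2: "Apples"}

# Records from the example table: (CustomerID, TypeOfFruit)
records = [(1, 2), (2, 1), (3, 0)]


def label(code):
    """Look up a code, refusing anything that is not in the code list."""
    if code not in FRUIT_CODES:
        raise ValueError(f"Unknown TypeOfFruit code: {code}")
    return FRUIT_CODES[code]


labelled = [(cid, label(code)) for cid, code in records]

for row in labelled:
    print(row)
```

A typo such as 22 instead of 2 is caught immediately instead of quietly ending up in your results.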



EXERCISE:

The fruit shop offers Oranges, Apples, Mangos and Pineapples.

Design your variables (with an example as done above) for a scenario in which:
1. The customer can pick only one type of fruit, or none.
2. You need to know how many pieces the customer picked.

Think carefully!


