4 years ago in Statistics By Ashwin Goel
Options to analyse non-normal distribution statistically
Hello experts, I am still unsure how to proceed with data evaluation in my research framework. I want to analyse a histogram in which the data are not normally distributed. The scale runs from 0 to 100, and I have performed more than 150 tests. Most of the results (about 114) fall in the range 30-60, and all the remaining tests now appear as outliers.
So, I have a few questions that I need answers to:
- How can I get an accurate (or near-accurate) average from this type of data distribution?
- Should I calculate the mean with or without the outliers?
- How can I examine the cause-and-effect relationship between these outliers and the calculated mean?
I want to analyse the data statistically, but I am not well versed in the various approaches, so I do not know which method is appropriate in this case. It would be really helpful if you could help me decide which statistical tests to use and which metrics (mean, standard deviation, significance value, or p-value) need to be evaluated.
References would also be welcome.
Looking forward to the answers.
All Answers (2 Answers In All)
By David Answered 4 years ago
Why not try robust statistics to get the calculations right?
In my view, they let you handle data like this systematically. You can use robust versions of both the mean and the standard deviation, which should give you a suitable answer.
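To make the suggestion concrete, here is a minimal sketch of two common robust estimators: a trimmed mean in place of the ordinary mean, and a scaled median absolute deviation (MAD) in place of the standard deviation. The answer does not say which robust estimators it has in mind, so these two are an assumption, as are the helper names `trimmed_mean` and `mad_scale`, the 10% trimming proportion, and the made-up scores.

```python
import statistics


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping the lowest and highest `proportion` of the values."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # how many values to cut from each end
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    return statistics.mean(kept)


def mad_scale(values):
    """Median absolute deviation times 1.4826.

    The factor makes the result comparable to a standard deviation
    when the bulk of the data is roughly normal.
    """
    med = statistics.median(values)
    deviations = [abs(v - med) for v in values]
    return 1.4826 * statistics.median(deviations)


# Made-up scores on a 0-100 scale, for illustration only (not the asker's data).
scores = [2, 35, 38, 41, 44, 45, 47, 50, 52, 55, 58, 97]

plain_mean = statistics.mean(scores)
robust_mean = trimmed_mean(scores, proportion=0.1)
plain_sd = statistics.stdev(scores)
robust_sd = mad_scale(scores)

print(f"mean: {plain_mean:.2f}, trimmed mean: {robust_mean:.2f}")
print(f"standard deviation: {plain_sd:.2f}, scaled MAD: {robust_sd:.2f}")
```

With extreme values at both ends, the two means may be close while the spread estimates differ a lot, which is usually the more visible effect of outliers.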
By Mehar Mehta Answered 4 years ago
Also, you said that you have about 150 test cases. Are these a sample? If so, I would suggest checking whether you chose the right sampling method, or drawing a different sample that may give you a more stable distribution. Otherwise, go by the research hypothesis you have chosen; based on that, you can decide whether you need another sample.
However, if these test cases are the only cases you are interested in, then there is no need for statistical testing, and you can do without the significance value, p-value, and similar measures.
You also need to find out why these test cases are outliers: is there a mistake, or is there a genuine reason why such values are few in number? Please check that too.
Now, to answer your questions one by one:
As far as I understand, the median should be the appropriate measure for you, since most of your test cases fall in the 30-60 range, so the data are concentrated around the middle scores.
You can get more information about median and interquartile range from the following links:
https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_summarizingdata/BS704_SummarizingData5.html#headingtaglink_4
https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_summarizingdata/BS704_SummarizingData7.html#headingtaglink_1
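As a quick illustration of the median and interquartile range, here is a small sketch using Python's standard `statistics` module. The scores are made up for illustration, and the "inclusive" quartile method is an assumption; other quartile conventions, possibly including the one used on the linked pages, can give slightly different values.

```python
import statistics

# Made-up scores on a 0-100 scale, for illustration only (not the asker's data).
scores = [2, 35, 38, 41, 44, 45, 47, 50, 52, 55, 58, 97]

median = statistics.median(scores)

# quantiles(..., n=4) returns the three cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1  # spread of the middle half of the data

print(f"median: {median}, Q1: {q1}, Q3: {q3}, IQR: {iqr}")
```

Neither the median nor the IQR changes if the lowest or highest score is replaced by an even more extreme value, which is why they suit data with outliers.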
Whether to calculate the mean with or without the outliers depends on your purpose. That is why I asked about your research question. Do you need to find the correlation between the outliers and the mean or not?
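One simple way to see how much the outliers matter is to compute the mean both ways and look at the difference. The sketch below treats every value outside the 30-60 range from the question as an outlier; that cutoff rule, the variable names, and the made-up scores are assumptions for illustration, not a recommended definition of an outlier.

```python
import statistics

# Made-up scores on a 0-100 scale, for illustration only (not the asker's data).
scores = [35, 38, 41, 44, 45, 47, 50, 52, 55, 58, 88, 92, 97]

LOW, HIGH = 30, 60  # central range mentioned in the question

inside = [s for s in scores if LOW <= s <= HIGH]
outside = [s for s in scores if not (LOW <= s <= HIGH)]

mean_all = statistics.mean(scores)     # outliers included
mean_inside = statistics.mean(inside)  # outliers excluded
shift = mean_all - mean_inside         # how far the outliers pull the mean

print(f"{len(outside)} of {len(scores)} values lie outside {LOW}-{HIGH}")
print(f"mean with outliers: {mean_all:.2f}")
print(f"mean without outliers: {mean_inside:.2f}")
print(f"shift caused by outliers: {shift:.2f}")
```

Reporting both means, together with the number of excluded values, lets a reader judge the effect instead of relying on one choice.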
Calculate the mean and test its accuracy (this again depends on the answer to your second question). If you can compare your result directly with the research hypothesis, then I don't think you need to compare it with the mean as well.
I hope I have answered your questions in a way that you understood.
Replied 4 years ago
By Ashwin Goel