## Sunday, February 4, 2018

Simpson's paradox, also called the Yule-Simpson effect, is a statistical paradox: the correlation observed in a population may differ from the correlation observed in the groups into which that population is divided. In other words, even when a hypothesis holds in each of two groups, the opposite hypothesis may hold in the population as a whole.

Statisticians have known of this phenomenon for more than a century, but scientists and others whose work deals with people have recently been discussing this paradox.

A and B each took two tests, solving 110 questions in total across the first and second tests. In the first test, A attempted 100 questions and answered 60 correctly, while B attempted 10 questions and answered 9 correctly. In the second test, A answered 1 of 10 questions correctly and B answered 30 of 100 correctly. Who has the higher correct-answer rate, A or B?

Consider this example, introducing some notation to organize the discussion.

• In the first test, A answered 60% of the attempted questions correctly ($S_A(1) = 60\%$) and B answered 90% correctly ($S_B(1) = 90\%$). That is, B had the higher correct-answer rate.

• Similarly, in the second test, A scored 10% ($S_A(2) = 10\%$) and B scored 30% ($S_B(2) = 30\%$). On both tests, B had the higher correct-answer rate.

• However, taking the two tests together, A and B each solved 110 questions, of which A answered 61 correctly ($S_A = 61/110$) and B answered 39 correctly ($S_B = 39/110$).

• In other words, $S_B < S_A$: A ended up with the higher overall correct-answer rate, even though B had the higher rate on both tests.
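The figures in the list above can be checked with a short script. This is only a sketch: the names `results`, `per_test_rates`, and `pooled_rate` are illustrative choices, not part of the original text, and the counts are the ones given in the example.

```python
from fractions import Fraction

# (correct, attempted) for test 1 and test 2, taken from the example.
results = {
    "A": [(60, 100), (1, 10)],
    "B": [(9, 10), (30, 100)],
}

def per_test_rates(scores):
    """Correct-answer rate on each test separately."""
    return [Fraction(correct, attempted) for correct, attempted in scores]

def pooled_rate(scores):
    """Correct-answer rate over all questions from both tests combined."""
    total_correct = sum(correct for correct, _ in scores)
    total_attempted = sum(attempted for _, attempted in scores)
    return Fraction(total_correct, total_attempted)

for name, scores in results.items():
    rates = [float(r) for r in per_test_rates(scores)]
    print(name, "per test:", rates, "pooled:", pooled_rate(scores))
```

B's rate is higher on each test, yet A's pooled rate (61/110) exceeds B's (39/110).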

The paradox arises because the method of calculation is overlooked. If $S_B(1) > S_A(1)$ and $S_B(2) > S_A(2)$, we tend to think that $S_B$ must be larger than $S_A$. But what if each test carries a different weight when the total score is computed? The first test has weight 100/110 for A and 10/110 for B; the second test has weight 10/110 for A and 100/110 for B.

$$S_A = \frac{100}{110}\, S_A(1) + \frac{10}{110}\, S_A(2)$$

$$S_B = \frac{10}{110}\, S_B(1) + \frac{100}{110}\, S_B(2)$$

With these weights, A's overall correct-answer rate is $S_A = 61/110 \approx 55\%$ and B's is $S_B = 39/110 \approx 35\%$. In this way, the paradox can be understood in terms of the calculation method.
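As a rough illustration of the weighting argument, the sketch below computes each total as a weighted average of the per-test rates. The helper `weighted_rate` is a hypothetical name introduced here. The second pair of calls uses equal weights for both people, which is an assumption added purely for comparison and does not appear in the original example.

```python
from fractions import Fraction

def weighted_rate(rates, counts):
    """Weighted average of per-test rates, weighting each test by its question count."""
    total = sum(counts)
    return sum(Fraction(n, total) * r for r, n in zip(rates, counts))

rates_a = [Fraction(60, 100), Fraction(1, 10)]
rates_b = [Fraction(9, 10), Fraction(30, 100)]

# Weights from the example: A is dominated by test 1, B by test 2.
s_a = weighted_rate(rates_a, [100, 10])
s_b = weighted_rate(rates_b, [10, 100])
print(s_a, s_b)  # 61/110 39/110

# Hypothetical comparison: identical weights for both people.
equal_a = weighted_rate(rates_a, [1, 1])
equal_b = weighted_rate(rates_b, [1, 1])
print(equal_a, equal_b)  # 7/20 3/5
```

When both people share the same weights, B comes out ahead, matching the per-test ordering. The reversal appears only because A's and B's weights differ.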

However, this explanation holds only under the assumption that A and B are "answering exactly the same 110 questions." In statistics about dealing with people, for example the repeat rate across 110 customer interactions or the tallied responses to customer satisfaction questionnaires, a paradox still remains between the individual results and the overall result.

Judged by the total score, A appears to be the better performer. However, the same numbers can be told as a story in which B is the better one, as in the following example.

"A and B are doctors at a hospital. Their patients were divided into two groups, mild and severe, and the treatment outcomes of 110 patients each were examined. B had better outcomes than A in both the mild and the severe group, but B's overall outcome was worse. The reason is that most of B's patients were severe cases (100/110), whereas most of A's patients were mild cases (100/110). Therefore, the conclusion that A's treatment results were better is logically wrong."

The story above changes nothing about the numbers for A and B from the earlier test example. Problems of this kind are what recent literature discusses as Simpson's paradox.
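The doctor story can be sketched the same way. The code below assumes that test 1 maps to the mild group and test 2 to the severe group, which is consistent with the case mix stated in the story but is not spelled out there. The names `doctors`, `overall`, `severe_share`, and `is_reversal` are illustrative.

```python
from fractions import Fraction

# (successful treatments, patients) per severity group.
# Assumption: test 1 -> "mild", test 2 -> "severe".
doctors = {
    "A": {"mild": (60, 100), "severe": (1, 10)},
    "B": {"mild": (9, 10), "severe": (30, 100)},
}

def rate(successes, patients):
    return Fraction(successes, patients)

def overall(groups):
    """Success rate over all patients of one doctor."""
    return Fraction(sum(s for s, _ in groups.values()),
                    sum(n for _, n in groups.values()))

def severe_share(groups):
    """Fraction of a doctor's patients who are severe cases."""
    return Fraction(groups["severe"][1], sum(n for _, n in groups.values()))

def is_reversal(x, y):
    """True if y beats x in every group while x beats y overall."""
    wins_each_group = all(rate(*y[g]) > rate(*x[g]) for g in x)
    return wins_each_group and overall(x) > overall(y)

print(is_reversal(doctors["A"], doctors["B"]))  # True
print(severe_share(doctors["A"]), severe_share(doctors["B"]))  # 1/11 10/11
```

The large gap in the share of severe cases (1/11 for A against 10/11 for B) is what makes the overall comparison misleading in this story.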


Post Date : 2018-02-04 08:00