What is statistical data mining?
Statistical data mining, also known as knowledge discovery, is a computer-based method of collecting and analyzing information. A data mining tool takes data and categorizes it to discover patterns or correlations that can be used in important applications such as medicine, computer programming, business promotion and robotic design. Statistical data mining techniques use advanced mathematics and complex statistical processes to produce an analysis.
Data mining involves five major steps. In the first step, a data mining application collects statistical data and loads it into a data warehouse program. In the second, the data in the warehouse is organized and placed under a management system. The third step provides a way to access the managed data. The fourth step develops data analysis software, sometimes referred to as data mining regression, while the final step presents the statistical data in a form that is easy to use or interpret in practice.
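The five steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sales records, the table name `sales` and the use of an in-memory SQLite database as the "warehouse" are all invented for the example, and a per-region average stands in for real analysis software.

```python
import sqlite3
from statistics import mean

# Step 1: collect raw records (hypothetical sales data, invented for illustration).
raw_sales = [("north", 120.0), ("south", 80.0), ("north", 100.0), ("south", 60.0)]

# Step 2: store and organize the records in a warehouse-like store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", raw_sales)

# Step 3: provide access to the managed data through a query.
rows = conn.execute("SELECT region, amount FROM sales ORDER BY region").fetchall()

# Step 4: analyze the data (here, just an average per region).
by_region = {}
for region, amount in rows:
    by_region.setdefault(region, []).append(amount)
averages = {region: mean(amounts) for region, amounts in by_region.items()}

# Step 5: present the result in a readable form.
for region, avg in sorted(averages.items()):
    print(f"{region}: average sale {avg:.2f}")

conn.close()
```

In a real project each step would be a separate system, but the flow from collection to presentation is the same.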
Generally, data mining integrates analytical and transactional data systems. The analytical software sorts through both types of data system using open-ended user questions. Open-ended questions allow countless answers, so programmers do not influence the sorting results. Programmers create lists of questions that help categorize the information by overall focus.
Sorting is then based on the classes and clusters found in the data, and the software attempts to define formulas and trends based on associations. For example, Google collects information about shopping habits to help place online advertising. The open-ended questions used to sort this buyer data focus on the purchasing preferences or habits of Internet users.
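One simple way to look for associations in shopping data is to count how often two items are bought together. The sketch below assumes a handful of made-up shopping baskets and a hypothetical helper named `pair_support`; it illustrates the general idea only and is not a description of how any particular company processes its data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; the items are made up for illustration.
baskets = [
    {"shoes", "socks"},
    {"shoes", "socks", "hat"},
    {"hat", "scarf"},
    {"shoes", "hat"},
]

def pair_support(baskets):
    """Return the fraction of baskets in which each pair of items appears together."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives every pair a canonical order, e.g. ("shoes", "socks").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

support = pair_support(baskets)

# List the pairs from most to least frequently bought together.
for pair, share in sorted(support.items(), key=lambda item: -item[1]):
    print(pair, share)
```

Pairs with a high share are candidate associations that an analyst might examine further.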
Computer scientists and programmers focus on analyzing the statistical data that is collected. Decision trees, artificial neural networks, the nearest neighbor method, rule induction, data visualization and genetic algorithms all work on statistically derived data. These classification systems help interpret the associations discovered by analytical data programs. Statistical data mining includes small projects that can be carried out on a home computer, but most sets of association data are so large, and the data regression so complicated, that they require a supercomputer or other high-speed computers.
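Of the techniques listed above, the nearest neighbor method is the easiest to show on a home computer. The following minimal sketch assumes a tiny, invented set of labeled customers (age and yearly purchases) and a hypothetical function `nearest_neighbor`; it classifies a new customer by copying the label of the closest known one.

```python
import math

# Hypothetical labeled customers: (age, yearly purchases) -> preferred style.
training = [
    ((22, 4), "casual"),
    ((25, 6), "casual"),
    ((47, 12), "formal"),
    ((52, 15), "formal"),
]

def nearest_neighbor(point, training):
    """Return the label of the training example closest to point (Euclidean distance)."""
    closest = min(training, key=lambda example: math.dist(point, example[0]))
    return closest[1]

label_young = nearest_neighbor((24, 5), training)
label_older = nearest_neighbor((50, 13), training)
print(label_young, label_older)
```

With real data the features would usually be rescaled first, so that one measurement such as age does not dominate the distance.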
Statistical data mining collects three general types of data: operational data, nonoperational data and metadata. In a clothing store, operational data is the basic data used to run the business, such as accounting, sales and inventory control. Nonoperational data, which is only indirectly related to the business, includes estimates of future sales and general information on the national clothing market. Metadata is data about the data itself. Using metadata, a store can sort customers into classes based on gender, geographic location or favorite colors, if this data has been collected.
Data mining applications can be very sophisticated, and data mining tools can have far-reaching practical uses. One example is the study of disease outbreaks. A 2000 data mining project analyzed an outbreak of Cryptosporidium in Ontario, Canada, to determine the causes of the increase in disease cases. The results of the data mining helped link the outbreak to local water conditions and the lack of proper treatment of municipal water. A field called "biosurveillance" uses the mining of epidemiological data to identify outbreaks of individual diseases.
Computer programmers and designers also use probability studies and statistical analysis to develop machines and computer programs. The Google Internet search engine was designed using extensive data mining. Google continues to collect data and use data mining to create updates and program applications.