What Is a Frequency Distribution Histogram?
In a Cartesian coordinate system, the horizontal axis represents the continuous range of values that the sample data can take. The sample data are divided into m groups (classes) spanning an interval (a, b) that contains both the minimum and the maximum of the data: a is slightly smaller than the minimum value of the sample data, and b is slightly larger than the maximum value. The group distance (class width) is $d = (b - a)/m$. Each group is a left-closed, right-open interval: $[a, a + d)$, $[a + d, a + 2d)$, ..., $[a + (m - 1)d, b)$. The number of sample data falling in a group is called the frequency count of that group, and the count divided by the total number of samples is called the relative frequency. The vertical axis represents the relative frequency divided by the group distance. For each group, a rectangle is drawn whose height is the quotient of the relative frequency and the group distance and whose base is the group distance. The statistical graph made up of these rectangles is called a frequency distribution histogram.
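The grouping described above can be sketched in Python. This is a minimal illustration rather than part of the original text: the function name `density_histogram`, the `margin` parameter that places a and b slightly outside the data range, and the sample data are all assumptions made for the example.

```python
def density_histogram(data, m, margin=0.5):
    """Group data into m left-closed, right-open classes on (a, b).

    margin is an assumed offset that puts a slightly below min(data)
    and b slightly above max(data).
    """
    a = min(data) - margin
    b = max(data) + margin
    d = (b - a) / m                          # group distance (class width)
    counts = [0] * m
    for x in data:
        # Index of the class [a + k*d, a + (k+1)*d) that contains x;
        # min() guards against floating-point rounding at the top edge.
        k = min(int((x - a) // d), m - 1)
        counts[k] += 1
    n = len(data)
    rel_freq = [c / n for c in counts]       # relative frequency of each class
    heights = [f / d for f in rel_freq]      # height = relative frequency / class width
    edges = [a + k * d for k in range(m + 1)]
    return edges, counts, rel_freq, heights


# Hypothetical sample of ten values, split into four classes.
data = [2, 3, 3, 5, 7, 8, 8, 9, 11, 12]
edges, counts, rel_freq, heights = density_histogram(data, m=4)
```

Note that the value 7 lies exactly on the boundary 7.0 and is counted in the class that starts there, which is what the left-closed, right-open convention requires.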
- The relative frequencies of all groups sum to 1, so in the frequency distribution histogram the areas of all rectangles sum to 1. The average frequency density of a group is the ratio of the group's relative frequency to the group distance, that is, the relative frequency per unit distance within the group. A graph that takes the average frequency density as the ordinate, in place of the frequency, is called the average frequency density histogram. The areas of all its rectangles also sum to 1: the area enclosed by the top edges of the rectangles, the two outer edges of the histogram, and the horizontal axis equals 1. As the sample size increases and the group distance decreases, the average frequency density of each group approaches the frequency density at the midpoint of the group, and the top edges of the rectangles approach a smooth curve, the frequency density function curve. Put simply, a histogram used to show the frequency distribution of a sample is called a frequency distribution histogram, or frequency histogram for short.
- The frequency distribution histogram shows the frequency distribution of each group clearly and makes the differences between groups easy to see. Its main purpose is to present the collected data visually so that the distribution of the data is easier to understand. The group distance and the number of groups therefore play a key role. Too few groups lump the data together; too many groups scatter the data. Either choice masks the characteristics of the distribution. For up to about 100 data values, 5 to 12 groups are generally appropriate.
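The unit total area follows directly from the definitions. As a short worked derivation, added here for illustration, write $n_i$ for the frequency count of group $i$, $n$ for the total number of samples, $d$ for the group distance, and $m$ for the number of groups:

```latex
\sum_{i=1}^{m}
  \underbrace{\frac{n_i}{n\,d}}_{\text{height}}
  \cdot
  \underbrace{d}_{\text{base}}
  = \frac{1}{n}\sum_{i=1}^{m} n_i
  = \frac{n}{n}
  = 1
```

The group distance cancels in every term, so the result does not depend on how many groups are chosen.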
- Several data that can be estimated from the frequency distribution histogram:
- 1. Find the maximum and minimum values in all data and calculate their difference (
- In editing work, we often encounter manuscripts that contain frequency count (or relative frequency) distribution histograms (hereinafter referred to as "histograms"). Because the figures supplied by authors are often not self-explanatory and have to be revised and supplemented during editing, I hoped to guide my own practice by studying how other journals handle such figures, and therefore collected histograms published in a number of journals. While sorting them, I found that many were in fact bar charts and that the presentation of the histograms was inconsistent, which made them hard for readers to read and understand. To standardize the editing of histograms, I studied the relevant content of GB/T 3358.1-2009 "Statistical Vocabulary and Symbols Part 1: General Statistical Terms and Terms Used for Probability". This study points out the problems found in the representation of frequency count (or relative frequency) distribution histograms in scientific and technological journal papers, proposes solutions, and illustrates them with examples.
Comparison of histograms and bar charts
- GB/T 3358.1-2009 defines a "histogram" as a graphical representation of a frequency distribution consisting of adjacent rectangles, in which the base width of each rectangle equals the group distance and the area is proportional to the frequency of the group. It defines a "bar chart" as a graph consisting of a set of rectangles of equal width whose heights are proportional to the frequency, representing the frequency distribution of a nominal characteristic (note: the rectangles in a bar chart do not need to be adjacent).
- Comparing the two definitions in GB/T 3358.1-2009 gives the following results:
- 1) The data on the horizontal axis of a histogram are continuous, and each rectangle covers a range. The data on the horizontal axis of a bar chart are isolated, specific categories.
- 2) A histogram uses the area of a rectangle to represent the frequency: the larger the area, the greater the frequency of that group. Only when the base widths are equal, that is, when the group distances are equal, can the height of the rectangle be used to indicate the frequency. A bar chart uses the height of the bar to indicate the frequency.
- 3) Each rectangle in a histogram corresponds to a range. Because adjacent ranges neither overlap nor leave gaps, there are no gaps between the rectangles of a histogram. The categories in a bar chart are independent of one another, so the bars are separated by gaps and do not need to be adjacent.
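Point 2) is easiest to see with unequal group distances. The following sketch uses hypothetical classes given as (left edge, right edge, count); the function name `rectangle_heights` and the numbers are assumptions chosen for illustration.

```python
def rectangle_heights(classes):
    """Return histogram heights for classes given as (left, right, count).

    Height = relative frequency / class width, so that each rectangle's
    area equals the relative frequency of its class.
    """
    n = sum(count for _, _, count in classes)
    return [count / n / (right - left) for left, right, count in classes]


# Hypothetical data: three classes with the same count but unequal widths.
classes = [(0, 10, 20), (10, 20, 20), (20, 40, 20)]
heights = rectangle_heights(classes)
areas = [h * (right - left) for h, (left, right, _) in zip(heights, classes)]
```

All three classes hold the same share of the data, so the three areas are equal, but the class that is twice as wide is drawn half as tall. A bar chart of the same counts would show three bars of equal height.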
Problems in the representation of frequency distribution histograms
- Analysis of the frequency count (or relative frequency) distribution histograms in the collected scientific and technical journal articles shows the following main problems.
- 1) The histogram is drawn as a bar chart.
- 2) The scale on the horizontal axis does not clearly define the grouping intervals, and some grouping intervals are not half-open.
- 3) The titles of the vertical axis take many forms, such as "distribution frequency/%", "frequency/%", "frequency", "frequency/number", "sample/number", "sample number/block", "sample number (number)", "percentage/%", "percentage (%)", "content (%)", "quantity (%)", and "oil and gas unit (number)". Frequency count and relative frequency are also confused: a count is labelled as a relative frequency, and a relative frequency is labelled as a count.
- 4) The figure titles are too general, for example: "... homogenization temperature histogram", "... porosity-permeability frequency histogram", "... reservoir property distribution histogram", "... porosity frequency distribution", "... inclusion temperature statistics chart", "... histogram of porosity and permeability statistics", "Different particle size content of Shashan windward slope", "... statistics of horizontal migration distance of oil and gas", "... histogram of organic carbon distribution", "... carbon isotope comparison", "... isotopic distribution histogram", and "... characteristics of pore type".
Standard editing method for frequency distribution histograms
- 1) Horizontal axis coordinates
- The horizontal axis of the histogram reflects the category of the object under investigation. From the title of the horizontal axis, the reader can tell whether a qualitative or a quantitative characteristic of the object is being counted and, for a quantitative characteristic, its unit.
- If a qualitative characteristic is being counted, the scale labels of the horizontal axis should clearly show how the objects are grouped: the number of groups (the number of parts into which the whole sample is divided) and the name of each characteristic.
- If a quantitative characteristic is being counted, the scale labels of the horizontal axis should clearly show how the objects are grouped: the number of groups, the group distance of each group, and which end of each grouping interval is open and which is closed. The grouping intervals must be half-open, so that every data value falls into one and only one interval.
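The requirement for half-open intervals can be checked with a small sketch. The helper names `interval_labels`, `classes_containing`, and `classes_containing_closed` are hypothetical, and the boundaries 0, 10, 20, 30 are assumed example values.

```python
def interval_labels(a, d, m):
    """Axis labels for m left-closed, right-open classes starting at a."""
    return [f"[{a + k * d:g}, {a + (k + 1) * d:g})" for k in range(m)]


def classes_containing(x, a, d, m):
    """Indices of the half-open classes [a + k*d, a + (k+1)*d) that contain x."""
    return [k for k in range(m) if a + k * d <= x < a + (k + 1) * d]


def classes_containing_closed(x, a, d, m):
    """Same test with closed classes, shown only to expose the overlap."""
    return [k for k in range(m) if a + k * d <= x <= a + (k + 1) * d]


labels = interval_labels(0, 10, 3)
on_boundary_half_open = classes_containing(10, 0, 10, 3)
on_boundary_closed = classes_containing_closed(10, 0, 10, 3)
```

With half-open classes the boundary value 10 belongs to exactly one class, whereas with closed classes it would be counted in two.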
- 2) Vertical axis coordinates
- The vertical axis of the histogram reflects the ratio of the frequency of the object under investigation to the group distance. Only when the group distances are equal can the height of the rectangle, that is, the scale value on the ordinate, be used to represent the frequency count (or relative frequency). Since most histograms in scientific and technical journal articles use equal group distances, this study discusses only that case.
- The vertical axis is titled with either the frequency count (the number of data falling in a group) or the relative frequency (the ratio of a group's count to the total number of samples). The counts of all groups sum to the total number of samples. The relative frequencies satisfy $0 \le f_i \le 100$ and $\sum_{i=1}^{m} f_i = 100$, where $f_i$ is the relative frequency of group $i$, expressed as a percentage, $i = 1, 2, \ldots, m$ is the group index, and $m$ is the number of groups. The relative frequency reflects the proportion of the total number of samples that falls in each group.
- If the figure is a relative frequency distribution histogram, the vertical axis title is "relative frequency/%"; if it is a frequency count distribution histogram, the title is "frequency count".
- If the vertical axis title is "relative frequency/%", then $\sum_{i=1}^{m} f_i = 100$. If it is "frequency count", then the counts $n_i$ of all groups must sum to the total number of sample data $n$, that is, $\sum_{i=1}^{m} n_i = n$. This check gives a preliminary indication of whether the author has supplied a frequency count or a relative frequency distribution histogram.
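The preliminary check on the sums can be written as a short function. This is a sketch under stated assumptions: the name `classify_vertical_axis`, the returned labels, and the tolerance `tol` (which allows for rounding when values are read off a figure) are not from the original text.

```python
def classify_vertical_axis(values, n, tol=0.5):
    """Guess whether rectangle heights are counts or percentages.

    values: heights read from a histogram with equal group distances
    n:      total number of samples stated for the figure
    tol:    assumed allowance for rounding error in the values
    """
    total = sum(values)
    is_count = abs(total - n) <= tol        # heights sum to n
    is_percent = abs(total - 100) <= tol    # heights sum to 100
    if is_count and is_percent:
        return "ambiguous"                  # happens when n is about 100
    if is_count:
        return "count"
    if is_percent:
        return "percent"
    return "inconsistent"


result_count = classify_vertical_axis([3, 1, 4, 2], n=10)
result_percent = classify_vertical_axis([30, 10, 40, 20], n=10)
```

A result of "inconsistent" suggests that the figure or the stated sample size needs to be queried with the author, and "ambiguous" shows why the check is only preliminary.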
- 3) Shape of figure
- By definition, a histogram is a figure composed of adjacent rectangles.
- To draw a histogram in Excel, first draw a column chart and then set the gap between the columns to 0. The specific steps are: select a data series, right-click, and choose "Format Data Series" from the pop-up menu; click the "Options" tab and set "Gap width" to "0"; select the "Vary colors by point (V)" check box; then click "OK". The gaps between the columns disappear, and the chart becomes a histogram of connected rectangles that meets the requirements of the standard.
- 4) Figure title
- The title should state the category of the object under investigation and the type of figure, rather than being a general title. It is recommended to include the words "frequency count (or relative frequency) distribution histogram", which clearly indicate the type of graph, so that the figure is clearly distinguished from a bar chart and is easy for readers to retrieve. For example, the titles listed under item 4) of the problems above can be changed to "... homogenization temperature frequency distribution histogram", "... porosity and permeability frequency distribution histogram", "... reservoir porosity and permeability frequency distribution histogram", "... porosity frequency distribution histogram", and so on.
- 5) Other
- Since the frequency count (or relative frequency) distribution histogram is a statistical graph, the total number of samples should be given in the figure. When more than one object is examined, that is, when the horizontal axis reflects the characteristics of several objects, a legend must be provided.
Conclusion of Frequency Distribution Histogram Study
- Histograms and bar charts should be carefully distinguished. Following the editing specifications for histograms, the editor can ask the author to revise the figure accordingly and supply the necessary information, and then edit it. The resulting figure is self-explanatory and easy for readers to read and understand. [2]