Big Data_Data Mining Technology Classification and Application

Introduction to big data

Big data refers to a collection of data that cannot be captured, managed, and processed by conventional software tools within an acceptable time frame. The research firm Gartner gives this definition: "big data" is a massive, fast-growing, and diverse information asset that requires new processing models in order to deliver stronger decision-making, insight, and process optimization capabilities.

The smallest basic unit of data is the bit. In ascending order, the units are: bit, Byte, KB, MB, GB, TB, PB, EB, ZB, YB, BB, NB, DB. The last three (BB, NB, DB) are informal names rather than standardized units.

Characteristics of big data

Volume: the size of the data, which determines its value and the potential information it holds;

Variety: the diversity of data types;

Velocity: the speed at which data is generated and obtained;

Variability: inconsistency in the data, which hampers the processes used to handle and manage it effectively;

Veracity: the quality of the data;

Complexity: huge amounts of data coming from multiple sources;

Value: rational use of big data to create high value at low cost.

Data mining technology overview

The development of Internet-based global information systems has given us an unprecedented amount of data. While this abundance brings convenience, it also brings problems: first, there is too much information to digest; second, it is difficult to judge whether information is genuine; third, information security is difficult to guarantee; fourth, information comes in inconsistent forms that are difficult to process uniformly. "Rich in data but poor in knowledge" has become a typical problem. The purpose of data mining is to extract the required answers from massive data effectively and to realize the transformation "data -> information -> knowledge -> value".

Data mining is the process of extracting potentially valuable knowledge (models or rules) from large amounts of data in a non-trivial way. The term has several near-synonyms: knowledge discovery in databases, information extraction, information discovery, intelligent data analysis, exploratory data analysis, data harvesting, and data archaeology.

Data mining is one of the most active branches of database research, development, and application. It is an interdisciplinary field that draws on database technology, artificial intelligence, machine learning, neural networks, mathematics, statistics, pattern recognition, knowledge-based systems, knowledge acquisition, information extraction, high-performance computing, parallel computing, data visualization, and many other areas.

Data mining technology has been application-oriented from the beginning. It goes beyond simple retrieval and querying of a specific database: it performs statistics, analysis, synthesis, and reasoning on the data at the micro, meso, and macro levels in order to guide the solution of practical problems, discover relationships between events, and even use existing data to predict future activities. For example, BC Telephone Company of Canada asked the KDD research group at Simon Fraser University to summarize and analyze more than ten years of its customer data, propose new telephone charging and management methods, and formulate preferential policies that benefit both the company and its customers. In this way, the use of data moves up from low-level query operations to decision support for decision-makers at all levels, and this demand is a far stronger driver than database querying. At the same time, the data mining discussed here does not aim to discover universally applicable truths, new theorems of natural science, or pure mathematical formulas, nor is it machine theorem proving. All discovered knowledge is relative, holds under specific premises and constraints, is domain-specific, and should be easy for users to understand; ideally the findings can be expressed in natural language. The research results of data mining are therefore highly practical.

Applications of data mining technology

Data mining technology can serve tasks such as decision making, process control, information management, and query processing. An interesting application example is the story of "diapers and beer." To find out which products customers are most likely to buy together, Wal-Mart used automated data mining tools to analyze the large amount of data in its database and found that the item most often purchased together with diapers was beer. Why would two seemingly unrelated items be bought together? It turned out that wives often asked their husbands to buy diapers for the children after work, and the husbands picked up a couple of bottles of beer for themselves while they were at it. Since diapers and beer were so often bought together, the store placed them side by side, and sales of both grew. Here, data mining technology made the difference. In general, data mining applications include: telecommunications (churn analysis); banking (clustering for customer segmentation, cross-selling); department stores and supermarkets (shopping basket analysis with association rules); insurance (segmentation, cross-selling, analysis of the causes of churn); credit cards (fraud detection, segmentation); e-commerce (website log analysis); tax departments (detection of tax evasion); police agencies (analysis of criminal behavior); and medicine (health care). Details are as follows:

E-government data mining

Establishing electronic government and promoting its development is an inevitable trend in applying electronic information technology to government management. Practical experience shows that government departments increasingly rely on scientific analysis of data. By developing e-government and building decision support systems and models on top of the large amount of data stored in comprehensive e-government databases, governments can give decision-making at all levels a scientific basis. This makes policy formulation more scientific and rational, improves the efficiency of government work, and promotes economic development. To this end, government decision support needs to keep incorporating new information processing technologies, and data mining is the core technology for achieving it. Government decision support systems based on data mining will play an important role.

E-government ranks first among the five application fields of the information superhighway (e-government, e-commerce, distance education, telemedicine, and electronic entertainment) actively promoted by countries around the world, which indicates that government informatization is the foundation of social informatization. E-government covers five aspects: government information services, government electronic trade, electronic government administration, government restructuring, and public participation in government. Introducing web data mining into e-government can greatly raise the level of government informatization and promote the informatization of society as a whole. This is reflected in the following aspects:

1) Government electronic trade. Pattern information is hidden in the data recorded by servers and browsers. Web usage mining can automatically discover the system's access patterns and users' behavior patterns for predictive analysis. For example, the time a user spends browsing an information resource indicates which resources the user is interested in; the domain name data collected in log files can be classified and analyzed by country or type; and cluster analysis can be applied to identify users' access motives and access trends. This technology has been used effectively in government e-commerce.

2) Website design. Mining website content, mainly text content, makes it possible to organize website information effectively, for example by using automatic classification to arrange it hierarchically. Combined with mining of user access logs to understand users' interests, this helps deliver information push services and personalized services that attract more users.

3) Search engines. Web data mining is key to the development of network information retrieval. For example, mining web content makes it possible to cluster and classify web pages and so to support browsing and retrieval by category. Analyzing the history of users' queries makes it possible to expand queries effectively and improve users' retrieval results. In addition, web content mining can be used to improve keyword weighting algorithms and the indexing accuracy of network information, which also improves retrieval.

4) Decision support. Data mining supports the formulation of major government policies. For example, mining the various economic resources available on the network can help determine future economic trends and then formulate corresponding macroeconomic regulation policies.
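
The web usage mining described in item 1 (estimating interest from the time spent on a resource, and classifying visitors by domain type) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log records are made up and already parsed, the time spent on a page is approximated by the gap to the same user's next request, and the function names `dwell_times` and `domain_type_counts` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical, already-parsed log records: (user, timestamp_seconds, url, client_host)
LOG = [
    ("u1", 0, "/tax/forms", "pc1.example.gov"),
    ("u1", 120, "/tax/faq", "pc1.example.gov"),
    ("u1", 150, "/contact", "pc1.example.gov"),
    ("u2", 10, "/tax/forms", "host.example.com"),
    ("u2", 310, "/contact", "host.example.com"),
]

def dwell_times(log):
    """Total seconds per URL, estimated as the gap to the same user's next request."""
    by_user = defaultdict(list)
    for user, ts, url, _host in log:
        by_user[user].append((ts, url))
    totals = Counter()
    for visits in by_user.values():
        visits.sort()
        # The last page of each user has no following request, so it is skipped.
        for (ts, url), (next_ts, _next_url) in zip(visits, visits[1:]):
            totals[url] += next_ts - ts
    return totals

def domain_type_counts(log):
    """Count requests by the top-level domain of the client host (e.g. gov, com)."""
    return Counter(host.rsplit(".", 1)[-1] for _u, _ts, _url, host in log)

times = dwell_times(LOG)
most_interesting = max(times, key=times.get)  # resource with the longest total dwell time
domains = domain_type_counts(LOG)
```

A real deployment would also need log parsing, session boundaries, and some treatment of the final page in each visit, all of which are left out here.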

Marketing data mining

Data mining technology has been widely used in enterprise marketing. It builds on the market segmentation principle of marketing, and its basic assumption is that "a consumer's past behavior is the best indicator of future consumption tendencies."

A company collects and processes a large amount of information about consumer behavior, determines the interests, habits, tendencies, and needs of a particular consumer group or individual, and infers what that group or individual will buy next. On this basis it markets specific content to the identified consumers in a targeted way. Compared with traditional mass marketing that does not distinguish between consumers, this greatly reduces marketing costs, improves marketing effectiveness, and so brings the company more profit.

Commercial consumption information comes from many sources in the market. For example, whenever we use a credit card, the merchant can collect information during settlement and record the time and place of purchase, the goods or services we are interested in, the price we are willing to accept, and our ability to pay. When we apply for a credit card, apply for a driver's license, or fill out a product warranty card, the personal information we enter on the form is stored in the corresponding business database. Besides collecting such information itself, a company can even buy it from other companies or organizations for its own use.

These data from various channels are combined and processed using supercomputers, parallel processing, neural networks, modeling algorithms, and other information processing techniques to give merchants the decision-making information they need for targeted marketing to specific consumer groups or individuals. How is this information applied? To give a simple example, suppose a bank finds, by mining its business data, that an account holder has suddenly applied for a two-person joint account, and confirms that this is the customer's first joint account. The bank will infer that the customer may be about to get married. It will then market long-term investment products to the customer for goals such as buying a home or paying for a child's tuition, and it may even sell the information to companies that specialize in wedding goods and services.

Data mining builds competitive advantage

In countries and regions with relatively developed market economies, many companies have begun to use data mining, on top of their existing information systems, to process business information in depth, build competitive advantage, and increase turnover. American Express has a database of credit card transactions containing 5.4 billion characters, and it keeps growing as business continues. By mining these data, the company developed a "Relationship Billing" promotion strategy: if a customer buys an outfit in a store with an American Express card, buying a pair of shoes in the same store earns a larger discount, which increases both the store's sales and the use of the American Express card in that store. Similarly, a cardholder who lives in London and has recently flown to Paris on British Airways may receive a discount voucher for a weekend trip to New York.

Marketing based on data mining can often send consumers promotional material related to their previous purchases. Kraft Foods has built a database of 30 million customer profiles from the customers who responded to coupons and other promotions issued by the company, together with their sales records. Kraft mines the database to learn the interests and tastes of specific customers, and on that basis sends them coupons for specific products and recommends Kraft recipes that match their tastes and health needs. The Reader's Digest publishing company in the United States runs a business database that has accumulated 40 years of data on more than 100 million subscribers around the world. The database runs 24 hours a day so that the data are continuously updated. Data mining on this customer database is what enabled Reader's Digest to expand from a general-interest magazine into publishing and distributing specialist magazines, books, and audio-visual products, greatly extending its business.

Marketing based on data mining also offers lessons for market competition in China today. On busy commercial streets we often see companies handing out large numbers of advertising leaflets to passers-by. People who do not need the leaflets throw them away, while people who do need them may never receive one. If a home appliance repair company instead mailed its advertisements to consumers who had just bought an appliance in a store, or a manufacturer of a specialized drug mailed its advertisements to patients of the relevant specialist clinic, the result would certainly be much better than aimless marketing.

Data mining in the retail industry

Barcodes, coding systems, sales management systems, customer data management, and other business systems make it possible to collect information about product sales, customers, inventory units, and stores. The data are gathered from the various application systems, classified by condition, and placed in a data warehouse (DW), where senior managers, analysts, purchasing staff, marketers, and advertisers can access and analyze them with data mining (DM) tools that support efficient, scientific decision making. One example is shopping basket analysis, which finds the goods that customers are most likely to buy together. Wal-Mart's "beer and diapers", a classic story much repeated in industry and business circles, is a model of data mining uncovering regularities between people and products. In retail, data warehousing and data mining perform well in several respects:

1. Understand the overall sales situation: classifying information by product type, sales quantity, store location, price, date, and so on gives a clear view of daily operations and finances, so that every point of sales growth, every inventory change, and every sales increase from a promotion is known. While selling goods, a retail store should also check regularly whether its product mix is reasonable, for example whether the proportions of the different types of goods are roughly balanced. When adjusting the product mix, it must consider factors such as seasonal changes in demand and changes in competitors' product mix.

2. Product grouping and layout: analyze customers' buying habits, taking into account the route shoppers take through the store, the time and place of purchase, and the probability that different products are bought together. Through activity analysis and correlation analysis of the products sold, use principal component analysis to establish the optimal product assortment and the optimal product layout.

3. Reduce inventory costs: the data mining system brings sales data and inventory data together and analyzes them to decide whether to increase or decrease the stock of each product, ensuring correct inventory levels. The data warehouse system can also send inventory information and sales forecasts directly to suppliers through electronic data interchange (EDI). This eliminates commercial intermediaries and makes suppliers responsible for replenishing stock on a regular basis, which reduces the retailer's own burden.

4. Market and trend analysis: use data mining tools and statistical models to study the data in the data warehouse carefully and analyze customers' buying habits, advertising success rates, and other strategic information. By retrieving recent sales data from the data warehouse for analysis and mining, it is possible to forecast seasonal and monthly sales and to analyze trends in product varieties and stock levels. The results also support decisions on which goods to mark down and on purchase quantities and operations.

5. Effective product promotion: the effectiveness of sales and advertising can be determined by analyzing the market share of a manufacturer's products in the various chain stores, customer statistics, and historical data. Analyzing customers' purchase preferences identifies the target customers for a promotion, so that suitable promotion schemes can be designed. The results of purchase correlation analysis can then be used for cross-selling and up-selling to tap customers' purchasing power and make promotions more accurate.
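
The purchase correlation analysis referred to in this list (which goods are bought together, as in the "beer and diapers" story) is commonly quantified with three association-rule measures: support, confidence, and lift. The Python sketch below computes them for item pairs. The baskets are invented for illustration, the function name `pair_rules` is hypothetical, and nothing here describes any retailer's actual system.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each basket is the set of items on one receipt.
BASKETS = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def pair_rules(baskets, min_support=0.2):
    """Return {(a, b): (support, confidence, lift)} for rules a -> b over item pairs."""
    n = len(baskets)
    item_count = Counter(item for basket in baskets for item in basket)
    pair_count = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), count in pair_count.items():
        support = count / n                  # share of baskets containing both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]       # estimate of P(y | x)
            lift = confidence / (item_count[y] / n)  # above 1: together more often than chance
            rules[(x, y)] = (support, confidence, lift)
    return rules

rules = pair_rules(BASKETS, min_support=0.4)
support, confidence, lift = rules[("diapers", "beer")]
```

With these toy baskets only the diapers and beer pair passes the support threshold, and its lift is above 1, which is the kind of signal that would suggest placing the two products together or cross-selling them.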

Banking data mining

The financial industry needs to collect and process large amounts of data. Because of its position, the nature of its work, its business characteristics, and fierce market competition, it has a more urgent need for informatization and electronic processing than other fields. Data mining technology can help a bank's product development department describe customers' past demand trends and predict future ones. American commercial banks are a model among commercial banks in developed countries, and there is much to learn from them.

Data mining technology is widely used in banking and finance in the United States. Financial transactions require collecting and processing large amounts of data, analyzing them to discover patterns and characteristics, identifying the financial and commercial interests of a customer, consumer group, or organization, and observing trends in financial markets. In commercial banking, profit and risk coexist. To maximize profit and minimize risk, accounts must be analyzed and classified scientifically and their creditworthiness evaluated. Mellon Bank uses data mining software to improve the accuracy of marketing and pricing financial products such as home equity loans. There are two main types of retail credit customers: those who rarely use their credit limit (low revolvers) and those who maintain a high outstanding balance (high revolvers). Each type presents a sales challenge. Low revolvers carry a lower risk of default and write-offs, but they bring in very little net income, or even negative income, because they cost almost as much to service as high revolvers. Banks often make them offers that encourage them to use more of their credit limit, or look for opportunities to cross-sell high-margin products. High revolvers consist of a high-risk segment and a medium-risk segment. The high-risk segment is prone to default and write-offs. For the medium-risk segment, the sales focus is to retain profitable customers and to find new customers who can bring the same profit. According to newer thinking, however, user behavior changes over time, so analyzing the costs and income over a customer's entire life cycle shows who has the greatest profit potential.
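
The segmentation described above (customers who rarely use their credit limit, and customers who carry a high balance, with the latter split into high-risk and medium-risk groups) can be illustrated with a simple rule-based sketch in Python. The fields, the utilization cut-off of 0.5, and the missed-payment threshold are assumptions chosen for the example; the text does not say how Mellon Bank actually drew these lines.

```python
from dataclasses import dataclass

@dataclass
class Account:
    name: str
    credit_limit: float
    avg_balance: float     # average outstanding balance
    missed_payments: int   # hypothetical risk signal, last 12 months

# Assumed cut-offs, chosen only for illustration.
HIGH_UTILIZATION = 0.5
HIGH_RISK_MISSED = 2

def segment(acct: Account) -> str:
    """Assign an account to one of the three segments described in the text."""
    utilization = acct.avg_balance / acct.credit_limit if acct.credit_limit else 0.0
    if utilization < HIGH_UTILIZATION:
        return "low usage"               # low default risk, little net income
    if acct.missed_payments >= HIGH_RISK_MISSED:
        return "high usage, high risk"   # prone to default and write-offs
    return "high usage, medium risk"     # profitable customers worth retaining

accounts = [
    Account("A", 10_000, 500, 0),
    Account("B", 5_000, 4_000, 3),
    Account("C", 8_000, 6_000, 0),
]
segments = {a.name: segment(a) for a in accounts}
```

In practice the thresholds would be learned from historical data, for example with a classification or clustering model, rather than fixed by hand.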

Mellon Bank believes that "customizing for a particular part of the market" makes it possible to identify end users and position offerings for them. Doing so, however, requires understanding the characteristics of those end users, and data mining tools give Mellon Bank access to that information. Mellon Bank's sales department used the Intelligence Agent in an advanced data mining project. The main goal was to determine how likely existing Mellon customers were to purchase a specific add-on product, a home equity line of credit, and to use the results to build models for testing. According to bank officials, data mining gives users business intelligence capabilities such as association, classification, and regression analysis, and these make it possible to target promotions at customers with a higher propensity to buy the bank's products and services. The officials believe that the software returns high-quality information for analysis and decision-making, which can then be fed into the product's algorithms. The data mining software can also be customized.

Firstar Bank in the United States uses data mining tools to predict which product to offer each customer, and when, based on the customer's spending patterns. The manager of Firstar's market research and database marketing department found that public databases store a large amount of information about every consumer. The key is to analyze thoroughly why consumers choose new products and to find patterns in the database that identify the most suitable consumers for each new product. The data mining system can read 800 to 1,000 variables and assign them values. Consumers are divided into groups according to whether they hold home equity loans, charge cards, certificates of deposit, or other savings and investment products, and data mining tools then predict when to offer which product to each consumer. Anticipating the needs of prospective customers is a competitive advantage for US commercial banks.
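
As a rough illustration of the grouping step described above, the Python sketch below groups customers by the products they already hold and suggests the product that customers in the same group most often bought next. The history records and the function names `build_group_model` and `suggest` are hypothetical, and the sketch covers only the question of which product to offer, not when to offer it.

```python
from collections import Counter, defaultdict

# Hypothetical history: (products already held, product the customer bought next).
HISTORY = [
    (frozenset({"charge card"}), "certificate of deposit"),
    (frozenset({"charge card"}), "certificate of deposit"),
    (frozenset({"charge card"}), "home equity loan"),
    (frozenset({"home equity loan", "charge card"}), "investment fund"),
    (frozenset(), "charge card"),
]

def build_group_model(history):
    """Map each product-holding group to a Counter of the products bought next."""
    model = defaultdict(Counter)
    for held, bought_next in history:
        model[held][bought_next] += 1
    return model

def suggest(model, held):
    """Most common next product among customers with the same holdings, or None."""
    counts = model.get(frozenset(held))
    if not counts:
        return None
    return counts.most_common(1)[0][0]

model = build_group_model(HISTORY)
offer = suggest(model, {"charge card"})   # group with three past customers
unknown = suggest(model, {"mortgage"})    # no history for this group
```

A system reading 800 to 1,000 variables would replace the exact-match grouping with a statistical model, but the idea of predicting the next product from the current holdings is the same.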
