CPS844 - Data Mining Stats

Data Mining Midterm Review

Question Answer % Correct
What is one way distance can be measured (Given an instance a)?Euclidean distance of a.
40.9%
What is the Laplace estimator set to?1.
31.8%
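A minimal sketch of what a Laplace estimator of 1 does to a frequency estimate, assuming the usual formulation in which the constant is added to every value's count. The function name, the `num_distinct` parameter, and the toy `outlook` values are illustrative and not part of the quiz.

```python
from collections import Counter

def laplace_estimate(values, value, num_distinct, k=1):
    """Smoothed relative frequency: (count + k) / (n + k * num_distinct)."""
    counts = Counter(values)
    return (counts[value] + k) / (len(values) + k * num_distinct)

# Hypothetical nominal attribute with three possible values.
outlook = ["sunny", "sunny", "rainy"]
p_sunny = laplace_estimate(outlook, "sunny", num_distinct=3)
p_rainy = laplace_estimate(outlook, "rainy", num_distinct=3)
# "overcast" never occurs, yet its estimate is not zero.
p_overcast = laplace_estimate(outlook, "overcast", num_distinct=3)
```

With k = 1, an unseen value still receives a small non-zero probability, so a single missing count cannot zero out a whole product of probabilities.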
What are the disadvantages of using Linear Classification via Regression?The membership function is not a true probability, and the least-squares assumption of normally distributed errors with constant variance is not valid when the class values are only 0 and 1.
31.8%
What is distance given the attribute-value of a nominal instance?The distance measure is 1
31.8%
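The two distance answers above can be combined into one function. This is a sketch under assumptions: numeric attributes contribute their squared difference (Euclidean distance), and nominal attributes contribute 1 when the values differ and 0 when they match. The 0-for-equal case and the name `mixed_distance` are assumptions, since the quiz only states the value 1.

```python
import math

def mixed_distance(a, b):
    """Distance between two instances with numeric and nominal attributes."""
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            total += (x - y) ** 2            # numeric: squared difference
        else:
            total += 0.0 if x == y else 1.0  # nominal: 0 if equal, 1 if different
    return math.sqrt(total)

d_numeric = mixed_distance((1.0, "sunny"), (4.0, "sunny"))
d_nominal = mixed_distance((0.0, "sunny"), (0.0, "rainy"))
```

In practice numeric attributes are usually normalized first so that no single attribute dominates the sum; that step is left out here for brevity.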
What method is used for determining cyber criminals?A & B
27.3%
What is Data Mining?Automated methods for finding and describing patterns in large amounts of data, as well as preprocessing the data.
22.7%
What is the goal of Data Mining?To understand the data and make predictions given new data.
22.7%
What are the goals of patterns?Patterns allow us to make predictions on new data.
22.7%
What is stratified?When subdividing into ten folds, try to make the same proportion of instances of a given class in each fold as the original data.
22.7%
Will the accuracy be better on new data?No. Pruning only removes non-significant data.
22.7%
In a multiclass data file, is modularity preserved?No.
18.2%
What is the Divide-and-Conquer Decision Tree?An algorithm to choose an attribute based on its purity, that is nodes where all instances have the same class.
13.6%
What are some variations of the ball tree?In place of letting a single nearest-neighbour decide, take several and use a voting system | In place of storing each training instance, store the range of each attribute for each class and count the ranges it satisfies for each class
13.6%
Decision trees can be easily converted to what kind of set?Rule set
9.1%
As per this research: unknown items can be interesting
9.1%
What was the purpose of the research?Discovering knowledge regarding the collective behavior of missing values.
9.1%
What are some examples of uses in DM & ML?selection of best embryos in human in-vitro fertilization, selection of 700 attributes from culling cows, click streams from big data of customer profiles from card swipes
4.5%
What is Machine Learning?Techniques and algorithms used in Data Mining to analyze data and generate models that represent the patterns.
4.5%
What are the different types of patterns?It could be a 'black box', or a transparent box showing structural patterns.
4.5%
Example = Instance (T/F?)True
4.5%
Feature = Attribute (T/F?)True
4.5%
What are association rules?Association rules are created in order to process a decision, by linking any attributes.
4.5%
What is pruning?A technique to reduce the size of a decision tree.
4.5%
What adjusts the prior probabilities?The training set.
4.5%
What type of model/rule works best with nominal attributes?Decision trees and rules
4.5%
What is linear regression best applied to?Numeric prediction, when the attributes are numeric and the X value has a linear dependence on the attributes.
4.5%
What is multiresponse linear regression dependent on?Linear separability of two classes.
4.5%
What is the distance given the attribute-value is missing?The distance measure is 0
4.5%
How can we correct the disadvantage of the kD-tree?Use hyperspheres and a ball tree.
4.5%
What are some disadvantages of nearest-neighbour instance-based learning?Each attribute has the same influence (as in NB); noisy data can skew the model; the kD-tree is slow if k > 10, although the ball tree can handle thousands of attributes
4.5%
What are the four types of clusters?Non-overlapping clusters, fuzzy clusters, overlapping clusters, hierarchical clusters
4.5%
Are the physical distance between the sensor and nearest structural property?Static Attributes
4.5%
What are some examples of structural patterns?If-then rules, decision trees
0%
Given some .arff data file, what are its instances?Each line in a data file is an instance.
0%
Given the contact-lens.arff data file, what are some of the attributes?age, spectacle-prescription, astigmatism, tear-prod-ra, contact-lenses
0%
Given the contact-lens.arff data file, what are its class attributes?contact-lenses
0%
How do you set the class in Weka Explorer?It is the last attribute in the .arff data file. In Preprocess, there is a 'Class' tab above the histogram. In Classify, the class tab is right above the Start button.
0%
What is learning?Purposefully changing to improve performance.
0%
How do Machine Learning algorithms work?They take data as input and produce structural patterns as output. Patterns can be used for prediction or human understanding. The change is the resulting model representing the pattern, and improved performance is measured by the accuracy of predictions based on the model.
0%
Value of Attribute?Number or Symbol
0%
What is tenfold stratified cross-validation?It is a way to evaluate a model by dividing the data into ten equal, stratified parts: in each of ten rounds, 9 parts are used for training and 1 part for testing.
0%
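The stratified tenfold procedure can be sketched as follows. This is an illustration, not how any particular tool implements it: instances are dealt to folds class by class so each fold keeps roughly the original class proportions, and no shuffling is done. The function names and the 60/40 toy labels are hypothetical.

```python
from collections import defaultdict

def stratified_folds(labels, k=10):
    """Return k lists of instance indices with roughly equal class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        for index in indices:
            folds[position % k].append(index)  # deal round-robin, class by class
            position += 1
    return folds

def train_test_splits(folds):
    """Yield (train, test) index lists, holding out each fold once."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

labels = ["yes"] * 60 + ["no"] * 40   # hypothetical class column
folds = stratified_folds(labels)
splits = list(train_test_splits(folds))
```

Each of the ten rounds trains on nine folds and tests on the remaining one; the reported accuracy would be the average over the ten test folds.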
Why not use the training set for the testing data?For example if a class variable that was missing in the training set was in the testing data, that value would be more prevalent in the results, causing skewed predictions.
0%
What are the advantages of pruning?Simplicity and speed.
0%
In decision trees, what is one way to avoid overfitting?Pre-pruning or Post-pruning.
0%
What is Naive Bayes' rule of conditional probability?P(H | E) = P(E | H) * P(H) / P(E)
0%
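A small worked example of Bayes' rule in its full form, where the product P(E | H) * P(H) is divided by the evidence P(E) so that the posteriors sum to 1. The prior and likelihood numbers below are made up for illustration, and `posterior` is a hypothetical helper name.

```python
def posterior(prior, likelihood):
    """Apply Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E)."""
    joint = {h: likelihood[h] * prior[h] for h in prior}
    evidence = sum(joint.values())  # P(E), the normalizing constant
    return {h: p / evidence for h, p in joint.items()}

prior = {"yes": 0.6, "no": 0.4}        # P(H), hypothetical values
likelihood = {"yes": 0.2, "no": 0.6}   # P(E | H), hypothetical values
result = posterior(prior, likelihood)
```

Here the joint values are 0.12 and 0.24, so after normalization the posterior is 1/3 for "yes" and 2/3 for "no".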
What is the prior probability of the hypothesis?P(H)
0%
What if you know the distribution for an attribute is abnormal, and follows the probability density function?Changes smaller attributes by duplication, to increase
0%
What can the prior probabilities be set to?Anything, but are usually unknown.
0%
What if you don't want to guess the PDF, what would you do for the attribute temperature?If we have a numerical analysis vs a nominal one, we apply the probability density function for a normal distribution and calculate accordingly.
0%
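For a numeric attribute such as temperature, the normal-distribution density mentioned above can be computed from the attribute's mean and standard deviation within a class. A minimal sketch, assuming the sample standard deviation (n - 1 denominator); the function names and sample values are illustrative only.

```python
import math

def mean_and_std(values):
    """Sample mean and standard deviation (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance)

def normal_pdf(x, mean, std_dev):
    """Density of the normal distribution at x."""
    exponent = -((x - mean) ** 2) / (2 * std_dev ** 2)
    return math.exp(exponent) / (math.sqrt(2 * math.pi) * std_dev)

mean, std_dev = mean_and_std([1.0, 2.0, 3.0])  # hypothetical readings
density_at_mean = normal_pdf(mean, mean, std_dev)
```

The resulting density takes the place of the count-based probability for that attribute when the per-attribute terms are multiplied together.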
If antecedent then X? (What is X?)Consequent
0%
What are some advantages of rules over decision trees?Rules are simpler representations, rules are more modular.
0%
What are some disadvantages of rules over decision trees?Trees are unambiguous, rule sets can be ambiguous (when more than one rule fires, or when no rule fires)
0%
In a multiclass data file, what is generated by machine learning algorithms?A decision list.
0%
What is a decision list?A list of rules that must be executed in order
0%
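A decision list can be sketched as an ordered sequence of (condition, class) pairs in which the first matching rule decides. The rules, attribute values, and the `classify` helper below are hypothetical, chosen only to show why order matters.

```python
def classify(decision_list, instance, default):
    """Return the class of the first rule whose conditions all hold."""
    for condition, label in decision_list:
        if all(instance.get(attr) == value for attr, value in condition.items()):
            return label
    return default  # no rule fired

# Illustrative rules only; the first two overlap, so their order matters.
rules = [
    ({"astigmatism": "yes", "age": "young"}, "hard"),
    ({"astigmatism": "yes"}, "none"),
    ({"astigmatism": "no"}, "soft"),
]

young = classify(rules, {"age": "young", "astigmatism": "yes"}, default="none")
older = classify(rules, {"age": "presbyopic", "astigmatism": "yes"}, default="none")
```

Because evaluation stops at the first match, the ambiguity of several rules firing at once is resolved by position, and the default covers the case where no rule fires.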
What are the top languages used for analytics, data mining, and data science?R (61%)
0%
What is the highest level analytic and data mining software used in the past 12 months on a real project?RapidMiner (39%)
0%
What type of model/rule works best with numeric attributes?Linear models
0%
What is a linear model?A straight line (x = w0 + w1a1)
0%
What is the key characteristic that makes linear model equations?The dot product.
0%
What is linear regression?A method used in linear models to find values for components, eventually minimizing the sum of square errors to get 'least squares'.
0%
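For a single attribute, the least-squares weights of x = w0 + w1a1 have a closed form. The sketch below assumes one numeric attribute and uses made-up data; `fit_line` is a hypothetical name.

```python
def fit_line(a, x):
    """Least-squares fit of x = w0 + w1 * a for a single attribute a."""
    n = len(a)
    mean_a = sum(a) / n
    mean_x = sum(x) / n
    s_ax = sum((ai - mean_a) * (xi - mean_x) for ai, xi in zip(a, x))
    s_aa = sum((ai - mean_a) ** 2 for ai in a)
    w1 = s_ax / s_aa            # slope that minimizes the sum of squared errors
    w0 = mean_x - w1 * mean_a   # intercept: the line passes through the means
    return w0, w1

# Toy data generated from x = 1 + 2 * a, so the fit should recover those weights.
w0, w1 = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

With several attributes the same idea is solved as a system of linear equations rather than with this one-attribute shortcut.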
How can we use linear regression to predict classes?Build a model, and assign a class value as 1 when it is really the class value, and 0 otherwise.
0%
What are the advantages of using Linear Classification via Regression?Simple approach and produces good results
0%
What are the different ways to apply linear classification?Linear Regression, Logistic Regression, Perceptron
0%
How does logistic regression differ from linear regression?It approximates weight by maximizing the log-likelihood versus minimizing SSE.
0%
How does perceptron differ from the other types of linear classification?It skips assigning probabilities and simply finds a hyperplane that separates the two classes, provided the two classes are linearly separable.
0%
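The perceptron's search for a separating hyperplane can be sketched with the classic mistake-driven update: add the instance to the weights when a positive is missed, subtract it when a negative is misclassified. This is a minimal illustration on a toy linearly separable problem (logical AND), with a bias input of 1 prepended; the function names and `max_epochs` cap are assumptions.

```python
def train_perceptron(instances, labels, max_epochs=100):
    """Learn weights [bias, w1, ...] for classes 0 and 1."""
    weights = [0.0] * (len(instances[0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for attrs, label in zip(instances, labels):
            a = [1.0] + list(attrs)  # prepend the bias input
            activation = sum(w * ai for w, ai in zip(weights, a))
            predicted = 1 if activation > 0 else 0
            if predicted != label:
                sign = 1 if label == 1 else -1
                weights = [w + sign * ai for w, ai in zip(weights, a)]
                mistakes += 1
        if mistakes == 0:  # every instance is on the correct side
            break
    return weights

def predict(weights, attrs):
    a = [1.0] + list(attrs)
    return 1 if sum(w * ai for w, ai in zip(weights, a)) > 0 else 0

instances = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]  # logical AND, which is linearly separable
weights = train_perceptron(instances, labels)
predictions = [predict(weights, attrs) for attrs in instances]
```

If the classes were not linearly separable, the loop would simply stop at `max_epochs` without reaching zero mistakes.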
How can we find the nearest neighbours efficiently?kD Trees
0%
What is the complexity of a straightforward search?O(n)
0%
What is a disadvantage of the kD-tree?Rectilinear regions
0%
What are some advantages of nearest-neighbour instance-based learning?Simple, and often works well
0%
What is a clique?A complete subgraph
0%
This research is based on recommending: Unknown but interesting items
0%
In this research User Similarity Score represents: Similarity between their actions on items
0%
One of the concerns related to the sustainability problem is: Increasing the energy cost
0%
One of the good steps towards the green computing goal is: New recycling programs
0%
The low server utilization is a problem associated with: Cloud Computing
0%
Why must time series data sets be reduced (approximated)?All above
0%
What is the main problem of DFT (Discrete Fourier Transform) reduction family?Effective but not for transient or fast evolving time-series.
0%
What is the difference between clustering (pattern discovery) and classification tasks?In classification we have classes and we try to assign new instances to a class, while in clustering there are no predefined classes and we want to group the instances into clusters
0%
Which is the most effective way to maintain cache consistency in mobile environments?Invalidation Checks
0%
After generating caching rules, any rule that meets the following condition is added to the rule set: Having a support level and a confidence value larger than a user defined threshold
0%
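The support and confidence filter can be sketched as below. It assumes support is the fraction of transactions containing both sides of the rule and confidence is that count divided by the number containing the antecedent; the toy transactions, thresholds, and function names are hypothetical.

```python
def support_and_confidence(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

def keep_rule(support, confidence, min_support, min_confidence):
    """A rule is kept only if both values exceed the user-defined thresholds."""
    return support > min_support and confidence > min_confidence

transactions = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"c"}]
support, confidence = support_and_confidence(transactions, {"a"}, {"b"})
kept = keep_rule(support, confidence, min_support=0.4, min_confidence=0.6)
```

For the rule a -> b, two of the four transactions contain both items and three contain a, giving a support of 0.5 and a confidence of 2/3.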
Which concept does SACCS use to track changes to data objects in cache?Flags
0%
What is “Write-print”?A unique texting pattern or style in an individual person’s written work.
0%
How is author determined by Authorship Identification in this presentation?Compare an anonymous email with every suspect’s write-print. The suspect having the most matching patterns with the email is the author.
0%
“____ is a system that provides a recommendation, prediction, opinion, or user-configured list of items that assists the user” is a definition of: Recommender system
0%
The Hamlet system, designed to minimize the purchase price of airplane tickets by incorporating time into the recommendation process, is an example of: Temporal Recommenders
0%
Content-based recommender systems that incorporate information retrieval methods are frequently used to satisfy ephemeral needs from: Static databases
0%
The general design principle of Hadoop is built on: Parallel computing
0%
The two main phases of Pattern Matching are: Profiling and then Matching
0%
The theory of “Speaker Recognition” is based on: The proven concept that if two voices are “significantly similar” under the same conditions and combinations, they most likely belong to the same person.
0%
Are considered nonlinear statistical data modeling tools where the complex relationships between inputs and outputs are modeled or patterns are found.Artificial Neural Network
0%
Is the temperature, humidity and other conditions which occupants experience in a building and dependent on human activities, insulation material, mean radiant temperature, humidit: Thermal Comfort
0%
Why did the study use Linear Discriminant Analysis (LDA)?Due to the unpredictability of Layered back-propagation neural networks (BPNN), LDA was used as a benchmark.
0%
What are some of the approaches for handling missing data?Excluding records with missing values, using generic 'unknown', and using imputation techniques.
0%
Details
Classic: Type in answers that appear in a list
Last Updated: Feb 16, 2017

More By:
tombobadil

Quiz Plays Rating Category Featured Created
22 Science Mar 10, 2014
