Statistics Exam I
numerical (quantitative) data
numbers
categorical data
data that fall into categories or labels rather than numerical measurements
sample statistics
descriptive measures that summarize samples taken from populations. We get these from actual data that we've collected.
sample mean
x bar, used for numerical data
sample proportion
p hat, used for categorical data
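As a small illustration of these two sample statistics, the sketch below computes x bar from a list of numerical values and p hat from a list of categorical responses. The data and variable names are made up for illustration and are not part of the original set.

```python
# Hypothetical sample data (assumed for illustration only).
heights_cm = [160, 172, 168, 181, 175]          # numerical data
responses = ["yes", "no", "yes", "yes", "no"]   # categorical data

# Sample mean (x bar): sum of the values divided by the sample size.
x_bar = sum(heights_cm) / len(heights_cm)

# Sample proportion (p hat): share of the sample in the category of interest.
p_hat = responses.count("yes") / len(responses)

print(x_bar, p_hat)
```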
parameter
a numerical summary of a population
population mean
mu, used for numerical data. We don't know it in most cases.
population proportion
P, used for categorical data. We don't know it in most cases
confidence interval
the range of values within which a population parameter is estimated to lie
hypothesis test
a statistical method that uses sample data to evaluate a hypothesis about a population
simple random sampling
every member of the population has an equal probability of being selected for the sample
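One way to picture simple random sampling is drawing names from a hat. The sketch below uses Python's standard `random` module to draw without replacement. The population of ID numbers, the sample size, and the seed are all assumptions chosen for illustration.

```python
import random

# Hypothetical population: 100 subjects identified by ID number.
population = list(range(1, 101))

# A seeded generator so the draw can be repeated (the seed is arbitrary).
rng = random.Random(42)

# Draw 10 subjects without replacement; each subject is equally likely to be chosen.
sample = rng.sample(population, k=10)

print(sample)
```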
Data
information we gather with experiments and with surveys
statistics
the art and science of designing studies to get data, analyzing the resultant data, and translating data into knowledge and understanding
Statistical Methods
design, description, inference
design
planning how to obtain data
description
summarizing the data
inference
making decisions and generalizing conclusions from that data
population
all subjects of interest
sample
subset of the population for whom we have data
subjects
the entities we observe in a study
discrete data
data in which the observations are restricted to a set of values that possess gaps
continuous data
data that can take on any value within some interval. Usually rounded
nominal
purely categorical, where no order or ranking can be imposed (categorical data)
ordinal
categories having an order associated with them, but no uniform distance between categories. Must have a very logical order (e.g., strongly agree, agree, neutral, disagree, strongly disagree). For categorical data.
interval
data is ordered and the difference between values is meaningful, but there is no true zero point. Ratios aren't meaningful (e.g., IQ). For numerical data.
ratio
interval, plus has a meaningful zero point. For numerical data.
time series data
measures the same variable/characteristic at different points in time.
cross-sectional data
measures a variable across different units at approximately the same point in time.
frequency distribution
table summarizing data organized into classes, which lists each class and the number of observations in each class
mutually exclusive
no overlap
exhaustive
have enough categories so every observation is accounted for
relative frequency
number of observations in a class/total number of observations. Proportion of observations in a class.
cumulative frequency
sum of frequencies in a given class and all preceding classes
cumulative relative frequency
proportion of observations in a given class and all preceding classes (cumulative frequency/total # of observations)
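The four frequency measures above can be built up from one another. The sketch below does this for a made-up list of letter grades. The data and the class order are assumptions for illustration.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical observations and the classes they fall into.
grades = ["A", "B", "B", "C", "A", "B", "D", "C", "B", "A"]
classes = ["A", "B", "C", "D"]  # mutually exclusive and exhaustive

counts = Counter(grades)
freq = [counts[c] for c in classes]          # frequency per class
n = sum(freq)                                # total number of observations
rel_freq = [f / n for f in freq]             # relative frequency
cum_freq = list(accumulate(freq))            # cumulative frequency
cum_rel_freq = [cf / n for cf in cum_freq]   # cumulative relative frequency

for row in zip(classes, freq, rel_freq, cum_freq, cum_rel_freq):
    print(row)
```

The last cumulative frequency equals the total number of observations, and the last cumulative relative frequency equals 1.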
Bar chart
a simple graphical display in which the length of each bar corresponds to the class frequency, relative frequency, or percentage of observations
Pie Charts
quickly illustrates relative frequency, but frequency is sometimes harder to detect. Best for nominal data.
Bell Shaped data
usually most desirable statistically
skewed to the right
long tail is to the right
skewed to the left
long tail is to the left
Time sequence plot
columns are years and bars represent data values in different years
Time series graph
dots represent data values for years, and a line connects the dots to form a time trend. Better represents rates of change.
mean
arithmetic average
median
the middle value in data that has been arranged in numerical order. Used for quantitative and ordinal data.
mode
the most frequently occurring value
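The three measures of center can be computed with Python's standard `statistics` module. The exam scores below are made up for illustration.

```python
import statistics

# Hypothetical exam scores.
scores = [70, 85, 85, 90, 100]

mean = statistics.mean(scores)      # arithmetic average: 430 / 5
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequently occurring value

print(mean, median, mode)
```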
median<mean
right skewed distribution
mean=median
symmetric distribution
median>mean
left skewed distribution
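The mean and median comparisons above work as a rule of thumb for guessing the shape of a distribution. The sketch below applies that rule to a made-up set of incomes in which one large value creates a long right tail. The data and the `shape` label are assumptions for illustration.

```python
import statistics

# Hypothetical incomes in thousands of dollars; 200 is far above the rest.
incomes = [30, 32, 35, 38, 40, 45, 200]

mean = statistics.mean(incomes)      # pulled toward the large value
median = statistics.median(incomes)  # resistant to the large value

# Rule of thumb from the cards above.
if median < mean:
    shape = "right skewed"
elif median > mean:
    shape = "left skewed"
else:
    shape = "symmetric"

print(mean, median, shape)
```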
dispersion
how spread out the data is
mean absolute deviation
the average of the absolute values of the deviations from the mean
sample variance
the average squared deviation from the mean, computed by dividing the sum of squared deviations by n-1
standard deviation
the square root of the variance
range
the maximum value minus the minimum value
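The measures of dispersion above can be compared on one small data set. The sketch below assumes the usual conventions: mean absolute deviation divides the sum of absolute deviations by n, and sample variance divides the sum of squared deviations by n-1 (which is what `statistics.variance` does). The data are made up for illustration.

```python
import statistics

# Hypothetical sample.
data = [2, 4, 4, 6, 9]
n = len(data)
mean = statistics.mean(data)

# Mean absolute deviation: average distance from the mean (assumed divisor n).
mad = sum(abs(x - mean) for x in data) / n

# Sample variance (divisor n-1) and its square root, the standard deviation.
variance = statistics.variance(data)
std_dev = statistics.stdev(data)

# Range: maximum value minus minimum value.
data_range = max(data) - min(data)

print(mad, variance, std_dev, data_range)
```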
Z-score
how many standard deviations away from the mean a given observation is
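A z-score is the observation minus the mean, divided by the standard deviation. The helper name `z_score` and the exam figures below are assumptions for illustration.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

# Hypothetical exam with mean 70 and standard deviation 10.
above = z_score(85, 70, 10)   # 1.5 standard deviations above the mean
below = z_score(60, 70, 10)   # 1 standard deviation below the mean

print(above, below)
```

A positive z-score means the observation is above the mean, and a negative one means it is below.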
The Pth percentile
a value such that P percent of the data is less than or equal to the value and (100-P) percent of the data is greater than or equal to that value
IQR
interquartile range (Q3-Q1)
five number summary
minimum, Q1, median, Q3, maximum
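The five number summary and IQR can be sketched with `statistics.quantiles` (available in Python 3.8 and later). Textbooks and software use slightly different conventions for locating quartiles, so the `method="inclusive"` choice here is an assumption and the course's hand method may give slightly different values on other data. The data are made up for illustration.

```python
import statistics

# Hypothetical sample (unsorted on purpose; quantiles() sorts internally).
data = [7, 1, 9, 3, 5, 2, 8, 4, 6]

# Cut points that split the data into four parts: Q1, median, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

five_number = (min(data), q1, q2, q3, max(data))
iqr = q3 - q1  # interquartile range

print(five_number, iqr)
```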
dependent variable
the outcome variable on which comparisons are made
independent variable
defines the values of groups to be compared
association
this exists between two variables if a particular value for one variable is more likely to occur within certain values of the other variable
contingency table
used to display two categorical variables
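A contingency table counts how often each combination of two categorical variables occurs. The sketch below tallies made-up (class year, answer) pairs. The variable names and data are assumptions for illustration.

```python
from collections import Counter

# Hypothetical observations: (class year, answered "yes" or "no").
observations = [
    ("freshman", "yes"), ("freshman", "no"), ("senior", "yes"),
    ("senior", "yes"), ("freshman", "no"), ("senior", "no"),
]

# Count each (row category, column category) combination.
table = Counter(observations)

rows = ["freshman", "senior"]
cols = ["yes", "no"]
for r in rows:
    # One line per row category, with a count for each column category.
    print(r, [table[(r, c)] for c in cols])
```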
scatter plot
a graph of the points for a number of observations for two variables.
correlation coefficient
a single number that indicates the direction and strength of the linear relationship between two variables
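The card does not name a specific formula, so the sketch below assumes the Pearson correlation coefficient, the usual choice in an introductory course. The function name `pearson_r` and the data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied and exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

r = pearson_r(hours, scores)
print(r)
```

The result always lies between -1 and 1. Values near 1 indicate a strong positive linear association, values near -1 a strong negative one, and values near 0 little linear association.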
regression
an equation representing a straight line that summarizes the relationship between variables
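The card does not say how the line is chosen. The sketch below assumes the least squares line, which is the standard method in introductory statistics. The function name `least_squares_line` and the data are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: hours studied (independent) and exam score (dependent).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

slope, intercept = least_squares_line(hours, scores)
predicted = intercept + slope * 6  # predicted score for 6 hours of study

print(slope, intercept, predicted)
```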
positively associated
high x=high y, low x=low y
negatively associated
high x=low y or vice versa
correlation
describes the strength of the linear association between two variables
