Correlation - Statistics

 


This post summarizes an introduction to correlation methods in statistics, with a focus on the Pearson correlation method. The key points are:


Pearson Correlation Method:


Pearson correlation is a statistical method used to measure the strength and direction of the linear relationship between two continuous numerical variables.

It provides two values: the correlation coefficient and the p-value.

The correlation coefficient ranges from -1 to 1, where:

A value close to 1 implies a strong positive linear relationship.

A value close to -1 implies a strong negative linear relationship.

A value close to 0 implies no linear relationship between the variables. A nonlinear relationship may still exist.

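The p-value is the probability of seeing a coefficient at least this far from zero if the two variables were actually uncorrelated. The smaller the p-value, the stronger the evidence that the correlation is statistically significant. By a common convention:

A p-value less than 0.001 suggests strong evidence that the correlation is significant.

A p-value between 0.001 and 0.05 suggests moderate evidence.

A p-value between 0.05 and 0.1 suggests weak evidence.

A p-value larger than 0.1 suggests no evidence of a correlation.

A strong, statistically significant correlation is indicated when the correlation coefficient is close to 1 or -1 and the p-value is less than 0.001.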
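To make the coefficient concrete, here is a minimal sketch that computes it straight from its definition: the covariance of the two variables divided by the product of their standard deviations. The function name pearson_r and the small sample lists are illustrative assumptions, not part of the original lesson.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sum of products of deviations (proportional to the covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")

    # The 1/n factors cancel, so the raw sums can be used directly.
    return cov / math.sqrt(var_x * var_y)


# Made-up values, chosen only so the arithmetic is easy to follow by hand.
r_example = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(r_example)  # 0.8
```

This sketch returns only the coefficient. The p-value requires a statistical test on top of it, which is one reason to use a statistics package in practice.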

Calculation of Pearson Correlation:


The Pearson correlation can be calculated with a statistical package such as SciPy. The pearsonr function in the scipy.stats module returns both the correlation coefficient and the p-value.
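A short sketch of that calculation with scipy.stats.pearsonr follows. The horsepower and price arrays are synthetic data generated for illustration only. They are not the dataset used in the original lesson, so the numbers printed here will not match the lesson's results.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (an assumption): price rises with horsepower, plus noise.
rng = np.random.default_rng(42)
horsepower = rng.uniform(50, 250, size=100)
price = 5000 + 120 * horsepower + rng.normal(0, 4000, size=100)

# pearsonr returns the correlation coefficient and the two-sided p-value.
r, p_value = stats.pearsonr(horsepower, price)

print(f"Pearson correlation coefficient: {r:.3f}")
print(f"p-value: {p_value:.3g}")
```

With a real dataset, the two arrays would usually be columns of a table, and any rows with missing values would need to be dropped first.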

Interpreting Correlation Results:


An example was provided, analyzing the correlation between horsepower and car price.

The correlation coefficient was approximately 0.8, indicating a strong positive correlation.

The small p-value (< 0.001) indicated that this correlation is statistically significant.
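The reading of this result follows the p-value thresholds listed earlier. As a rough sketch, those thresholds can be written as a small helper. The function names are hypothetical, and the handling of values that fall exactly on a boundary (for example 0.05) is an assumption, since the summary does not say which side they belong to.

```python
def evidence_label(p_value):
    """Map a p-value to the evidence levels used in this summary."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < 0.001:
        return "strong"
    if p_value < 0.05:
        return "moderate"
    if p_value < 0.1:
        return "weak"
    return "none"


def direction_label(r):
    """Describe the sign of a correlation coefficient."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"


# Values in the spirit of the horsepower example: r of about 0.8, p below 0.001.
summary = (direction_label(0.8), evidence_label(0.0005))
print(summary)  # ('positive', 'strong')
```

These labels are a convention, not a rule. The cutoffs depend on the field and on how many comparisons are being made.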

Creating a Correlation Heat Map:


A heat map was built from the pairwise correlations among all of the variables.

The color of each cell encoded the Pearson correlation coefficient for that pair, showing at a glance how strongly the two variables are correlated.

The dark red diagonal marks a correlation of 1, which is expected because each variable is perfectly correlated with itself.

This correlation heat map provides a comprehensive overview of how different variables are related to one another, particularly in relation to car price.
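One possible way to draw such a heat map is sketched below with pandas and Matplotlib. The summary does not say which plotting library the lesson used, so the library choice, the column names, the synthetic data, and the output file name are all assumptions made for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the script also runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for a car dataset (not the data from the lesson).
rng = np.random.default_rng(0)
n = 200
horsepower = rng.normal(100, 30, n)
df = pd.DataFrame(
    {
        "horsepower": horsepower,
        "engine_size": 1.2 * horsepower + rng.normal(0, 15, n),
        "highway_mpg": 50 - 0.2 * horsepower + rng.normal(0, 3, n),
        "price": 100 * horsepower + rng.normal(0, 2000, n),
    }
)

# Pairwise Pearson correlation coefficients between all columns.
corr = df.corr(method="pearson")

# Diverging color map fixed to [-1, 1]: dark red for +1, dark blue for -1.
fig, ax = plt.subplots(figsize=(6, 5))
im = ax.imshow(corr.to_numpy(), cmap="RdBu_r", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
fig.colorbar(im, ax=ax, label="Pearson correlation coefficient")
fig.tight_layout()
fig.savefig("correlation_heatmap.png")
```

Reading along the price row or column of the matrix shows how each of the other variables relates to price.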




