Diamonds Should You Buy? Essay

Data Selection (refer to Excel sheets: Scatter1 and Scatter2)
We used Excel to draw a scatter plot of price against carat weight. The plot revealed two distinct groups of data: (1) diamonds at low prices with low carat weight and (2) diamonds at high prices with high carat weight. The low-price, low-carat group is irrelevant to our analysis, so we excluded every observation priced under \$1000.

Selected data to run the regression
Cutoff point of data at Price = \$1000

Variable Grouping and Dummy Variable Assignment

The categorical variables (color, clarity, cut, certification, polish, and symmetry) were each grouped into a smaller number of categories so that the upcoming regression would not require a very large number of dummy variables.

1454*1)=\$2,747.91

Semi-Log Model (refer to “Semi-Log” Excel Sheet)
$7.3514 + 0.3412 \times 0.90 + 0 + 0 + 0 + 0.1815 \times 1 + 0 + 0.0420 \times 1 + 0.0368 \times 1 + 0 + 0.0135 \times 1 = 7.908$
$e^{7.908}$ = \$2,718.41

Log-Log Model (refer to “Log-Log” Excel Sheet)
$7.6636 + 0.3300 \times \ln(0.90) + 0 + 0 + 0 + 0.1838 \times 1 + 0 + 0.0421 \times 1 + 0.0407 \times 1 + 0 + 0.0133 \times 1 = 7.909$
$e^{7.909}$ = \$2,721.14
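
The log-log arithmetic above can be reproduced with a short Python sketch. The coefficients are copied from the worked line; the names `INTERCEPT`, `LN_CARAT_COEF`, `ACTIVE_DUMMY_COEFS`, and `predict_log_log_price` are illustrative assumptions, and the dummy coefficients are left unlabeled because the text does not say which attribute each one belongs to. Because the displayed coefficients are rounded to four decimals, the result should agree with the figure above only to within about a dollar.

```python
import math
from typing import Tuple

# Coefficients copied from the log-log worked example in the text.
INTERCEPT = 7.6636
LN_CARAT_COEF = 0.3300
# Coefficients of the dummy variables that equal 1 for this diamond.
# The text does not name the attribute behind each one, so they stay unlabeled.
ACTIVE_DUMMY_COEFS = [0.1838, 0.0421, 0.0407, 0.0133]


def predict_log_log_price(carat: float) -> Tuple[float, float]:
    """Return (predicted ln(price), predicted price in dollars)."""
    ln_price = INTERCEPT + LN_CARAT_COEF * math.log(carat) + sum(ACTIVE_DUMMY_COEFS)
    # The model is fitted on ln(price), so exponentiate to get back to dollars.
    return ln_price, math.exp(ln_price)


ln_price, price = predict_log_log_price(0.90)
print(f"ln(price) = {ln_price:.3f}, price = ${price:,.2f}")
```

Dummy variables that equal 0 contribute nothing to the sum, which is why only the active coefficients are listed.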

From the linear, semi-log, and log-log models built on our chosen variables, the estimated price of a diamond with the professor’s specifications is \$2,747.91 (linear model), \$2,718.41 (semi-log model), or \$2,721.14 (log-log model). However, because the professor chose only a sample of diamond wholesalers, we cannot treat these point estimates as final. A 95% confidence interval for the price the professor could have paid (for a diamond of the same specification) is needed to determine whether he paid a fair price.

Determining 95% Confidence Interval of Possible Prices the Professor Could’ve Paid

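T-cutoff: $\pm\,\mathrm{TINV}(\alpha,\, n-1) = \pm\,\mathrm{TINV}(0.05,\, 229) = \pm 1.970377$

$s_n$ = standard error from regression
Confidence Interval:
$\bar{x} - t_{n-1,\,\alpha}\, s_n < \mu < \bar{x} + t_{n-1,\,\alpha}\, s_n$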

| Regression | Sample Mean | Standard Error | T-Cutoff (α=0.05) | Lower Boundary (LN for logged) | Upper Boundary (LN for logged) | Exponent Lower Boundary | Exponent Upper Boundary |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Linear | 2747.9070 | 279.9645 | 1.970377 | \$2,196.27 | \$3,299.54 |   |   |
| Semi-Log | 7.9078 | 0.1079 | 1.970377 | 7.695104 | 8.120502 | \$2,197.56 | \$3,362.71 |
| Log-Log | 7.9088 | 0.1081 | 1.970377 | 7.695887 | 8.121725 | \$2,199.28 | \$3,366.82 |
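
The interval arithmetic in the table can be checked with a minimal Python sketch. The means, standard errors, and t-cutoff are taken from the table; the names `MODELS`, `confidence_interval`, and `intervals` are hypothetical. Because the tabulated inputs are rounded, the exponentiated bounds for the logged models may differ from the table by a few tenths of a dollar.

```python
import math

T_CUTOFF = 1.970377   # TINV(0.05, 229), as reported in the text
PRICE_PAID = 3100.0   # price the professor paid, from the text

# model name -> (sample mean, standard error, True if the model predicts ln(price))
MODELS = {
    "Linear": (2747.9070, 279.9645, False),
    "Semi-Log": (7.9078, 0.1079, True),
    "Log-Log": (7.9088, 0.1081, True),
}


def confidence_interval(mean, std_error, logged, t_cutoff=T_CUTOFF):
    """Return the (lower, upper) bounds in dollars: mean +/- t * standard error."""
    lower = mean - t_cutoff * std_error
    upper = mean + t_cutoff * std_error
    if logged:
        # Bounds for logged models are on the ln(price) scale; convert to dollars.
        lower, upper = math.exp(lower), math.exp(upper)
    return lower, upper


intervals = {name: confidence_interval(*spec) for name, spec in MODELS.items()}
for name, (lower, upper) in intervals.items():
    inside = lower < PRICE_PAID < upper
    print(f"{name}: ${lower:,.2f} to ${upper:,.2f}; contains $3,100: {inside}")
```

Converting the logged bounds with `math.exp` only after the interval is formed mirrors the "Exponent" columns of the table.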

The confidence intervals from the three regressions are shown above. The price that the professor paid, \$3,100, falls within each of the three confidence intervals, indicating that we can be 95% confident that the professor paid a fair price for his girlfriend’s diamond. However, \$3,100 lies toward the upper end of each interval, which suggests that he could have bought a diamond of similar quality for less.

