AIM: To collect the data and find the sampling distribution, standard deviation, and standard error.
The dataset used for this assignment is “Salary of Data Scientists” which has been taken from Kaggle, a data science competition platform and online community of data scientists and machine learning practitioners under Google LLC.
DESCRIPTION OF THE DATASET
It lists the salaries offered for different roles in the field of data science across the globe, for the years 2020-2023.
The dimension of the dataset is 3755 rows × 11 columns.
It has the following columns:
· work_year
· experience_level
· employment_type
· job_title
· salary
· salary_currency
· salary_in_usd
· employee_residence
· remote_ratio
· company_location
· company_size
STEPS UNDERTAKEN
1) Data collection- In inferential statistics, data collection refers to the process of gathering information or observations from a population or a sample of that population. This process involves selecting a representative group of individuals, collecting relevant data through various methods (surveys, experiments, observations), and organizing the collected data into a dataset.
2) Sampling Distribution- In inferential statistics, the sampling distribution is a theoretical distribution that describes the likelihood of obtaining different sample statistics from repeated random samples of the same size drawn from a population. It plays a crucial role in making inferences about population parameters based on sample statistics. It helps in hypothesis testing and constructing confidence intervals.
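The idea of a sampling distribution can be illustrated with a small simulation. The sketch below is only an illustration: the population is a synthetic, right-skewed vector generated with rlnorm() as a stand-in for salary_in_usd (an assumption, not the Kaggle data), and the number of repetitions (2000) is an arbitrary choice.

```r
# Synthetic, right-skewed "population" standing in for salary_in_usd (assumption).
set.seed(42)
population <- rlnorm(3755, meanlog = 11.7, sdlog = 0.5)

n    <- 800    # sample size used in this post
reps <- 2000   # number of repeated samples (arbitrary choice)

# Draw `reps` random samples of size n (without replacement) and keep each sample mean.
sample_means <- replicate(reps, mean(sample(population, size = n)))

# The collection of sample means approximates the sampling distribution of the mean.
cat("Population mean:      ", mean(population), "\n")
cat("Mean of sample means: ", mean(sample_means), "\n")
cat("SD of sample means:   ", sd(sample_means), "\n")
```

The mean of the sample means should sit close to the population mean, and the spread of the sample means is what the standard error tries to estimate from a single sample.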
3) Standard Deviation- The standard deviation (often denoted by the symbol σ for a population standard deviation or s for a sample standard deviation) measures how much individual data points in a dataset deviate from the mean. A low standard deviation indicates that the data points tend to be close to the mean, while a high standard deviation indicates that the data points are spread out over a wider range.
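s = sqrt( Σ(xi − x̅)² / n ), where xi are the sample observations, x̅ is the sample mean, and n is the sample size. For a whole population, the same formula with the population mean μ in place of x̅ gives σ. Note that R's built-in sd() divides by n − 1 rather than n; for n = 800 the two results differ by less than 0.1%.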
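As a quick check of the formula, the sketch below computes the standard deviation by hand on a small toy vector (made-up numbers, not taken from the dataset), once with divisor n and once with divisor n − 1, and compares the latter with R's sd().

```r
x     <- c(2, 4, 4, 4, 5, 5, 7, 9)   # toy data, not from the dataset
n     <- length(x)
x_bar <- mean(x)

# Divisor n, as in the formula written above.
sd_n  <- sqrt(sum((x - x_bar)^2) / n)

# Divisor n - 1, which is what R's sd() uses.
sd_n1 <- sqrt(sum((x - x_bar)^2) / (n - 1))

cat("SD with divisor n:     ", sd_n, "\n")
cat("SD with divisor n - 1: ", sd_n1, "\n")
cat("sd(x):                 ", sd(x), "\n")
```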
4) Standard error-The standard error (SE) is a measure of the variability or precision of a sample statistic. It quantifies the degree to which a sample statistic, such as the sample mean or sample proportion, is expected to deviate from the true population parameter.
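If n is the sample size and σ the population standard deviation (in practice estimated by the sample standard deviation s), then the standard error of the sample mean is
Standard error (SE) = σ / √n ≈ s / √n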
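A minimal sketch of this calculation in R is given below. The helper name standard_error is hypothetical (base R has no function of that name), and the input is the same kind of toy vector used for illustration only.

```r
# Hypothetical helper: standard error of the mean, using the sample SD as the estimate of sigma.
standard_error <- function(x) {
  sd(x) / sqrt(length(x))
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # toy data, not from the dataset

se_small <- standard_error(x)           # n = 8
se_large <- standard_error(rep(x, 4))   # same values repeated, n = 32

cat("SE with n = 8:  ", se_small, "\n")
cat("SE with n = 32: ", se_large, "\n")
```

Because n appears under a square root in the denominator, the standard error shrinks as the sample grows, which is why a sample of 800 gives a fairly precise estimate of the mean salary.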
The following sections draw random samples from the dataset. Any statistical tool could be used for this; the samples below were generated in R.
The variable sampled is salary_in_usd, and the sample size (n) chosen is 800.
SAMPLE DISTRIBUTION WITHOUT REPLACEMENT
Sampling without replacement refers to the process of selecting elements from a population in such a way that once an element is chosen, it is not put back into the population. As a result, each selection reduces the size of the population for subsequent selections. The sampling distribution without replacement is particularly relevant when considering finite populations.
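In R, sampling without replacement can be done with sample() and replace = FALSE. The sketch below is not the exact code used for this post; it uses a synthetic population of 3755 values as a stand-in for salary_in_usd (an assumption) and samples row indices so that it is easy to confirm that no row is picked twice.

```r
set.seed(1)
# Synthetic stand-in for the 3755 salary_in_usd values (assumption).
population <- round(rlnorm(3755, meanlog = 11.7, sdlog = 0.5))

# Pick 800 distinct row indices, then look up their values.
idx        <- sample(seq_along(population), size = 800, replace = FALSE)
sample_wor <- population[idx]

cat("Sample size:         ", length(sample_wor), "\n")
cat("Repeated row indices:", sum(duplicated(idx)), "\n")   # 0 when replace = FALSE
```

Note that the same salary value can still appear more than once in such a sample, because different rows of the dataset may hold identical salaries; what cannot repeat is the row itself.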
The 800 sampled values are:
[1] 137500 275000 67723 74178 153000 50000 180000 115000 128058 250000 117104 100000 153600 170000 168400 146115 165220
[18] 210000 240000 195400 104611 231250 92350 33808 150000 29944 155000 150000 148500 98506 183600 250000 43809 230000
[35] 83171 129300 24342 21013 119000 115000 106020 72000 106800 120000 116100 191200 119059 118208 160000 208000 75000
[52] 200000 184000 138938 85000 112900 169200 50000 44365 252000 17805 75000 105200 101228 172200 20000 170550 120000
[69] 171600 95000 73900 124234 198200 73546 172200 135000 10354 133766 130000 160000 106800 90320 129300 100706 8000
[86] 140000 129300 203000 96113 190000 145000 94300 200000 75000 141525 135000 376080 180000 130000 110000 185000 185000
[103] 48000 128000 90000 9272 5723 190000 196000 117000 63000 204500 129300 120000 156400 195400 145000 48609 96100
[120] 236900 130000 208450 185900 173000 52533 130000 20171 129300 110000 210000 116000 160000 183500 170000 205000 48289
[137] 250000 80000 250000 191200 164000 42026 40000 200000 141525 69751 252000 170000 70500 162000 135000 169000 30000
[154] 250000 110000 33609 146000 143200 342810 140000 109280 130000 239000 139860 147100 30523 140000 20000 239748 130000
[171] 155000 140000 115440 247500 150000 61566 110600 70000 184000 147100 19522 196200 159200 29751 222200 192000 272550
[188] 6304 136000 84000 225000 179820 86466 184000 131300 187500 135000 195400 110000 216000 115500 126000 180000 176000
[205] 145000 140000 126000 150000 243225 59888 207000 140000 100000 153600 120000 150000 47280 178500 129300 51753 236000
[222] 62000 195000 172600 92000 180000 36773 150000 220000 85700 205000 179975 206699 170000 98506 175950 80000 185000
[239] 95000 236000 142200 275300 120000 24823 135000 110600 125000 144000 201036 138900 169000 100000 75000 100000 110000
[256] 112000 150000 148700 102100 206000 130000 110000 48289 184000 136000 114000 249500 310000 50000 120000 89200 198800
[273] 289800 250000 135000 125000 182000 291500 115000 115000 115360 120000 225000 200000 125000 170000 54685 129300 174500
[290] 49253 100000 130050 159000 85000 130000 199000 170000 150000 191475 130000 120000 129000 180000 57872 175000 184000
[307] 190000 116000 142200 135000 75000 85066 30000 31520 198800 90734 300000 150000 175000 284310 70186 149500 55000
[324] 150000 185000 98506 53192 135000 160000 132100 154000 7799 135000 48289 130000 236600 187000 138750 250000 122500
[341] 105200 150000 121700 165000 165000 119000 79833 130000 198000 63000 200000 185900 165000 205000 221484 120000 130500
[358] 128000 188800 115000 133300 141525 139000 114000 136000 123000 89294 85000 141525 48609 95000 288000 106250 80000
[375] 139500 112000 102000 106900 310000 66837 92700 275000 180000 95000 145000 140000 238000 153600 299500 150000 125000
[392] 210000 139000 225000 113000 260000 28369 37824 226700 18053 213660 239748 95000 139500 120000 280000 105000 250000
[409] 185100 120000 190000 100000 175000 249300 232200 78000 150000 84000 191475 110000 99000 73546 159699 68293 54742
[426] 150000 350000 200000 170000 90320 110000 63000 225000 140000 151902 20000 130000 36773 179305 175000 95000 156400
[443] 216200 175000 135000 203500 92350 128000 110820 135000 136000 127000 167000 183000 124000 180000 127000 133000 115000
[460] 7500 82744 122700 100000 70000 140000 180000 216000 63000 150000 65000 84900 75000 22892 206000 109000 192400
[477] 140000 156400 123700 145000 231250 191475 180000 171000 95000 22809 156600 50000 99000 162500 120250 140000 145000
[494] 55410 103000 196000 100000 130000 153400 75000 242000 123000 47280 92250 156400 154560 153600 174500 229000 107309
[511] 100000 140000 75000 105000 155499 160000 100000 63192 205600 129300 13493 168400 110000 200000 73880 370000 130000
[528] 175000 120000 78000 159000 132000 172200 65000 145000 175100 109400 205000 5409 120000 48000 216000 62000 123648
[545] 276000 138750 200000 110000 95000 130800 250000 155000 84053 235000 90320 70000 182000 275000 145000 286000 405000
[562] 150000 100000 112900 185700 165000 160000 150000 135000 65013 200000 135000 252000 80000 150000 135000 170000 165000
[579] 120250 152000 104663 120000 210000 46809 28476 85500 185000 160000 170000 141525 75000 99000 225000 150000 194000
[596] 216000 196000 80000 52500 130000 115934 141525 100000 153000 143865 128000 90320 85066 180180 203500 201000 150000
[613] 155000 110000 188100 80000 300000 80000 200000 131300 191200 240000 120000 95000 83171 230000 230000 20000 90700
[630] 230000 186000 135000 135000 215300 116450 160000 160000 145000 60000 170500 200000 162500 75000 140400 75000 75116
[647] 200000 226700 129000 95000 126000 116000 135000 20000 15966 48000 185000 218500 345600 135000 100000 95000 58837
[664] 108000 90000 61896 175000 191475 136000 75000 63040 205300 250000 74540 117100 174500 136000 310000 185900 122600
[681] 220000 120000 201450 160000 29453 102100 106500 170000 100000 95000 80036 175000 40663 190200 145000 187200 56100
[698] 149076 158677 99450 136000 220000 252000 52500 150900 55000 81666 76814 130000 73546 186000 130000 154600 86193
[715] 198800 100000 80000 130000 166000 100000 55000 253750 270703 42533 100000 87980 83270 68293 63831 152900 77684
[732] 120000 135000 95000 180180 179820 134000 160000 175000 145000 416000 115000 55000 212200 237000 65000 105000 210000
[749] 125976 53368 106000 75000 38400 136000 143860 124000 144000 112300 150000 135000 185900 13989 73742 112900 100000
[766] 262000 60000 237000 115000 200000 138000 160000 167875 102000 182500 161500 80041 150000 149076 160000 125600 270000
[783] 151800 90000 210000 140000 93700 74000 38000 96000 126000 145900 110000 195000 61467 190000 175000 106800 169000
[800] 247500
Mean = x̅ = Sum of observations / no. of observations = 139454.1
SD = s = sqrt( Σ(xi − x̅)² / n ) = 64099.79
SE = s / √n = 64099.79 / √800 = 2266.27
SAMPLE DISTRIBUTION WITH REPLACEMENT
When sampling is conducted with replacement, each item selected from the population is returned to the population before the next selection. This allows the same element to be chosen more than once in the sampling process.
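In R, the only change needed for sampling with replacement is replace = TRUE in sample(). As before, the sketch below is an illustration on a synthetic population of 3755 values standing in for salary_in_usd (an assumption), not the exact code used for this post.

```r
set.seed(1)
# Synthetic stand-in for the 3755 salary_in_usd values (assumption).
population <- round(rlnorm(3755, meanlog = 11.7, sdlog = 0.5))

# Each draw is made from the full population, so a row index can come up again.
idx       <- sample(seq_along(population), size = 800, replace = TRUE)
sample_wr <- population[idx]

cat("Sample size:         ", length(sample_wr), "\n")
cat("Repeated row indices:", sum(duplicated(idx)), "\n")   # usually well above 0
```

With 800 draws from 3755 rows, some rows are almost certain to be drawn more than once, which is the practical difference from the previous section.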
The 800 sampled values are:
[1] 25000 235000 240000 56536 60000 36773 160000 168400 110000 129300 199000 174500 46178 100000 84053 204500 180000
[18] 150000 111000 19073 59102 130000 165000 174000 112900 133300 140000 5707 202800 82365 90000 61566 110000 90000
[35] 130000 170000 188800 190000 100000 35610 100000 18238 60938 185000 120402 135000 183500 180180 155000 179000 185900
[52] 160000 132320 123700 210000 120000 185000 148700 79833 120160 100000 225000 115000 252000 120000 180000 74540 130000
[69] 102000 140000 150000 124500 184000 199000 162500 175000 160000 73546 131300 200000 112900 130000 108000 125000 90320
[86] 110000 152380 48289 110600 309400 160000 86000 95000 185000 100000 110600 125000 198200 200000 194500 115000 200000
[103] 151800 260000 22809 121700 81500 150000 42923 105380 208000 131300 190000 180000 12000 110000 137400 58000 99100
[120] 132320 130000 86000 221000 89306 110000 155000 72200 120000 300000 196200 80000 120000 110000 40777 36773 109006
[137] 145000 12000 167580 135000 136260 216000 55685 291500 186000 125000 38000 191475 225000 63000 120000 153000 162000
[154] 156400 140000 63810 300240 53654 126000 122500 140100 139000 160000 49253 78990 105000 64000 191765 66100 20000
[171] 185900 30000 297300 200000 250000 247500 66837 145000 116914 153000 174500 120000 85500 123000 120000 136000 13493
[188] 160000 114047 147000 160000 65000 430967 130000 145000 210000 100000 85000 42533 235000 126000 180000 66265 118000
[205] 65488 148700 175000 155000 139500 140000 63000 100000 250000 179975 122900 90000 27317 135000 62000 185000 140000
[222] 156400 109400 289076 288000 113476 169200 120000 70000 90000 164996 158000 261500 163800 239748 220000 191475 106500
[239] 185900 58837 150000 210000 138000 260000 150000 205000 95000 191475 108800 135000 105000 125000 112000 92250 160000
[256] 184000 179500 169200 144100 76814 342300 113000 129300 115934 141300 159699 265000 20000 185000 106020 150000 40189
[273] 13493 75000 128750 140000 247500 214200 75000 75000 130000 185900 48000 88256 73900 280700 86000 88654 55410
[290] 80000 48289 58000 33511 75000 132300 75774 160000 160000 165000 200000 210000 73880 155000 131752 92000 128000
[307] 81000 40000 272000 45760 230000 175000 180000 100000 170000 132320 230000 52500 136000 85066 150000 84053 135000
[324] 124270 17509 169000 61566 170000 92350 142200 75000 73900 160000 174000 75116 75000 120000 123648 48000 50000
[341] 100000 185000 6359 67723 115000 230000 252000 262000 109280 100000 225900 153600 110600 141525 182000 76814 150000
[358] 107309 190000 178600 123700 42197 38400 135000 125000 110600 115934 100000 145000 101228 60000 110000 156400 230000
[375] 120000 185900 37824 94000 20000 175000 13000 100000 180000 195000 153600 150000 204500 165750 115000 151800 61896
[392] 115000 106500 200000 147000 149000 44365 130000 160000 119000 10354 161500 280000 314100 37824 90000 150000 189650
[409] 50000 127075 65488 67723 116000 87000 314100 75000 130000 253200 186000 93918 92350 214618 130000 64000 103691
[426] 110600 68293 120000 243900 145000 350000 175000 250000 237000 210000 195000 250000 85500 70000 180000 120000 110000
[443] 156400 182000 214200 124000 230400 190000 120000 90700 106020 63000 54685 156400 153090 154600 70000 130000 208049
[460] 110000 100000 152000 190000 87980 120000 150000 184000 100000 119059 310000 120000 175000 222200 128000 70000 150000
[477] 120000 180000 190000 132000 160000 65488 29944 109000 195400 119000 50000 126000 170000 102772 70000 150000 130000
[494] 123700 82000 100000 140000 60000 235000 120000 13000 140000 179400 110037 81666 108000 70139 283200 200000 63831
[511] 275300 188700 100000 105500 73546 205300 53654 120000 63000 68293 95386 216000 70186 265000 110000 99000 93700
[528] 12877 106020 160000 185900 124000 126000 128000 190000 120000 170000 260500 73900 240000 245000 250000 87000 78000
[545] 73880 171250 100000 155000 149040 121523 90000 185900 140000 20171 160000 114000 175000 200000 108800 20000 20000
[562] 136620 160000 105000 250000 186000 155000 33808 179975 145000 64200 128875 190000 105000 252000 153600 160000 150000
[579] 140000 139500 259000 220000 100000 145000 175000 200000 133300 162000 235000 234100 145000 170000 90734 160000 208775
[596] 280000 89294 207000 104000 257000 258000 78000 84053 135000 135000 160000 57723 275000 50432 136000 100000 10354
[613] 250000 54094 17805 108000 185000 71786 23000 125000 94564 200000 75020 110000 105000 98000 49253 110000 90700
[630] 60000 107250 131300 108800 105200 121700 180000 110600 57872 160000 210000 82528 81666 142000 291500 110000 18907
[647] 106000 236900 160000 195400 129400 210000 185900 12608 75000 237000 110000 46178 190000 175000 52533 120000 155000
[664] 160000 50000 104000 214618 204500 119500 72914 205000 70000 121500 106020 188100 41689 127000 325000 119059 42026
[681] 95000 7799 144000 170000 141525 94500 185900 252000 150000 135000 115000 90700 130000 55475 196000 160000 145000
[698] 5723 100000 174500 130000 204500 171600 50000 110000 129300 155000 84053 205300 127075 340000 129300 150000 100000
[715] 167580 67723 151410 297300 250000 150000 65000 135000 100000 222200 97218 225000 130000 185900 21013 112000 194500
[732] 48289 124740 191475 136000 231250 86193 179820 150000 185900 250000 95000 248100 191475 145000 80000 135000 130000
[749] 63192 185900 120000 170000 95000 160000 120000 160000 200000 92350 135000 170000 142200 250000 130000 69741 13400
[766] 231250 194000 160000 100000 112000 126000 190000 150000 185900 184000 12000 160080 72946 85000 145000 48289 63000
[783] 152900 109006 130000 142200 205000 201000 75000 49268 70000 120402 85000 38000 142200 128000 30000 192400 210000
[800] 192564
Mean = x̅ = Sum of observations / no. of observations = 135492.7
SD = s = sqrt( Σ(xi − x̅)² / n ) = 64703.85
SE = s / √n = 64703.85 / √800 = 2287.627
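The standard errors reported for the two samples can be re-derived from the reported standard deviations and n = 800. The short sketch below only repeats that arithmetic; the variable names are illustrative.

```r
n <- 800

sd_without <- 64099.79   # reported SD, sample without replacement
sd_with    <- 64703.85   # reported SD, sample with replacement

# SE = s / sqrt(n) for each sample.
se_without <- sd_without / sqrt(n)
se_with    <- sd_with / sqrt(n)

print(round(c(without_replacement = se_without, with_replacement = se_with), 2))
```

Both values agree with the figures above (about 2266.27 and 2287.63), so the two sampling schemes give very similar precision for the mean salary at this sample size.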