Co-occurrence network: a feature that draws a diagram in which extracted words or codes with similar patterns of occurrence are connected by lines; that is, a network whose edges (lines) represent co-occurrence relations […]
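The idea is easy to see in code. Below is a minimal self-contained Scala sketch of a co-occurrence network, not any particular tool's actual implementation: toy "documents" are sets of extracted words, every unordered pair of words appearing in the same document is counted, and pairs reaching an invented minCount threshold become the edges. All names and data are illustrative.

object CooccurrenceSketch {
  def main(args: Array[String]): Unit = {
    // toy "documents": each is the set of extracted words for one text (invented data)
    val docs = Seq(
      Set("spark", "scala", "graph"),
      Set("spark", "graph", "network"),
      Set("scala", "xml")
    )

    // count each unordered word pair that appears together in a document
    val pairCounts = docs
      .flatMap(d => d.toSeq.sorted.combinations(2))
      .groupBy(identity)
      .mapValues(_.size)

    // pairs co-occurring at least minCount times become the edges of the network
    val minCount = 2
    pairCounts.filter(_._2 >= minCount).keys.foreach {
      case Seq(a, b) => println(s"$a -- $b")  // prints: graph -- spark
    }
  }
}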
Author: anesth
Advanced Analytics with Spark #7 GraphX #1
Moving on to Advanced Analytics with Spark #7: building a co-occurrence network. This includes XML handling in Scala. […]
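As a taste of the Scala-side XML handling mentioned in the excerpt, here is a minimal sketch using the scala.xml library (a separate module since Scala 2.11, so it may need to be added to the classpath). The <article>/<topic> structure is invented for illustration and is not the chapter's actual schema.

import scala.xml.XML

// parse a small XML document from a string
val doc = XML.loadString(
  """<article>
    |  <topics>
    |    <topic>Spark</topic>
    |    <topic>GraphX</topic>
    |  </topics>
    |</article>""".stripMargin)

// "\\" selects all descendant elements with the given tag name
val topics = (doc \\ "topic").map(_.text)
println(topics.mkString(", "))  // Spark, GraphX

In the book's co-occurrence chapter, pairs drawn from per-document topic lists like this are what become the network's edges.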
Advanced Analytics with Spark #5 K-Means Clustering #1
Proceeding to K-Means Clustering. […]
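For orientation, a minimal sketch of the MLlib K-Means API this chapter builds on, suitable for pasting into spark-shell. The input path reuses the masked HDFS location from the session below and is purely illustrative, as are k = 7 and the iteration count; the chapter itself works through this on its own dataset.

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// parse each CSV line into a dense feature vector (illustrative input path)
val rawData = sc.textFile("hdfs://localhost/user/*******/ds/covtype.data")
val vectors = rawData.map { line =>
  Vectors.dense(line.split(',').map(_.toDouble))
}.cache()

// cluster into k groups; k and maxIterations are illustrative starting values
val model = new KMeans().setK(7).setMaxIterations(20).run(vectors)
model.clusterCenters.foreach(println)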
Advanced Analytics with Spark #4 Decision Tree & Random Forest #2
Returning to the Decision Tree.
MacBook-Pro-5:spark-2.3.1-bin-hadoop2.7 $ ${SPARK_HOME}/bin/spark-shell --master local[*] --driver-memory 6g
2018-10-09 12:49:43 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://************:4040
Spark context available as 'sc' (master = local[*], app id = local-1539056991319).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.apache.spark.mllib.linalg._
import org.apache.spark.mllib.linalg._

scala> import org.apache.spark.mllib.regression._
import org.apache.spark.mllib.regression._

scala> import org.apache.spark.mllib.evaluation._
import org.apache.spark.mllib.evaluation._

scala> val rawData = sc.textFile("hdfs://localhost/user/*******/ds/covtype.data")
rawData: org.apache.spark.rdd.RDD[String] = hdfs://localhost/user/*******/ds/covtype.data MapPartitionsRDD[7] at textFile at <console>:33

scala> rawData.first
res9: String = 2596,51,3,258,0,510,221,232,148,6279,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,5

scala> val data = rawData.map { line =>
     |   val values = line.split(',').map(_.toDouble)
     |   val featureVector = Vectors.dense(values.init)
     |   val label = values.last - 1
     |   LabeledPoint(label, featureVector)
     | }
data: org.apache.spark.rdd.RDD[org.apache.spark.mllib.regression.LabeledPoint] = MapPartitionsRDD[8] at map at <console>:34

scala> data.first
res10: org.apache.spark.mllib.regression.LabeledPoint = (4.0,[2596.0,51.0,3.0,258.0,0.0,510.0,221.0,232.0,148.0,6279.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0])

scala> val Array(trainData, cvData, testData) =
     |   data.randomSplit(Array(0.8, 0.1, 0.1))
trainData: org.apache.spark.rdd.RDD[org.apache.spark.mllib.regression.LabeledPoint] = MapPartitionsRDD[9] at randomSplit at <console>:35
cvData: org.apache.spark.rdd.RDD[org.apache.spark.mllib.regression.LabeledPoint] = MapPartitionsRDD[10] at randomSplit at <console>:35
testData: org.apache.spark.rdd.RDD[org.apache.spark.mllib.regression.LabeledPoint] = MapPartitionsRDD[11] at randomSplit at <console>:35

scala> trainData.cache()
res11: trainData.type = MapPartitionsRDD[9] at randomSplit at <console>:35

scala> cvData.cache()
res12: cvData.type = MapPartitionsRDD[10] at randomSplit at <console>:35

scala> testData.cache()
res13: testData.type = MapPartitionsRDD[11] at randomSplit at <console>:35
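From here the book's first decision-tree example would follow. A hedged sketch of that next step (the hyperparameters, 7 classes, no categorical features, gini impurity, depth 4, 100 bins, are the book's starting values; the full session is in the post itself):

import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.evaluation.MulticlassMetrics

// train a decision tree on the prepared LabeledPoints
val model = DecisionTree.trainClassifier(
  trainData, 7, Map[Int, Int](), "gini", 4, 100)

// score the cross-validation set and inspect a confusion matrix
val predictionsAndLabels = cvData.map(p => (model.predict(p.features), p.label))
val metrics = new MulticlassMetrics(predictionsAndLabels)
println(metrics.confusionMatrix)
println(metrics.accuracy)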
MLST: eBURST
In connection with a research project, I'm taking a small detour to review MLST and eBURST. The eBURST3 site, http://eburst […]
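As a reminder of what eBURST actually computes, here is a minimal self-contained Scala sketch of its core grouping rule, assuming the usual default that two sequence types (STs) are linked when they are single-locus variants (SLVs), i.e. identical at 6 of the 7 MLST loci. The ST profiles below are invented.

object EburstSketch {
  type Profile = Vector[Int]  // allele numbers at the 7 MLST loci

  // number of loci at which two allelic profiles differ
  def lociDiff(a: Profile, b: Profile): Int =
    a.zip(b).count { case (x, y) => x != y }

  def main(args: Array[String]): Unit = {
    val sts: Map[String, Profile] = Map(
      "ST1" -> Vector(1, 3, 1, 1, 1, 1, 3),
      "ST2" -> Vector(1, 3, 1, 1, 1, 1, 1),  // SLV of ST1
      "ST3" -> Vector(2, 2, 2, 2, 2, 2, 2)   // unrelated
    )
    // SLV links: pairs of STs differing at exactly one locus
    val slvPairs = for {
      Seq((n1, p1), (n2, p2)) <- sts.toSeq.combinations(2)
      if lociDiff(p1, p2) == 1
    } yield (n1, n2)
    slvPairs.foreach(println)  // prints the one SLV pair: (ST1,ST2)
  }
}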
Advanced Analytics with Spark #4 Decision Tree & Random Forest #1
Let's move on to Chap-04 of Advanced Analytics with Spark. […]
Advanced Analytics with Spark #3 Collaborative Filtering #4
I got collaborative filtering with ALS running in a hurry; now let's examine the Scala code itself more closely. In any case, the part from the AUC computation onward is hard to follow […]
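The AUC part becomes less opaque once it is read as a pairwise statement: the AUC is the probability that a randomly chosen held-out ("good") artist is scored above a randomly chosen artist the user never played. Below is a minimal local Scala sketch of exactly that reading, with invented scores; it illustrates the interpretation, not the book's Spark implementation.

object AucSketch {
  // AUC = fraction of (positive, negative) pairs ranked correctly (ties count half)
  def auc(posScores: Seq[Double], negScores: Seq[Double]): Double = {
    val pairResults = for {
      p <- posScores
      n <- negScores
    } yield if (p > n) 1.0 else if (p == n) 0.5 else 0.0
    pairResults.sum / (posScores.size * negScores.size)
  }

  def main(args: Array[String]): Unit = {
    // model scores for items the user actually played vs. items they did not
    println(auc(Seq(0.9, 0.7, 0.4), Seq(0.8, 0.3, 0.2)))  // 7/9 ≈ 0.78
  }
}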
Advanced Analytics with Spark #3 Collaborative Filtering #3
Continuing from the post two entries back: this time the AUC computation is run from spark-shell. Advanced Analytics […]
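For a quick cross-check in spark-shell, MLlib's standard BinaryClassificationMetrics can also report an AUC from (score, label) pairs. Note this is the generic binary-classification helper, not the book's custom areaUnderCurve for recommenders; the pairs below are invented and reuse the toy scores from the sketch above.

import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

val scoreAndLabels = sc.parallelize(Seq(
  (0.9, 1.0), (0.7, 1.0), (0.4, 1.0),  // model scores for held-out positives
  (0.8, 0.0), (0.3, 0.0), (0.2, 0.0)   // model scores for unplayed negatives
))
val metrics = new BinaryClassificationMetrics(scoreAndLabels)
println(metrics.areaUnderROC())  // ≈ 0.78, matching the pairwise count 7/9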
Advanced Analytics with Spark #3 Collaborative Filtering #2
Continuing with collaborative filtering in Advanced Analytics with Spark. The problem with the previous recommendations: artists the user has already played may be among those selected, so there is room for improvement […]
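One way to address this, sketched under assumed names (a trained MatrixFactorizationModel in model, the set of artist IDs the user already played in playedArtistIDs; the helper recommendNew itself is hypothetical): over-fetch recommendations, then drop anything already in the play history.

import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

// hypothetical helper: recommend only artists the user has not played yet
def recommendNew(model: MatrixFactorizationModel,
                 userID: Int,
                 playedArtistIDs: Set[Int],
                 howMany: Int) = {
  // over-fetch so that enough candidates survive the filter
  model.recommendProducts(userID, howMany + playedArtistIDs.size)
    .filter(r => !playedArtistIDs.contains(r.product))
    .take(howMany)
}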
Advanced Analytics with Spark #3 Collaborative Filtering #1
A direct hit from Typhoon No. 24! Couldn't this enormous destructive energy be harnessed for something positive? Here, at any rate, we keep on Sparking, with Chapter 3 of Advanced Analytics with Spark: collaborative filtering using alternating least squares […]
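The heart of the chapter in one hedged sketch: training an implicit-feedback ALS model from (user, artist, playCount) triples with this Spark version's MLlib API. The HDFS path follows the masked pattern used elsewhere in these posts and is illustrative; rank 10, 5 iterations, lambda 0.01, alpha 1.0 are the book's starting hyperparameters.

import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Audioscrobbler-style "user artist playCount" lines (illustrative path)
val rawUserArtistData = sc.textFile("hdfs://localhost/user/*******/ds/user_artist_data.txt")
val trainData = rawUserArtistData.map { line =>
  val Array(user, artist, count) = line.split(' ').map(_.toInt)
  Rating(user, artist, count)
}.cache()

// implicit-feedback ALS: rank 10, 5 iterations, lambda 0.01, alpha 1.0
val model = ALS.trainImplicit(trainData, 10, 5, 0.01, 1.0)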