[125] Briefing: TalkingData Product Analysis - 2016Q2

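TalkingData product: Mobile Observatory (移动观象台):
1. Device Index
Brand
Device model
Screen resolution
Operating system
Carrier
Network
2. Data Reports
3. User Trends
User trends: growth of the cumulative number of smart devices
Regional popularity
User preferences
4. Foot-Traffic Map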

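Data report: 2016 Qingming Festival Jiangnan Travel Hotspot Analysis
Note: the report has three main parts: data description, user demographic profile, and users' offline behavior.
The last two parts cover:

User demographic profile
Where visitors come from (place of residence -> destination)
Popular destination cities
User activity before, during, and after the holiday
Gender ratio
Age distribution
User preferences (people born in the 1970s, 1980s, 1990s)
Users' offline behavior
Daily movement: leisure spending / nightlife
Scenic-area heat maps: during the holiday / before and after the holiday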

Products similar to TalkingData:

  1. Flurry
  2. Umeng (友盟)
  3. Google Analytics

References:

  1. http://www.talkingdata.com/
  2. http://www.talkingdata.com/index/#/device/mob/zh_CN
  3. http://www.talkingdata.com/index/#/datareport/-1/zh_CN
  4. http://www.talkingdata.com/index/#/profile/usertrend/zh_CN
  5. http://www.talkingdata.com/index/behaviormap/heatMapOverlay.jsp
  6. http://www.talkingdata.com/index/files/2016-04/1461050046352.pdf
  7. http://www.domarketing.org/html/2012/yd_0508/4311.html

<End>

[124] Briefing: Machine Learning & Deep Learning & Artificial Intelligence & BI & Data Mining - 2016Q2

Machine Learning & Deep Learning & Artificial Intelligence

Distributed open-source and emerging platforms and tools:

  1. Apache Mahout
  2. Spark MLlib
  3. Flink
  4. H2O
  5. Dato

Open-source deep learning implementations:

  • Torch
  • Theano
  • Caffe
  • Petuum
  • Deeplearning4j
  • H2O

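After AlphaGo, from Google's DeepMind, took on Go champion Lee Sedol, artificial intelligence gained even more momentum. It became a hot research direction and a buzzword. Companies have announced news and open-sourced related products, including: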

  • Microsoft: DMTK
  • Google: TensorFlow、SyntaxNet
  • Apple:
  • Facebook: Torch
  • Amazon:
  • OpenAI: OpenAI Gym
  • Yahoo!:
  • Baidu:
  • IBM: SystemML

Data Mining

Commercial:

  1. SPSS

Traditional open-source tools (single machine):

  1. Weka
  2. R
  3. Python

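References:
1. Meituan Big Data Platform Architecture in Practice (《美团大数据平台架构实践》)
2. Machine Learning and Big Data Risk-Control Systems in the Era of Internet Finance
3. Dato
4. H2O
5. Big Data + Machine Learning + Platform: Why Dato Raised an $18.5 Million Series B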

  1. http://mt.sohu.com/20160513/n449235974.shtml
  2. http://www.infoq.com/cn/articles/h2o-ai-chief-architect-talk-deep-learning
  3. http://cmsoft.10086.cn/info.html?key=348

[123] Briefing: Open-Source Stream Processing Frameworks - 201605

Commercial stream processing frameworks:

  1. TIBCO StreamBase

Open-source stream processing frameworks:

  1. Spark Streaming
  2. Apache Flink
  3. Apache Storm (Twitter)
  4. Apache Heron
  5. Apache Apex
  6. Yahoo! S4
  7. Apache Samza (LinkedIn)
  8. Borealis (Brandeis University, Brown University, MIT)

References:

  1. http://www.infoq.com/cn/articles/overview-of-apache-streaming-technology
  2. http://www.dataguru.cn/article-9532-1.html
  3. 18 Top Open-Source and Commercial Stream Analytics Platforms: Recommendations and Details

[122] Big Data: Telecom Operator Case Studies

AT&T (USA)

http://www.processexcellencenetwork.com/business-transformation/articles/the-goldmine-in-our-pockets-how-telecommunications
https://en.wikipedia.org/wiki/Data_monetization
http://www.footfallchina.cn/index.php/insights/

Telefónica (Spain): Smart Steps

http://dynamicinsights.telefonica.com/blog/488/smart-steps-2
http://dynamicinsights.telefonica.com/wp-content/uploads/2015/04/O2-Retail-Smart-Steps-Case-Study.pdf
http://dynamicinsights.telefonica.com/blog/1634/smart-steps-saved-newark-from-commuter-lock-down

Verizon (USA): Precision Market Insights

http://www.verizonwireless.com/featured/precision/

[121] Agile: How to Split User Stories

Overview of the agile Scrum framework

User story splitting

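Three steps to split a user story:
Step 1: Prepare the story to split
Prepare the story by asking these questions:

  1. Does the large story to be split satisfy the INVEST* criteria (except, possibly, "Small")?
  2. Is the story size 1/10 to 1/6 of the team's velocity?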

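Step 2: Apply the splitting patterns

  1. Split by workflow steps
  2. Defer performance optimization
  3. Split by simple/complex
  4. Split by major effort
  5. Split by interface
  6. Split by variations in data
  7. Split by business rule variations
  8. Split by operations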

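Step 3: Evaluate the split
Evaluate the split by asking the following questions:

  1. Are the new stories roughly equal in size?
  2. Is each story about 1/10 to 1/6 of the team's velocity?
  3. Does each story satisfy INVEST?
  4. Are there stories you can deprioritize or drop?
  5. Is there an obvious story to start with, one that delivers early value, learning, or risk reduction?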
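
The two size questions in Step 3 are simple arithmetic, so they can be checked mechanically. The other three (INVEST, priority, where to start) still need team judgment. Below is a minimal Python sketch. It assumes estimates are in story points, in the same unit as velocity. It also uses a hypothetical 50% tolerance for "roughly equal", because the guide gives no number. `Story`, `size_band`, and `evaluate_split` are illustrative names, and the example stories are invented.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Story:
    title: str
    points: float  # estimate in the same unit as the team's velocity (assumed: story points)


def size_band(velocity: float) -> tuple[float, float]:
    """Target story size: 1/10 to 1/6 of the team's velocity."""
    return velocity / 10, velocity / 6


def evaluate_split(stories: list[Story], velocity: float,
                   tolerance: float = 0.5) -> dict[str, bool]:
    """Answer the two size questions from Step 3.

    `tolerance` is a hypothetical threshold for "roughly equal": the largest
    story may be at most (1 + tolerance) times the smallest.
    """
    if not stories:
        raise ValueError("no stories to evaluate")
    low, high = size_band(velocity)
    sizes = [s.points for s in stories]
    return {
        "roughly_equal": max(sizes) <= (1 + tolerance) * min(sizes),
        "all_in_band": all(low <= p <= high for p in sizes),
    }


# Velocity 30 -> target band of 3 to 5 points per story (hypothetical stories).
good = [Story("view order", 4), Story("edit order", 5), Story("cancel order", 4)]
bad = [Story("checkout workflow", 8), Story("coupon field", 3)]
print(evaluate_split(good, velocity=30))  # {'roughly_equal': True, 'all_in_band': True}
print(evaluate_split(bad, velocity=30))   # {'roughly_equal': False, 'all_in_band': False}
```

In the `bad` split, the 8-point story is above the band, which is a sign to return to Step 2 and apply another pattern to it.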

Note: INVEST - a story should be:
Independent
Negotiable
Valuable
Estimable
Small
Testable

References:

  1. Introduction to Agile
  2. New Story Splitting Resource
  3. How to Split a User Story

[120] Notes on the ThoughtWorks Technology Radar (April 2016)

Notes:

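  • The Technology Radar evaluates technologies along two dimensions. The first is the four quadrants used to classify them: Techniques, Platforms, Tools, and Languages & Frameworks. The second is the four rings that express the stance taken toward each item, from inner to outer: Adopt, Trial, Assess, Hold.

    1. Adopt: We strongly believe the industry should adopt these items. Where they suit our projects, we use them.
    2. Trial: Worth pursuing. It is important to understand how to build up this capability. Enterprises should try the technology on projects where the risk is manageable.
    3. Assess: Worth exploring to understand how it will affect your enterprise.
    4. Hold: Proceed with caution.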
  • Techniques to watch:

    1. Products over projects
    2. BFF - Backend for frontends
    3. Data Lake
    4. Event Storming
    5. QA in production
    6. Reactive architectures
  • Platforms to watch:

    1. Docker
    2. Apache Mesos
    3. Kubernetes
  • Tools to watch:

    1. Consul
    2. Apache Kafka
    3. Zipkin
  • Languages & frameworks to watch:

    1. ES6
    2. React.js
    3. Spring Boot
    4. Swift
    5. Dagger
    6. Dapper
    7. Ember.js
    8. React Native
  • Hold ring items to note:

    1. Application Servers
    2. Jenkins as a deployment pipeline
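
The radar's two dimensions map naturally onto a small data model. Below is a minimal Python sketch that assumes an in-memory list of items. `Quadrant`, `Ring`, `Blip`, and `group_by_ring` are hypothetical names. The sample items come from the notes above. Where the notes do not say which ring (or quadrant) an item sits in, the field is left as `None` rather than guessed.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Quadrant(Enum):
    """First dimension: how an item is classified."""
    TECHNIQUES = "Techniques"
    PLATFORMS = "Platforms"
    TOOLS = "Tools"
    LANGUAGES_AND_FRAMEWORKS = "Languages & Frameworks"


class Ring(Enum):
    """Second dimension: the stance taken, ordered from inner to outer ring."""
    ADOPT = 1
    TRIAL = 2
    ASSESS = 3
    HOLD = 4


@dataclass(frozen=True)
class Blip:
    name: str
    quadrant: Optional[Quadrant]  # None when the notes do not state it
    ring: Optional[Ring]          # None when the notes do not state it


def group_by_ring(blips: list[Blip]) -> dict[Optional[Ring], list[str]]:
    """Collect item names per ring, preserving input order."""
    groups: dict[Optional[Ring], list[str]] = defaultdict(list)
    for blip in blips:
        groups[blip.ring].append(blip.name)
    return dict(groups)


# A few items from the notes; rings/quadrants not given there stay None.
blips = [
    Blip("Data Lake", Quadrant.TECHNIQUES, None),
    Blip("Docker", Quadrant.PLATFORMS, None),
    Blip("Apache Kafka", Quadrant.TOOLS, None),
    Blip("Spring Boot", Quadrant.LANGUAGES_AND_FRAMEWORKS, None),
    Blip("Application Servers", None, Ring.HOLD),
    Blip("Jenkins as a deployment pipeline", None, Ring.HOLD),
]

print(group_by_ring(blips)[Ring.HOLD])  # ['Application Servers', 'Jenkins as a deployment pipeline']
```

Using `Optional` fields keeps the model honest about what these notes actually record, rather than forcing every item into a ring.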

<End>

[117] Notes on "Real-Time Event Streaming: What Are Your Options?"

A typical streaming architecture

The three components of a streaming architecture

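  • A producer: the software system connected to a data source. The producer collects, transforms, filters, aggregates, and enriches data from the source, then publishes it as events to the streaming system.
  • The streaming system: accepts the data published by producers, persists it, and reliably delivers it to consumers.
  • Consumers: subscribe to data from the stream and act on or analyze it.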
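
The three roles above can be sketched as a toy, single-process Python program. This is a minimal illustration under stated assumptions, not how Kafka or MapR Streams work internally. The names (`StreamingSystem`, `produce`, `Consumer`) are hypothetical, and an in-memory list stands in for persistent storage. Each consumer tracks its own read offset, a design loosely modeled on Kafka-style consumers.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class StreamingSystem:
    """Toy streaming layer: one append-only in-memory log per topic.

    Assumption: a Python list stands in for durable storage, so nothing
    survives a restart and there is no partitioning or replication.
    """
    topics: dict[str, list[dict]] = field(default_factory=dict)

    def publish(self, topic: str, event: dict) -> int:
        log = self.topics.setdefault(topic, [])
        log.append(event)
        return len(log) - 1  # offset of the event just appended

    def read(self, topic: str, offset: int) -> list[dict]:
        return self.topics.get(topic, [])[offset:]


def produce(source: Iterable[dict], system: StreamingSystem, topic: str,
            keep: Callable[[dict], bool],
            enrich: Callable[[dict], dict]) -> None:
    """Producer: filter and enrich raw records, then publish them."""
    for record in source:
        if keep(record):
            system.publish(topic, enrich(record))


class Consumer:
    """Consumer: subscribes to a topic and tracks its own read offset."""

    def __init__(self, system: StreamingSystem, topic: str) -> None:
        self.system = system
        self.topic = topic
        self.offset = 0

    def poll(self) -> list[dict]:
        events = self.system.read(self.topic, self.offset)
        self.offset += len(events)
        return events


system = StreamingSystem()
raw = [{"user": "a", "ms": 120}, {"user": "b", "ms": -1}, {"user": "c", "ms": 300}]
produce(raw, system, "clicks",
        keep=lambda r: r["ms"] >= 0,                    # drop malformed records
        enrich=lambda r: {**r, "slow": r["ms"] > 200})  # add a derived field
analytics = Consumer(system, "clicks")
alerts = Consumer(system, "clicks")
print(analytics.poll())    # [{'user': 'a', 'ms': 120, 'slow': False}, {'user': 'c', 'ms': 300, 'slow': True}]
print(analytics.poll())    # [] (already consumed)
print(len(alerts.poll()))  # 2: each consumer reads independently
```

Because consumers only move their own offsets, adding a second consumer (here `alerts`) does not take events away from the first. This decoupling is the main reason to put a streaming system between producers and consumers.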

Technology options

Producers:

  • Apache Flume
  • StreamSets Data Collector

Streaming system:

  • Apache Kafka
  • MapR Streams

Consumers (processing):

  • Spark Streaming
  • Apache Storm
  • Apache Flink
  • Apache Apex

References:

1. Real-Time Event Streaming: What Are Your Options?
2. Streaming Architecture
3. Stream-based Architecture
4. Streaming Architecture: Ideal Platform for Microservices

[116] Briefing: Big Data and Hadoop News - 2016Q2

Apache Storm

Storm 1.0.0 released. Key features:

Apache Spark & Apache Zeppelin enhancements in HDP 2.4.2

  • Certified SparkSQL with ODBC (ODBC driver available from Hortonworks).
  • Bug fixes in Spark Oozie action for a Kerberos enabled cluster.
  • Spark Streaming with Apache Kafka support in a Kerberos enabled cluster.
  • SparkSQL & ORC performance improvements.
  • Final technical preview of Apache Zeppelin that includes Kerberos support, LDAP Authentication, and identity propagation.
    http://hortonworks.com/blog/apache-spark-apache-zeppelin-whats-coming-in-hdp-2-4-2/

Cloudera Engineering

How-to: Detect and Report Web-Traffic Anomalies in Near Real-Time
http://blog.cloudera.com/blog/2016/06/how-to-detect-and-report-web-traffic-anomalies-in-near-real-time/
Best Practices for Enterprise Data Hub Encryption
http://blog.cloudera.com/blog/2016/06/best-practices-for-enterprise-data-hub-encryption/
How-to: Analyze Fantasy Sports with Apache Spark and SQL (Part 2: Data Exploration)
http://blog.cloudera.com/blog/2016/06/how-to-analyze-fantasy-sports-with-apache-spark-and-sql-part-2-data-exploration/
How-to: Analyze Fantasy Sports using Apache Spark and SQL
http://blog.cloudera.com/blog/2016/06/how-to-analyze-fantasy-sports-using-apache-spark-and-sql/
https://spark-summit.org/2016/schedule/
Guide to Configuring Apache Impala (incubating) for HA with F5 BIG-IP
http://blog.cloudera.com/blog/2016/05/guide-to-configuring-apache-impala-incubating-for-ha-with-f5-big-ip/
http://www.cloudera.com/documentation/other/reference-architecture/PDF/Impala-HA-with-F5-BIG-IP.pdf
Multi-node Clusters with Cloudera QuickStart for Docker
http://blog.cloudera.com/blog/2016/08/multi-node-clusters-with-cloudera-quickstart-for-docker/
Livy, the Open Source REST Service for Apache Spark, Joins Cloudera Labs
http://blog.cloudera.com/blog/2016/07/livy-the-open-source-rest-service-for-apache-spark-joins-cloudera-labs/
Untangling Apache Hadoop YARN, Part 4: Fair Scheduler Queue Basics
http://blog.cloudera.com/blog/2016/06/untangling-apache-hadoop-yarn-part-4-fair-scheduler-queue-basics/
New Study: Evaluating Apache HBase Performance on Modern Storage Media
http://blog.cloudera.com/blog/2016/06/new-study-evaluating-apache-hbase-performance-on-modern-storage-media/
https://software.intel.com/sites/default/files/managed/95/0d/Optimize%20Hadoop%20Cluster%20Performance%20with%20Various%20Storage%20Media%20334463-001US.pdf
How-to: Process and Index Medical Images with Apache Hadoop and Apache Solr
http://blog.cloudera.com/blog/2016/05/how-to-process-and-index-medical-images-with-apache-hadoop-and-apache-solr/
How-to: Configure SAP HANA with Apache Impala (incubating)
http://blog.cloudera.com/blog/2016/05/how-to-configure-sap-hana-with-apache-impala-incubating/
How-to: Build a Prediction Engine using Spark, Kudu, and Impala
http://blog.cloudera.com/blog/2016/05/how-to-build-a-prediction-engine-using-spark-kudu-and-impala/
How-to: Improve Apache HBase Performance via Data Serialization with Apache Avro
http://blog.cloudera.com/blog/2016/05/how-to-improve-apache-hbase-performance-via-data-serialization-with-apache-avro/
Inside Santander’s Near Real-Time Data Ingest Architecture (Part 2)
http://blog.cloudera.com/blog/2016/05/inside-santanders-near-real-time-data-ingest-architecture-part-2/
Inside Santander’s Near Real-Time Data Ingest Architecture
http://blog.cloudera.com/blog/2015/08/inside-santanders-near-real-time-data-ingest-architecture/
Apache Impala (incubating) in CDH 5.7: 4x Faster for BI Workloads on Apache Hadoop
http://blog.cloudera.com/blog/2016/04/apache-impala-incubating-in-cdh-5-7-4x-faster-for-bi-workloads-on-apache-hadoop/
New in Cloudera Manager 5.7: Cluster Utilization Reporting
http://blog.cloudera.com/blog/2016/04/new-in-cloudera-manager-5-7-cluster-utilization-reporting/
Cloudera Enterprise 5.7 is Released
http://blog.cloudera.com/blog/2016/04/cloudera-enterprise-5-7-is-released/
How-to: Use Impala and Kudu Together for Analytic Workloads
http://blog.cloudera.com/blog/2016/04/how-to-use-impala-and-kudu-together-for-analytic-workloads/
Quality Assurance at Cloudera: Running/Upgrading to New Releases on Our Own EDH Cluster
http://blog.cloudera.com/blog/2016/04/quality-assurance-at-cloudera-runningupgrading-to-new-releases-on-our-own-edh-cluster/
Quality Assurance at Cloudera: Fault Injection and Elastic Partitioning
http://blog.cloudera.com/blog/2016/04/quality-assurance-at-cloudera-fault-injection-and-elastic-partitioning/
Benchmarking Apache Parquet: The Allstate Experience
http://blog.cloudera.com/blog/2016/04/benchmarking-apache-parquet-the-allstate-experience/

Cloudera Vision

How GoPro uses Apache Hadoop in the Cloud
https://vision.cloudera.com/gopro-hadoop-cloud/
SQL-on-Apache Hadoop – Choosing the right tool for the right job
https://vision.cloudera.com/sql-on-apache-hadoop-choosing-the-right-tool-for-the-right-job/
New Open-Source Service Enables Apache Spark Development
https://vision.cloudera.com/new-open-source-service-enables-apache-spark-development/
Tuning Hive on Spark
http://www.cloudera.com/documentation/enterprise/latest/topics/admin_hos_tuning.html
http://www.cloudera.com/documentation/enterprise/latest/topics/admin_performance.html
Faster Batch Processing with Hive-on-Spark
https://vision.cloudera.com/faster-batch-processing-with-hive-on-spark/
Beyond ETL: Real-time, Streaming Architectures
https://vision.cloudera.com/beyond-etl-real-time-streaming-architectures/
The One Platform Initiative Delivers
https://vision.cloudera.com/the-one-platform-initiative-delivers/

Hortonworks

Rack Awareness
https://community.hortonworks.com/articles/43057/rack-awareness-1.html
Rack Awareness Series 2
https://community.hortonworks.com/articles/43164/rack-awareness-series-2.html
Disaster recovery and Backup best practices in a typical Hadoop Cluster :Series 1 Introduction
https://community.hortonworks.com/articles/43525/disaster-recovery-and-backup-best-practices-in-a-t.html
MICROBENCHMARKING APACHE STORM 1.0 PERFORMANCE
http://hortonworks.com/blog/microbenchmarking-storm-1-0-performance/
TOP 5 ARTICLES ON HADOOP
http://hortonworks.com/blog/top-5-articles-hadoop/
TOP ARTICLES AND QUESTIONS FROM HCC LAST WEEK
http://hortonworks.com/blog/top-articles-questions-hcc-last-week/
HIVE LLAP TECHNICAL PREVIEW ENABLES SUB-SECOND SQL ON HADOOP AND MORE
http://hortonworks.com/blog/llap-enables-sub-second-sql-hadoop/
APACHE METRON TECH PREVIEW 2 AVAILABLE NOW!
http://hortonworks.com/blog/apache-metron-technical-preview-2/
LATEST INNOVATION WITHIN HORTONWORKS DATA PLATFORM (HDP) 2.5 UNVEILED
http://hortonworks.com/blog/latest-innovation-within-hortonworks-data-platform-hdp-2-5-unveiled/
SPARK-ON-HBASE: DATAFRAME BASED HBASE CONNECTOR
http://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/
UNDER-THE-HOOD WITH AMBARI METRICS AND GRAFANA
http://hortonworks.com/blog/hood-ambari-metrics-grafana/
A BRIEF HISTORY OF APACHE STORM
http://hortonworks.com/blog/brief-history-apache-storm/
HORTONWORKS HDP AND SAS EVENT STREAM PROCESSING TOGETHER, USING YARN
https://hortonworks.com/blog/hortonworks-hdp-and-sas-event-stream-processing-together-using-yarn/
APACHE SPARK & APACHE ZEPPELIN: WHAT’S COMING IN HDP 2.4.2
http://hortonworks.com/blog/apache-spark-apache-zeppelin-whats-coming-in-hdp-2-4-2/
ANNOUNCING CLOUDBREAK 1.2
http://hortonworks.com/blog/announcing-cloudbreak-1-2/
ANNOUNCING APACHE STORM 1.0.0
http://hortonworks.com/blog/announcing-apache-storm-1-0-0/
THE NEXT GENERATION OF HADOOP-BASED SECURITY & DATA GOVERNANCE
http://hortonworks.com/blog/the-next-generation-of-hadoop-based-security-data-governance/
ADVANCED METRICS VISUALIZATION DASHBOARDING WITH APACHE AMBARI
http://hortonworks.com/blog/advanced-metrics-visualization-dashboarding-apache-ambari/
STREAMLINING APACHE HADOOP OPERATIONS
http://hortonworks.com/blog/streamlining-apache-hadoop-operations/
THE NEXT MARKET LEADERS WILL POWER THEIR BUSINESSES FROM IOAT DATA SOURCES
http://hortonworks.com/blog/next-market-leaders-will-power-businesses-ioat-data-sources/

Databricks

Preview of Apache Spark 2.0 now on Databricks Community Edition
https://databricks.com/blog/2016/05/11/spark-2-0-technical-preview-easier-faster-and-smarter.html
Spark Trending in the Stack Overflow Survey
https://databricks.com/blog/2016/03/22/spark-trending-in-the-stack-overflow-survey.html
http://stackoverflow.com/research/developer-survey-2016
Continuous Integration and Delivery of Spark Applications at Metacog
https://databricks.com/blog/2016/04/06/continuous-integration-and-delivery-of-spark-applications-at-metacog.html

MapR

IoT Spotlight: Sensor to Dashboard – Real-Time Stream Processing for Oil and Gas
https://www.mapr.com/blog/iot-spotlight-sensor-dashboard-real-time-stream-processing-oil-and-gas
Using MapR, Mesos, Marathon, Docker, and Apache Spark to Deploy and Run Your First Jobs and Containers
https://www.mapr.com/blog/using-mapr-mesos-marathon-docker-and-apache-spark-deploy-and-run-your-first-jobs-and-containers
Apache Apex on MapR Converged Platform
https://www.mapr.com/blog/apache-apex-mapr-converged-platform
Monitoring a MapR Cluster with Elasticsearch + Kibana
https://www.mapr.com/blog/monitoring-mapr-cluster-elasticsearch-kibana
Real Time Credit Card Fraud Detection with Apache Spark and Event Streaming
https://www.mapr.com/blog/real-time-credit-card-fraud-detection-apache-spark-and-event-streaming
Fast, Scalable, Streaming Applications with the Kafka API (MapR Streams), Spark Streaming, and the HBase API (MapR-DB)
https://www.mapr.com/blog/fast-scalable-streaming-applications-kafka-api-mapr-streams-spark-streaming-and-hbase-api-mapr

[115] Briefing: Big Data and Hadoop News - 2016Q1

Cloudera

Hortonworks

Top Ten Blogs from 2015
http://hortonworks.com/blog/top-ten-blogs-from-2015/
Best practices in HDFS authorization with Apache Ranger
http://hortonworks.com/blog/best-practices-in-hdfs-authorization-with-apache-ranger/
Community Choice Winner Blog: Advanced Execution Visualization of Spark jobs
http://hortonworks.com/blog/community-choice-winner-blog-advanced-execution-visualization-spark-jobs/
Delivering an Advanced Analytics Platform for Media Management with Arkena
http://hortonworks.com/blog/delivering-an-advanced-analytics-platform-for-media-management-with-arkena/
Top 3 articles for every Hadoop Developer
http://hortonworks.com/blog/it-has-a-been-a-busy-week-on-hcc-here-is-the-hot-content-for-this-week-based-on-community-activity-and-votes/
Ambari Metrics - Switching from Embedded to Distributed Mode Fails
https://community.hortonworks.com/questions/17421/ambari-metrics-switching-from-embedded-to-distribu.html
Top 3 articles for every Hadoop Developer
http://hortonworks.com/blog/top-3-articles-that-ever-hadoop-developer-should-read-week-072016/
Windowing and State checkpointing in Apache Storm
http://hortonworks.com/blog/storm-support-windowing-state/
https://community.hortonworks.com/articles/14171/windowing-and-state-checkpointing-in-apache-storm.html
Hadoop All Grown Up
http://hortonworks.com/blog/hadoop-all-grown-up/
Top 3 articles for every Hadoop Developer 2
http://hortonworks.com/blog/top-3-articles-every-hadoop-developer-2/
Hive on Tez Performance Tuning – Determining Reducer Counts
https://community.hortonworks.com/articles/22419/hive-on-tez-performance-tuning-determining-reducer.html
What is the Hortonworks recommendation on Swap usage?
https://community.hortonworks.com/questions/22548/what-is-the-hortonworks-recommendation-on-swap-usa.html
Top 3 articles for every Hadoop Developer
http://hortonworks.com/blog/top-3-articles-every-hadoop-developer/
Install Apache Hawq on HDP 2.3.4
https://community.hortonworks.com/articles/20420/install-apache-hawq-on-hdp-234.html
Top 3 articles for every Hadoop Developer
http://hortonworks.com/blog/enter-title-here/
How to remove risk disks from Hadoop cluster ?
https://community.hortonworks.com/questions/18593/how-to-remove-risk-disks-from-hadoop-cluster.html
https://community.hortonworks.com/articles/3131/replacing-disk-on-datanode-hosts.html
Capacity scheduler queue mapping while doAs disabled
https://community.hortonworks.com/questions/18639/capacity-scheduler-queue-mapping-while-doas-disabl.html
Hortonworks DataFlow 1.2 Released
http://hortonworks.com/blog/hortonworks-dataflow-1-2-released/
Top 3 articles for every Hadoop Developer
http://hortonworks.com/blog/top-3-articles-every-hadoop-developer-3/
Quickly enable SSL encryption for Hadoop components in HDP Sandbox
https://community.hortonworks.com/articles/22756/quickly-enable-ssl-encryption-for-hadoop-component.html
Hortonworks HDP and SAS Event Stream Processing together, using YARN
http://hortonworks.com/blog/hortonworks-hdp-and-sas-event-stream-processing-together-using-yarn/

MapR

Apache Spark as a Distributed SQL Engine
https://www.mapr.com/blog/apache-spark-distributed-sql-engine
Apache Flink GA - Planning for the Future
https://www.mapr.com/blog/apache-flink-ga-planning-future
How to Log in Apache Spark
https://www.mapr.com/blog/how-log-apache-spark
Streaming in the Extreme
https://www.mapr.com/blog/streaming-extreme
Secondary Indexing for MapR-DB using Elasticsearch
https://www.mapr.com/blog/secondary-indexing-mapr-db-using-elasticsearch
Top 10 Most Popular MapR Blog Posts of 2015
https://www.mapr.com/blog/top-10-most-popular-mapr-blog-posts-2015
Architecture Matters for Production Success
https://www.mapr.com/why-hadoop/why-mapr/architecture-matters
Spark Data Source API: Extending Our Spark SQL Query Engine
https://www.mapr.com/blog/spark-data-source-api-extending-our-spark-sql-query-engine
What Will You Do in 2016? Apache Spark, Kafka, Drill and More
https://www.mapr.com/blog/what-will-you-do-2016-apache-spark-kafka-drill-and-more
A Brief Overview of Performance Enhancements in Apache Drill 1.4
https://www.mapr.com/blog/brief-overview-performance-enhancements-apache-drill-14

Databricks

Announcing Spark 1.6
https://databricks.com/blog/2016/01/04/announcing-spark-1-6.html
Introducing Spark Datasets
https://databricks.com/blog/2016/01/04/introducing-spark-datasets.html
Spark 2015 Year In Review
https://databricks.com/blog/2016/01/05/spark-2015-year-in-review.html
Deep Learning with Spark and TensorFlow
https://databricks.com/blog/2016/01/25/deep-learning-with-spark-and-tensorflow.html
Faster Stateful Stream Processing in Spark Streaming
https://databricks.com/blog/2016/02/01/faster-stateful-stream-processing-in-spark-streaming.html
An Illustrated Guide to Advertising Analytics
https://databricks.com/blog/2016/02/02/an-illustrated-guide-to-advertising-analytics.html
Introducing Databricks Community Edition: Apache Spark for All
https://databricks.com/blog/2016/02/17/introducing-databricks-community-edition-apache-spark-for-all.html
Introducing Databricks Dashboards
https://databricks.com/blog/2016/02/17/introducing-databricks-dashboards.html
Introducing GraphFrames
https://databricks.com/blog/2016/03/03/introducing-graphframes.html
On-Time Flight Performance with GraphFrames for Apache Spark
https://databricks.com/blog/2016/03/16/on-time-flight-performance-with-spark-graphframes.html
Announcing New Databricks APIs for Faster Production Spark Application Deployment
https://databricks.com/blog/2016/03/30/announcing-new-databricks-apis-for-faster-production-spark-application-deployment.html
Introducing our new eBook: Apache Spark Analytics Made Simple
https://databricks.com/blog/2016/03/31/introducing-our-new-ebook-apache-spark-analytics-made-simple.html
