惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

Attack and Defense Labs
Attack and Defense Labs
T
Threatpost
C
Cybersecurity and Infrastructure Security Agency CISA
H
Hackread – Cybersecurity News, Data Breaches, AI and More
I
Intezer
C
Cyber Attacks, Cyber Crime and Cyber Security
The Register - Security
The Register - Security
量子位
Security Latest
Security Latest
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
大猫的无限游戏
大猫的无限游戏
小众软件
小众软件
Exploit-DB.com RSS Feed
Exploit-DB.com RSS Feed
C
CXSECURITY Database RSS Feed - CXSecurity.com
MyScale Blog
MyScale Blog
J
Java Code Geeks
Apple Machine Learning Research
Apple Machine Learning Research
Google DeepMind News
Google DeepMind News
WordPress大学
WordPress大学
Spread Privacy
Spread Privacy
Jina AI
Jina AI
博客园 - 【当耐特】
P
Palo Alto Networks Blog
Last Week in AI
Last Week in AI
SecWiki News
SecWiki News
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
G
GRAHAM CLULEY
宝玉的分享
宝玉的分享
Hacker News - Newest:
Hacker News - Newest: "LLM"
T
The Blog of Author Tim Ferriss
V
Vulnerabilities – Threatpost
有赞技术团队
有赞技术团队
T
Tor Project blog
H
Hacker News: Front Page
A
Arctic Wolf
NISL@THU
NISL@THU
A
About on SuperTechFans
云风的 BLOG
云风的 BLOG
Engineering at Meta
Engineering at Meta
V
V2EX
N
News and Events Feed by Topic
Webroot Blog
Webroot Blog
Know Your Adversary
Know Your Adversary
P
Privacy International News Feed
I
InfoQ
D
Docker
L
LINUX DO - 最新话题
K
KPMG report finds enterprise disconnect between AI and its ROI | CIO
U
Unit 42

Spark

求助广大网友 - V2EX 真的深入了解开源项目是动手实现--《Spark Core 精简版》 - V2EX spark 做内容推荐,希望大佬给一些思路上的指导 - V2EX 有没有不错的 SparkStreaming+Kafka 的开源项目可以用来入门和进阶? - V2EX Spark 解析复杂 xml,数据如何映射到多表中 - V2EX spark 大数据离线分析 爬虫存到 csv 有的列是长度不固定的 list 请问应该怎么存到 hive?直接存 list 吗?该怎么分析呢? - V2EX PayPal 招 资深大数据工程师 啦 - 技术栈: Spark, Scala, Java , Python 等 - V2EX 关于 Spark Task 的疑问 - V2EX 有没有在滴滴或者其他网约车公司的同学,请教一个数据量的问题 - V2EX spark 作业求助,剔除空值大于三的行 - V2EX spark 有用 kotlin 写代码的吗? - V2EX 现在写 spark 程序,都是用 scala 吗 - V2EX spark 核心构件之 Dependency 宽窄依赖 - V2EX spark 内存管理的实现 spark 源码研究 - V2EX spark straming。submit Python 脚本报错。 - V2EX CPython, PyPy 和 Scala 在 Spark 平台上的性能对比 - V2EX Spark/Scala 的细节讨论:在 map task 里的 map 会得到如何的处理? - V2EX SPARK 文档查询好费劲 - V2EX Apache Spark 之间的共享项目配置文件问题 疑问:spark对于迭代运算场景很有优势,那对于迭代不严重的计算场景呢? - V2EX First Steps with Spark – Screencast #1 - V2EX
求助几个 Spark 问题 - V2EX
hitzhaowenqiang · 2022-06-29 · via Spark

Q1: Someone handed you this dataset (~1GB), and you discovered that it’s over 1,000 tiny files. var df = spark.read.format("orc").load(clean_tracker_cstt_path) ○ Using Spark, please show how you can improve storage efficiency, and explain why this is important. ○ After improving storage efficiency, please explain impact on loading and using dataset in Spark.

Q2: Given the schema below, use Spark 2.x Dataframe API to give count of events per day for the last 7 days.

root |-- action_id: integer (nullable = true) |-- receive_time: timestamp (nullable = true) |-- uuid: string (nullable = true)

Q3: You have calculated the Daily Event Count above using Spark API. Now please find the Min, Max, Mean, and Standard Deviation of Daily Count by using Scala. Only built-in Scala functions may be used. Please format the answer with 2 decimal places, e.g. “The Average Daily Count from Last 7 Days is x.xx”.